Handwritten Gurumukhi Character Recognition by using Recurrent Neural Network

Similar documents
Isolated Curved Gurmukhi Character Recognition Using Projection of Gradient

A Technique for Offline Handwritten Character Recognition

A Survey of Problems of Overlapped Handwritten Characters in Recognition process for Gurmukhi Script

Segmentation of Characters of Devanagari Script Documents

Cursive Handwriting Recognition System Using Feature Extraction and Artificial Neural Network

PCA-based Offline Handwritten Character Recognition System

HANDWRITTEN GURMUKHI CHARACTER RECOGNITION USING WAVELET TRANSFORMS

A Brief Study of Feature Extraction and Classification Methods Used for Character Recognition of Brahmi Northern Indian Scripts

Problems in Extraction of Date Field from Gurmukhi Documents

Segmentation of Isolated and Touching characters in Handwritten Gurumukhi Word using Clustering approach

OCR For Handwritten Marathi Script

A Technique for Classification of Printed & Handwritten text

DEVANAGARI SCRIPT SEPARATION AND RECOGNITION USING MORPHOLOGICAL OPERATIONS AND OPTIMIZED FEATURE EXTRACTION METHODS

A Comparison of Sequence-Trained Deep Neural Networks and Recurrent Neural Networks Optical Modeling For Handwriting Recognition

Date Field Extraction from Gurmukhi Handwritten Documents

SEVERAL METHODS OF FEATURE EXTRACTION TO HELP IN OPTICAL CHARACTER RECOGNITION

A Review on Handwritten Character Recognition

Segmentation of Kannada Handwritten Characters and Recognition Using Twelve Directional Feature Extraction Techniques

LECTURE 6 TEXT PROCESSING

Recognition of Off-Line Handwritten Devnagari Characters Using Quadratic Classifier

Character Recognition Using Matlab s Neural Network Toolbox

Recognition of Gurmukhi Text from Sign Board Images Captured from Mobile Camera

A Review on Different Character Segmentation Techniques for Handwritten Gurmukhi Scripts

Fine Classification of Unconstrained Handwritten Persian/Arabic Numerals by Removing Confusion amongst Similar Classes

LITERATURE REVIEW. For Indian languages most of research work is performed firstly on Devnagari script and secondly on Bangla script.

Neural Network Classifier for Isolated Character Recognition

Neural network based Numerical digits Recognization using NNT in Matlab

Optical Character Recognition (OCR) for Printed Devnagari Script Using Artificial Neural Network

Complementary Features Combined in a MLP-based System to Recognize Handwritten Devnagari Character

Handwritten Character Recognition: A Comprehensive Review on Geometrical Analysis

Paper ID: NITETE&TC05 THE HANDWRITTEN DEVNAGARI NUMERALS RECOGNITION USING SUPPORT VECTOR MACHINE

MOMENT AND DENSITY BASED HADWRITTEN MARATHI NUMERAL RECOGNITION

Text Recognition in Videos using a Recurrent Connectionist Approach

A Feature based on Encoding the Relative Position of a Point in the Character for Online Handwritten Character Recognition

RECOGNITION OF HANDWRITTEN DEVANAGARI WORDS USING NEURAL NETWORK

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor: 1.852

SEGMENTATION OF CHARACTERS WITHOUT MODIFIERS FROM A PRINTED BANGLA TEXT

Building Multi Script OCR for Brahmi Scripts: Selection of Efficient Features

A two-stage approach for segmentation of handwritten Bangla word images

HCR Using K-Means Clustering Algorithm

Unsupervised Feature Learning for Optical Character Recognition

Deep Neural Networks Applications in Handwriting Recognition

Isolated Handwritten Words Segmentation Techniques in Gurmukhi Script

Offline Handwritten Gurmukhi Word Recognition Using Deep Neural Networks

Multi-Layer Perceptron Network For Handwritting English Character Recoginition

Measuring similarities in contextual maps as a support for handwritten classification using recurrent neural networks. Pilar Gómez-Gil, PhD ISCI 2012

Deep Neural Networks Applications in Handwriting Recognition

NOVATEUR PUBLICATIONS INTERNATIONAL JOURNAL OF INNOVATIONS IN ENGINEERING RESEARCH AND TECHNOLOGY [IJIERT] ISSN: VOLUME 5, ISSUE

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

Preprocessing of Gurmukhi Strokes in Online Handwriting Recognition

Handwritten Hindi Character Recognition System Using Edge detection & Neural Network

ABJAD: AN OFF-LINE ARABIC HANDWRITTEN RECOGNITION SYSTEM

Opportunities and Challenges of Handwritten Sanskrit Character Recognition System

Structural Feature Extraction to recognize some of the Offline Isolated Handwritten Gujarati Characters using Decision Tree Classifier

Chapter Review of HCR

CHAPTER 8 COMPOUND CHARACTER RECOGNITION USING VARIOUS MODELS

SEGMENTATION OF BROKEN CHARACTERS OF HANDWRITTEN GURMUKHI SCRIPT

Machine Learning: Handwritten Character Recognition

Handwritten Devanagari Character Recognition Model Using Neural Network

Offline Handwritten Gurmukhi Character Recognition: A Review

Handwritten Character Recognition with Feedback Neural Network

Morphological Approach for Segmentation of Scanned Handwritten Devnagari Text

Image Normalization and Preprocessing for Gujarati Character Recognition

II. WORKING OF PROJECT

Handwritten Character Recognition A Review

Handwritten Gurumukhi Character Recognition Using Zoning Density and Background Directional Distribution Features

Offline Handwritten Word Recognition using Multiple Features with SVM Classifier for Holistic Approach

Handwritten Text Recognition

A System for Joining and Recognition of Broken Bangla Numerals for Indian Postal Automation

Scanning Neural Network for Text Line Recognition

Indian Multi-Script Full Pin-code String Recognition for Postal Automation

Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction

Hand Written Character Recognition using VNP based Segmentation and Artificial Neural Network

Devanagari Isolated Character Recognition by using Statistical features

Comparison of Bernoulli and Gaussian HMMs using a vertical repositioning technique for off-line handwriting recognition

Handwritten Numeral Recognition of Kannada Script

Keywords Handwritten alphabet recognition, local binary pattern (LBP), feature Descriptor, nearest neighbor classifier.

FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USING BACKPROPAGATION NEURAL NETWORKS

Handwritten Text Recognition

FEATURE EXTRACTION TECHNIQUE FOR HANDWRITTEN INDIAN NUMBERS CLASSIFICATION

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

Mobile Application with Optical Character Recognition Using Neural Network

International Journal of Scientific & Engineering Research, Volume 8, Issue 3, March ISSN

ISSN: [Mukund* et al., 6(4): April, 2017] Impact Factor: 4.116

A New Technique for Segmentation of Handwritten Numerical Strings of Bangla Language

ISSN: (Online) Volume 3, Issue 4, April 2015 International Journal of Advance Research in Computer Science and Management Studies

Handwritten English Character Segmentation by Baseline Pixel Burst Method (BPBM)

A Document Image Analysis System on Parallel Processors

International Journal of Advance Engineering and Research Development OPTICAL CHARACTER RECOGNITION USING ARTIFICIAL NEURAL NETWORK

One Dim~nsional Representation Of Two Dimensional Information For HMM Based Handwritten Recognition

Human Face Detection and Tracking using Skin Color Combined with Optical Flow Based Segmentation

A semi-incremental recognition method for on-line handwritten Japanese text

A Novel Handwritten Gurmukhi Character Recognition System Based On Deep Neural Networks

A Hierarchical Pre-processing Model for Offline Handwritten Document Images

An Improvement Study for Optical Character Recognition by using Inverse SVM in Image Processing Technique

Segmentation Based Optical Character Recognition for Handwritten Marathi characters

Handwritten English Alphabet Recognition Using Bigram Cost Chengshu (Eric) Li Fall 2015, CS229, Stanford University

Recognition of Unconstrained Malayalam Handwritten Numeral

Review of Automatic Handwritten Kannada Character Recognition Technique Using Neural Network

Keywords Connected Components, Text-Line Extraction, Trained Dataset.

Transcription:

139 Handwritten Gurumukhi Character Recognition by using Recurrent Neural Network Harmit Kaur 1, Simpel Rani 2 1 M. Tech. Research Scholar (Department of Computer Science & Engineering), Yadavindra College of Engineering, Talwandi Sabo, Bathinda, Punjab, India 2 Associate Professor, Department of Computer Science & Engineering, Yadavindra College of Engineering, Talwandi Sabo, Bathinda, Punjab, India 1 cse2010.harmeet@gmail.com, 2 simpel_jindal@rediffmail.com Abstract In this paper, Handwritten Gurumukhi Characters have been recognized by using the Recurrent Neural Network (RNN). Handwritten character recognition is more complicated than the printed character recognition because of the different writing styles of the person to person. In this work, three types of feature extraction namely zoning features, intersection and open end point feature and horizontal peak extent features have been extracted. For classification purpose RNN classifier has been used. In this proposed work, 2450 samples of handwritten Gurumukhi characters are used. The proposed system achieves 87.34% accuracy for recognition. Keywords: Handwritten Character Recognition, OCR, Feature extraction, RNN. 1. Introduction Handwritten character recognition (HCR) is a method of automatic computer recognition of characters in optically scanned and digitized pages of text. The main goal of handwritten Gurumukhi character recognition (HCGR) is to recognize Gurumukhi characters that are in the shape of digital images, without the human intervention. It is a simplest and straight forward method for instant retrieval. Offline Handwritten Character Recognition is a challenging task because of different writing styles of the individuals. For recognition of characters various methods are proposed by researchers for character recognition in various scripts. But work for handwritten Gurumukhi character recognition is not so extensive and research on handwritten Gurumukhi characters is still in continuation. Feature extraction is most essential in character recognition because the recognition accuracy completely depends on the features that we have extracted. Various feature extraction techniques are available for extracting the features. Graves et al. [1] have used multidimensional LSTM with connectionist temporal classification and a

140 hierarchical layer structure to create a powerful offline handwriting recognizer for Arabic script and achieved accuracy of 91.4%. Kumar et al. [2] have described a grading system for writers primarily based on off-line handwritten Gurumukhi characters recognition using zoning, directional, diagonal, intersection and open end points and Zernike moments features. k-nn, HMM and Bayesian classifiers for classification have been used by them and comparison the handwriting of one writer with other writers by attaching a score with handwriting of a writer. Sharma et al. [3] have proposed zone based feature extraction method for the recognition of handwritten isolated numerals. Nearest neighbor classifier is used to perform classification and recognition. The recognition rate of 99.89% is achieved for handwritten isolated numerals. Shahiduzzaman [4] has worked on Bangla handwritten character recognition by using the classifier Hidden Markov Model (HMM) by the side of Artificial Neural Network (ANN). He has experimented most effective only on individual characters. No compound characters have been considered for this. From the test result 80% accuracy on an average is achieved. Messina and Dour [5] have worked upon segmentation free handwritten Chinese character recognition. To implement this multi dimensional Long Short Term Memory (MDLSTM) Recurrent Neural Network is used. In this work, the raw inputs of the network allow to gain a character error rate of 16.5% and the use of language model at character level convey overall performance to 10.6%. In the present paper, we have proposed a system for handwritten Gurumukhi character recognition. In this work, three types of feature namely as zoning features, intersection and open end point features and horizontal peak extent features are extracted. For character recognition, we have used Recurrent Neural Network (RNN) classifier. 2. Proposed work We have proposed a system for recognition of handwritten Gurumukhi characters. The methodology used for this work is shown in Figure 1. 2.1 Digitization Digitization is the first step of the character recognition. In this phase, handwritten character image is scanned and converted into electronic form of bitmap image. Therefore, digital image is

141 produced in digitization. This digital image is used in further next steps. After digitization preprocessing is performed. Handwritten Gurumukhi character Digitization Pre-processing Feature extraction (Zoning, Horizontal peak extent and Intersection & Open end Point) Classification by using RNN Recognized Character Figure 1: Flow chart for handwritten Gurmukhi character recognition. 2.2 Pre-processing The pre-processing is a chain of operation accomplished on scanned enter image. The image should to have unique format along with jpeg, bmp, etc. This photograph is acquired through a scanner, virtual digital cam or some other appropriate virtual input gadgets. Normally, noise filtering, smoothing and normalization need to be finished on this step. The pre-processing additionally defines a compact illustration of the pattern. 2.3 Feature Extraction The accuracy of recognition system is particularly depends on the features, those ones are extracted in Feature extraction phase. We have used feature extraction techniques, namely,

142 zoning features, horizontal peak extent features and intersection & open end points features for completion of our experiment. 2.3.1 Zoning based feature extraction Zoning based feature is the best method for feature extraction. In this method, digitized image is divided into n number of zones having each of an equal size. After that, we have calculated the pixel density in each zone by means of thinking about the number of foreground pixels in the corresponding zone. We have taken an image with 100 100 pixels. Here, we have divided the digitized image into n=100 equal zones (as shown in Figure 2) and using this approach we have extracted 100 features for each image as shown below. Z 1 Z2 Z3 Z4 Z5 Z6 Z7 Z8 Z9 Z10 Z11 Z12 Z13 Z14 Z15 Z16 Z17 Z18 Z19 Z20 Z21 Z22 Z23 Z24 Z25 Z26 Z27 Z28 Z29 Z30 Z31 Z32 Z33 Z34 Z35 Z36 Z37 Z38 Z39 Z40 Z41 Z42 Z43 Z44 Z45 Z46 Z47 Z48 Z49 Z50 Z51 Z52 Z53 Z54 Z55 Z56 Z57 Z58 Z59 Z60 Z61 Z62 Z63 Z64 Z65 Z66 Z67 Z68 Z69 Z70 Z71 Z72 Z73 Z74 Z75 Z76 Z77 Z78 Z79 Z80 Z81 Z82 Z83 Z84 Z85 Z86 Z87 Z88 Z89 Z90 Z91 Z92 Z93 Z94 Z95 Z96 Z97 Z98 Z99 Z100 Figure 2: Zoning based features. 2.3.2 Intersection and open end point features Other feature extraction approach, used by us in our research work is intersection and open end point features. Intersection point is a point that composed more than one pixel in its neighborhood and an open end point has most effective only one point in its neighbor-hood. Steps to implement the intersection and open end point feature are as follow: Step 1: Divide the image into n no. of an equal zones. We have extracted n=100 features for every image, one feature for each and every zone. Step 2: Calculate the no. of intersection and open end point for each zone of an individual character image.

143 2.3.3 Horizontal peak extent based feature In this approach, sum of successive foreground pixel extents in horizontal direction in each row of zone are taken into consideration. Peak values in each row of zone are summed up as much as to calculate the corresponding feature value for a zone of an image of character as shown in Figure 3. SUM = 38 (a) (b) Figure 3: Peak Extent based features (a) bitmap image (b) horizontal peak extent based feature Following steps have been used for extract the horizontal peak extent primarily based features: Step 1: Divide the enter image into n no. of equal size zones, each containing m m pixels. Step 2: Find the sum of successive foreground pixel extents in horizontal direction in each row of zone. Step 3: Find the largest value in each row and alternate it with each foreground pixel in row. Step 4: Calculate the sum by thinking about the highest successive foreground pixel value of each row; it will be the featured value of that zone. Step 5: Zones having no foreground pixels are set to a value i.e. equals to zero.

144 2.4 Classification Classification is also an important phase of handwritten character recognition system. Classification phase is also known as decision making phase. This phase makes the uses of feature extracted within the previous phase, named as feature extraction phase. The main objective of classification phase is to classify the entered data. In this work, we have used RNN Classifier. Recurrent Neural Network architectures can have many distinct forms. One common type consists of a standard Multi-Layer Perceptron (MLP) plus added loops. These can exploit the powerful non-linear mapping capabilities of the MLP, and also have some form of memory. 3. Database Data is any raw material. It can be textual content, image, audio, video etc. after processing the data it turns into information. In Character recognition, we have used handwritten Gurumukhi characters. We have used Gurumukhi characters written by 70 persons. There are 35 basic characters in Gurumukhi. Therefore, total 70 35= 2450 samples have been collected. One set of 35 Gurmukhi characters written by one person have been shown in Figure 4. Figure 4: Sample of handwritten Gurumukhi characters 4. Experimental Results The results of character recognition system for offline handwritten Gurumukhi characters are provided here. These results are based on three feature extraction techniques, zoning intersection

145 and open end point and peak extent features and combination of these three features. We have divided the data set into different strategies. In the strategy a, we have used 50% data for training set and 50% data in the testing set. In strategy b, we have taken 60% data for training set and 40% data used for testing set. Strategy c has 70% data in training set and 30% data in testing set. For strategy d 80% data in training set and 20% in testing set. Strategy e has used 90% data in training set and remaining 10% data in testing set. 4.1 Recognition accuracy of Zoning feature extraction In Zoning feature extraction technique, we have acquired the results by calculating the 100 features of every character. 5 strategies are used and additionally accomplished 5 fold cross validation and 10 fold cross validation. So in this entire accuracy is achieved by taking the partition of the data. In zoning feature maximum accuracy achieved is 85.71% as shown in Figure 5. Figure 5: Recognition Accuracy of Zoning Feature Extraction technique. 4.2 Recognition accuracy of intersection and open end point feature extraction In this Intersection and open end point features are used for the input to RNN classifier. Maximum accuracy achieved is 83.91% in 10 fold cross validation as shown in Figure 6. 4.3 Recognition accuracy of peak extent feature extraction

146 The peak extent feature is extracted by considering the sum of the lengths of the peak extents. In this the maximum accuracy achieved is 73.46% using 10 fold cross validation as shown in Figure 7. 4.4 Recognition accuracy using combination of all three features Figure 8 shows the recognition accuracy by performing the combination of zoning feature(f1), intersection and open end feature (F2), peak extent feature (F3). We have achieved an accuracy of 87.22% using 10 fold cross validation in this case. Figure 6: Recognition Accuracy of intersection and open end point feature. Figure 7: Recognition Accuracy of peak extent feature.

147 Figure 8: Recognition Accuracy using combination of all three features (F1+F2+F3). Experiments have also been performed using F1 + F2, F1 + F3 and F2 + F3. The combined experimental results have been shown in Table 1 and Figure 9. One can see that F1 and F2 in combination are giving best accuracy of 87.34 for recognizing handwritten Gurumukhi characters using RNN. Table 1: Recognition Accuracy using various feature extraction techniques using RNN. Feature Extraction Technique Recognition Accuracy using RNN Zoning(F1) 85.71% Intersection and open end point(f2) 83.91% Peak Extent(F3) 73.46% F1+F2 87.34% F1+F3 83.34% F2+F3 84.89% F1+F2+F3 87.22%

148 Figure 9: Character Recognition Accuracy of Various Feature Extraction Techniques 5. Conclusions In this work, we have contributed our effort in proposing a process to recognize offline handwritten Gurumukhi characters. We have recognized the characters by using Recurrent Neural Network. Before recognition, feature extraction is performed. For feature extraction three features extraction techniques zoning features, horizontal peak extent features and intersection and open end points features are used. We have achieved 87.34% accuracy for Gurumukhi character recognition by using RNN classifier. We have also used the combination of feature extraction techniques which gives the better outcomes. 6. References [1] Graves A, Liwicki M, Bunke H, Schmidhuber J, Fernández S., Unconstrained on-line handwriting recognition with recurrent neural networks, In Advances in Neural Information Processing Systems, pp. 577-584, 2008. [2] Kumar, Munish Jindal, M. K. and Sharma, R. K., Classification of Characters and Grading Writers in Offline Handwritten Gurumukhi Script, International Conference on Image Information Processing, IEEE, pp. 1-4. 2011. [3] D. Sharma and D. Gupta, Isolated Handwritten Digit Recognition using Adaptive Unsupervised Incremental Learning Technique, International Journal of Computer Applications, Vol. 7, No.4, September 2010. [4] M.. Shahiduzzaman, Bangla Hand Written Character Recognition, International Journal of Science and Research (IJSR), pp 949-952, 2014. [5] R. Messina and J. Louradour, Segmentation-free handwritten Chinese text recognition with LSTM-RNN, 13 th International Conference on Document Analysis and Recognition (ICDAR), pp. 171 175, 2015. [6] Herekar, R. and Dhotre, S. R., Handwritten Character Recognition Based on Zoning Using Euler Number for English Alphabets and Numerals, IOSR Journal of Computer Engineering (IOSR-JCE), Vol. 16(4), pp. 75-88, 2014.

149 [7] Macwan, J. J., Goswami, M. M., & Vyas, A. N., A survey on offline handwritten north Indian script symbol recognition, In Proceedings of International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), pp. 2747-2752, 2016. [8] A. Graves and J. Schmidhuber, Offline handwriting recognition with multidimensional recurrent neural networks, In advances in Neural Information Processing Systems, pp. 545 552, 2009.