Cursive Handwriting Recognition System Using Feature Extraction and Artificial Neural Network

Similar documents
DEVANAGARI SCRIPT SEPARATION AND RECOGNITION USING MORPHOLOGICAL OPERATIONS AND OPTIMIZED FEATURE EXTRACTION METHODS

Segmentation of Kannada Handwritten Characters and Recognition Using Twelve Directional Feature Extraction Techniques

Character Recognition Using Matlab s Neural Network Toolbox

Handwritten Devanagari Character Recognition Model Using Neural Network

Hand Written Character Recognition using VNP based Segmentation and Artificial Neural Network

Handwritten Gurumukhi Character Recognition by using Recurrent Neural Network

LITERATURE REVIEW. For Indian languages most of research work is performed firstly on Devnagari script and secondly on Bangla script.

HANDWRITTEN GURMUKHI CHARACTER RECOGNITION USING WAVELET TRANSFORMS

An Improved Zone Based Hybrid Feature Extraction Model for Handwritten Alphabets Recognition Using Euler Number

A Technique for Offline Handwritten Character Recognition

A Review on Handwritten Character Recognition

A Survey of Problems of Overlapped Handwritten Characters in Recognition process for Gurmukhi Script

Isolated Curved Gurmukhi Character Recognition Using Projection of Gradient

CHAPTER 8 COMPOUND CHARACTER RECOGNITION USING VARIOUS MODELS

NOVATEUR PUBLICATIONS INTERNATIONAL JOURNAL OF INNOVATIONS IN ENGINEERING RESEARCH AND TECHNOLOGY [IJIERT] ISSN: VOLUME 5, ISSUE

SEVERAL METHODS OF FEATURE EXTRACTION TO HELP IN OPTICAL CHARACTER RECOGNITION

OCR For Handwritten Marathi Script

Opportunities and Challenges of Handwritten Sanskrit Character Recognition System

HCR Using K-Means Clustering Algorithm

Handwritten character and word recognition using their geometrical features through neural networks

Handwritten Hindi Character Recognition System Using Edge detection & Neural Network

Fine Classification of Unconstrained Handwritten Persian/Arabic Numerals by Removing Confusion amongst Similar Classes

LECTURE 6 TEXT PROCESSING

Indian Multi-Script Full Pin-code String Recognition for Postal Automation

Handwritten Numeral Recognition of Kannada Script

II. WORKING OF PROJECT

CHAPTER 1 INTRODUCTION

Optical Character Recognition (OCR) for Printed Devnagari Script Using Artificial Neural Network

Handwritten Character Recognition: A Comprehensive Review on Geometrical Analysis

Recognition of Off-Line Handwritten Devnagari Characters Using Quadratic Classifier

Structural Feature Extraction to recognize some of the Offline Isolated Handwritten Gujarati Characters using Decision Tree Classifier

Complementary Features Combined in a MLP-based System to Recognize Handwritten Devnagari Character

Neural Network Classifier for Isolated Character Recognition

A two-stage approach for segmentation of handwritten Bangla word images

A System for Joining and Recognition of Broken Bangla Numerals for Indian Postal Automation

Handwritten Character Recognition with Feedback Neural Network

Handwritten Hindi Numerals Recognition System

Handwritten Character Recognition System using Chain code and Correlation Coefficient

Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers

A Simplistic Way of Feature Extraction Directed towards a Better Recognition Accuracy

Segmentation of Characters of Devanagari Script Documents

An Improvement Study for Optical Character Recognition by using Inverse SVM in Image Processing Technique

A New Approach to Detect and Extract Characters from Off-Line Printed Images and Text

Convolution Neural Networks for Chinese Handwriting Recognition

Multi-Layer Perceptron Network For Handwritting English Character Recoginition

Extraction and Recognition of Alphanumeric Characters from Vehicle Number Plate

Devanagari Handwriting Recognition and Editing Using Neural Network

Simulation of Zhang Suen Algorithm using Feed- Forward Neural Networks

ADVANCES in NATURAL and APPLIED SCIENCES

An Investigation on the Performance of Hybrid Features for Feed Forward Neural Network Based English Handwritten Character Recognition System

Handwritten Character Recognition A Review

Paper ID: NITETE&TC05 THE HANDWRITTEN DEVNAGARI NUMERALS RECOGNITION USING SUPPORT VECTOR MACHINE

Neural network based Numerical digits Recognization using NNT in Matlab

A Brief Study of Feature Extraction and Classification Methods Used for Character Recognition of Brahmi Northern Indian Scripts

MOMENT AND DENSITY BASED HADWRITTEN MARATHI NUMERAL RECOGNITION

International Journal of Advance Research in Engineering, Science & Technology

Short Survey on Static Hand Gesture Recognition

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

Tamil Image Text to Speech Using Rasperry PI

NOVATEUR PUBLICATIONS INTERNATIONAL JOURNAL OF INNOVATIONS IN ENGINEERING RESEARCH AND TECHNOLOGY [IJIERT] ISSN: VOLUME 2, ISSUE 1 JAN-2015

Optical Character Recognition

Handwritten Gurumukhi Character Recognition Using Zoning Density and Background Directional Distribution Features

Devanagari Isolated Character Recognition by using Statistical features

DATABASE DEVELOPMENT OF HISTORICAL DOCUMENTS: SKEW DETECTION AND CORRECTION

INTERNATIONAL RESEARCH JOURNAL OF MULTIDISCIPLINARY STUDIES

Review of Automatic Handwritten Kannada Character Recognition Technique Using Neural Network

Segmentation of Isolated and Touching characters in Handwritten Gurumukhi Word using Clustering approach

FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USING BACKPROPAGATION NEURAL NETWORKS

PCA-based Offline Handwritten Character Recognition System

Odia Offline Character Recognition

Image Compression: An Artificial Neural Network Approach

Spotting Words in Latin, Devanagari and Arabic Scripts

Recognition of Handwritten Digits using Machine Learning Techniques

Text Extraction from Images

OFFLINE SIGNATURE VERIFICATION

Optical Recognition of Digital Characters Using Machine Learning

A System towards Indian Postal Automation

Word-wise Hand-written Script Separation for Indian Postal automation

Khmer Character Recognition using Artificial Neural Network

Problems in Extraction of Date Field from Gurmukhi Documents

Character Recognition from Google Street View Images

Segmentation Based Optical Character Recognition for Handwritten Marathi characters

Segmentation of Bangla Handwritten Text

ISSN: [Mukund* et al., 6(4): April, 2017] Impact Factor: 4.116

Handwritten Script Recognition at Block Level

High Accuracy Arabic Handwritten Characters Recognition Using Error Back Propagation Artificial Neural Networks

Measuring similarities in contextual maps as a support for handwritten classification using recurrent neural networks. Pilar Gómez-Gil, PhD ISCI 2012

A Hierarchical Pre-processing Model for Offline Handwritten Document Images

International Journal of Advance Research in Engineering, Science & Technology

Keywords Handwritten alphabet recognition, local binary pattern (LBP), feature Descriptor, nearest neighbor classifier.

Hand Written Telugu Character Recognition Using Bayesian Classifier

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor: 1.852

ABJAD: AN OFF-LINE ARABIC HANDWRITTEN RECOGNITION SYSTEM

A Technique for Classification of Printed & Handwritten text

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

Machine Learning: Handwritten Character Recognition

Automatic Static Signature Verification Systems: A Review

Hand Writing Numbers detection using Artificial Neural Networks

A Neural Network Based Bank Cheque Recognition system for Malaysian Cheques

A Multimodal Framework for the Recognition of Ancient Tamil Handwritten Characters in Palm Manuscript Using Boolean Bitmap Pattern of Image Zoning

Transcription:

Cursive Handwriting Recognition System Using Feature Extraction and Artificial Neural Network Utkarsh Dwivedi 1, Pranjal Rajput 2, Manish Kumar Sharma 3 1UG Scholar, Dept. of CSE, GCET, Greater Noida, India 2UG Scholar, Dept. of CSE, GCET, Greater Noida, India 3Assistant Professor, Dept. of CSE, GCET, Greater Noida, India ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract Cursive Handwriting recognition is a very challenging area due to the unique styles of writing from one person to another. Various researches have been conducted in this field since around four decades. In this paper, an offline cursive writing character recognition system is described using an Artificial Neural Network. The features of each character written in the input are extracted and then passed to the neural network. Data sets, containing texts written by different people are used to train the system. The proposed recognition system gives high levels of accuracy as compared to the conventional approaches in this field. This system can efficiently recognise cursive texts and convert them into structural form. Key Words : Image Processing, Feature extraction, Neural networks, Cursive handwriting, Classification. 1. INTRODUCTION Cursive writing recognition is one of the most challenging research areas in the field of image processing and pattern recognition. Each person has a different style of writing the same alphabet. There are also variations in the text written by the same person time to time. The development of cursive writing recognition systems has led to an improved interaction between man and machine. Various research works have been conducted focusing on new techniques that aim at reducing the processing time while providing higher accuracy. The first step in any handwritten recognition system is preprocessing followed by segmentation and feature extraction. Pre-processing is mainly essential to shape the input image into a form suitable for segmentation. In the segmentation, individual characters are separated and then, each character is resized into m x n pixels towards the training network. The most critical factor in achieving high recognition performance is the selection of appropriate feature extraction method. The methods like Template matching, Graph description, Projection Histograms, Zoning are widely used. An Artificial Neural Network is used in the back end for performing classification and recognition operation. In the off-line recognition system, the neural networks have emerged as the fast and reliable tools for classification towards achieving high efficiency. Some major classification techniques include statistical methods based on Bayes decision rule, Artificial Neural Networks (ANNs), Support Vector Machines (SVM) etc. This paper is organized into various sections. The Section 2 gives a brief overview of the existing methods that have been proposed in this area. The next, Section 3 describes the proposed approach based on multilayer feed forward neural network. Towards the last sections of this paper, we analyse the results of the proposed approach under various conditions and conclude the paper. 2. EXISTING METHODOLOGY Handwriting recognition can be of two types, off-line and online recognition methods. In the off-line approach, the input is obtained by scanning the text written on the paper using a pen/pencil in the form of an image. In the on-line system the two dimensional coordinates of successive points are represented as a function of time. The input is hence obtained by transducer devices like electronic tablets or digitizers. The online system works in real time but the offline approach can provide higher levels of accuracy in recognising the characters. There are a number of applications where these systems can be used effectively like mail sorting, bank processing, document reading and postal address recognition. Cursive handwriting recognition has been an area of interest of various researchers due to its applicability in easing a number of tasks of the real world. Notable contributions have led to development of systems which are extremely fast and efficient in recognizing the input texts. The recognition of cursive texts based on division of continuous characters in triplets was proposed in 1999. A word was segmented into triplets an subsequent triplets contained two common letters [11]. This concept of overlapping the characters was used to achieve higher recognition rates. A cross correlation matrix was maintained to track the connectivity between the symbols. A modified quadratic classifier based scheme [9] to recognize texts in six 2017, IRJET Impact Factor value: 5.181 ISO 9001:2008 Certified Journal Page 2202

different Indian scripts was proposed in 2007. In the same year Horizontal/Vertical strokes along with Zoning techniques were proposed which have reported high efficiency [8]. But the feature extraction process in this approach is complex and time consuming. The method also uses thinning process on the characters which leads to the loss of certain features. Neural Networks have proved to be an efficient tool for recognizing handwritten texts. In 2011, an approach based on the above was proposed but the characters had to be of a fixed size [5]. Feature extraction module was missing from this system rendering a low recognition accuracy. Another system based on hybrid Hidden Markov Model (HMM) was proposed in 2011 to recognize unconstrained offline texts[11]. The structural part of the optical model was modified and a Multilayer Perceptron was used to recognize the characters. An approach to recognize English characters was proposed in 2012. It was based on Fuzzy classification theory [1] where a membership function was used. This function was based on the coordinates (x,y) and the length of the character. The degree of similarity between the character and trained image was used to recognize the alphabet. A Back Propagation algorithm [4] using momentum item and role function was proposed in 2013 for cursive writing recognition. The approach had advantages like quick speed and higher recognition effect. 3. PROPOSED SYSTEM In this paper, a diagonal feature extraction scheme for the recognition of handwritten characters is proposed. An overview of the system can be taken from the block diagram in Figure 1. In the feature extraction process, resized individual character of size 90x 60 pixels is further divided into 54 equal zones, each of size 10x10 pixels. The features are extracted from the pixels of each zone by moving along their diagonals. This procedure is repeated for all the zones leading to extraction of 54 features for each character. These extracted features are used to train a feed forward back propagation neural network employed for performing classification and recognition tasks. The advantage of the above technique is that it requires lesser time to train the neural network. 3.1 Image Acquisition In Image acquisition, the recognition system acquires a scanned image as an input image. The image should have a specific format such as JPEG, BMT etc. The image is obtained through a scanner, digital camera or any other suitable digital input device. 3.2 Pre Processing A series of operations are performed on the scanned input image. It essentially enhances the image rendering it suitable for segmentation. The various tasks performed on the image in pre-processing stage are shown in Figure 2. Binarization process converts a gray scale image into a binary image using a threshold method. Detection of edges in the binarized image using sobel operator, dilating the image and filling the holes present in it are the operations performed in the last two stages to produce the pre-processed image suitable for segmentation. Slant correction is also done in this phase to correct the angle of the text. Fig - 1 : Block Diagram of Cursive Writing Recognition System Fig 2 : Stages of Pre - Processing 2017, IRJET Impact Factor value: 5.181 ISO 9001:2008 Certified Journal Page 2203

3.3 Segmentation In this stage, an image of sequence of characters is decomposed into sub-images of individual character. In the proposed system, the pre-processed input image is segmented into isolated characters by assigning a number to each character using a labelling process. This labelling provides information about number of characters in the image. Each individual character is uniformly resized into 90X60 pixels for classification and recognition stage. The feature vector is denoted as X where X = ( f1, f2,,fd ) where f denotes features and d is the number of zones into which each character is divided. 3.4 Feature Extraction In this stage, the features of the characters that are crucial for classifying them at recognition stage are extracted. This is an important stage as its effective functioning improves the recognition rate and reduces the misclassification. Diagonal feature extraction scheme for recognizing off-line handwritten characters is proposed in this work. Every character image of size 90x 60 pixels is divided into 54 equal zones, each of size 10x10 pixels. The chain codes are used to detect the directions as shown in Figure 3 in order to extract the features of a character. Fig 4 : Architecture of the Neural Network Fig 3 : Directional features of a character The features are extracted from each zone pixels by moving along the diagonals of its respective 10X10 pixels. Each zone has19 diagonal lines and the foreground pixels present long each diagonal line is summed to get a single sub-feature, thus 19 sub-features are obtained from the each zone. These 19 sub-features values are averaged to form a single feature value and placed in the corresponding zone. This procedure is sequentially repeated for the all the zones. There could be some zones whose diagonals are empty of foreground pixels. The feature values corresponding to these zones are zero. Finally, 54 features are extracted for each character. In addition, 9 and 6 features are obtained by averaging the values placed in zones rowwise and columnwise, respectively. As result, every character is represented by 69, features. 3.5 Classification and Recognition This is the decision making part of a recognition system and it uses the features extracted in the previous stage. A feed forward back propagation neural network as shown in Figure 4, having two hidden layers is used to perform the classification. The hidden layers use log sigmoid activation function, and the output layer is a competitive layer, as one of the characters is to be identified. The number of input neurons is determined by length of the feature vector d. The total numbers of characters n determines the number of neurons in the output layer. The number of neurons in the hidden layers is obtained by trial and error. The most compact network is chosen and presented. 4. EXPERIMENT AND RESULTS The proposed system has been implemented using Matlab. The scanned image is taken as dataset/ input and feed forward architecture is used. The structure of neural network includes an input layer with 54 inputs, two hidden layers each with 100 neurons and an output layer with 26 neurons. The network is trained using the gradient descent back propagation method with momentum and adaptive learning rate and log-sigmoid transfer function. Neural network has been trained using known dataset. A recognition system using two different feature lengths is built. The number of input nodes is chosen based on the number of features. After training the network, the recognition system was tested using several unknown dataset and the results obtained are analysed here. Three different ways of feature extraction are used for character recognition in the proposed system ie. horizontal direction, vertical direction and diagonal direction. The feature vector size is chosen as 54, 2017, IRJET Impact Factor value: 5.181 ISO 9001:2008 Certified Journal Page 2204

i.e. without rowwise and columnwise features. The results obtained using three different types of feature extraction are summarized in Table 1. 5. CONCLUSION A simple off-line cursive character recognition system using a new type of feature extraction, namely, diagonal feature extraction is proposed. The Neural Network used to recognise the characters is built using 54 features. To compare the recognition efficiency of the proposed diagonal method of feature extraction, the neural network recognition system is trained using the horizontal and vertical feature extraction methods, six different recognition networks are built. From the test results it is identified that the diagonal method of feature extraction yields the highest recognition accuracy of upto 97%. The diagonal method of feature extraction is verified using a number of test images. The proposed off-line hand written character recognition system with better quality recognition rates will be eminently suitable for several applications including postal/parcel address recognition, bank processing, document reading and conversion of any handwritten document into structural text form. 6. REFERENCES Fig 5 : Input Text and Final Result The criteria for choosing the type of feature extraction are: (i) the speed of convergence, i.e. number of epochs required to achieve the training goal and (ii) training stability. However, the most important parameter of interest is the accuracy of the recognition system. A sample input image being converted into the desired result ie. recognised characters has been shown in Figure 5. The results presented in Table 1 show that the diagonal feature extraction yields good recognition accuracy compared to the others types of feature extraction. The desired performance goal has been achieved in 923 epochs. Table 1 : Comparison of Recognition Rates obtained with Different Orientations [1] M. N. Mangoli and Prof. S. Desai, Optical Character Recognition for Cursive Handwriting, International Research Journal of Engineering and Technology, Vol. 3 No. 5, pp. 792 795, 2016. [2] N. yadav and P. Yadav, Handwriting Recognition System A Review, International Journal of Computer Applications, Vol. 114 No. 19, pp. 36 40, 2015. [3] M. Patel and S. P. Thakkar, Handwritten Character Recognition in English A Survey, International Journal of Advanced Research in Computer and Communication Engineering, Vol. 4 No. 2, pp. 345 350, 2015. [4] B. Kumar, N. Sharma and T. Patnaik, Recognition for Handwritten English Characters : A Review,International Journal of Engineering and Innovative Technology, Vol. 2 No. 7, pp. 318 321, 2013. [5] Dr. S. Upadhyay and P. N. Tushar, Chain Code Based Handwritten Cursive Character Recognition System with Better Segmentation using Neural Network, International Journal of Computational Engineering Research, Vol. 3 No. 5, pp. 60 63, 2013. [6] A. Bhushan, J. Supriya and P. Kalyani, Handwritten Script Recognition, IOSR Journal of Computer Engineering, pp. 30 33, 2012. [7] A. Pall and D. Singh, Handwritten English Character Recognition Using Neural Network, International Journal of Computer Science and Communication, Vol. 1 No. 2, pp. 141 144, 2010. [8] D. Acharya, N. V. Reddyand Krishnamurthy, Isolated Handwritten Kannada Numeral Recognition using Structural feature and K means Cluster, International Journal of Information Technology and Knowledge Management, pp. 125 129, 2009. [9] F. Kimura, T. Wakabayashi, U. Pal, Handwritten Numeral Recognition of six Popular Scripts, Ninth International Conference on Document Analysis and Recognition, Vol. 2, pp. 749 753, 2007. 2017, IRJET Impact Factor value: 5.181 ISO 9001:2008 Certified Journal Page 2205

[10]G. D. Salunke and S. B. Hallale, Twelve Directional Feature Extraction for Handwritten English Character Recognition, International Journal of Recent Technology and Engineering, Vol. 2 No. 2, 2013. [11]G. Katiyar and S. Mehfuz, Intelligent Systems for Offline Handwritten Character Recognition : A Review, International Journal of Emerging Technology and Advanced Engineering, 2012. [12]D. K. Patel, M. K. Singh and T. Som, Improving the Recognition of Handwritten Characters using Neural Network through Multiresolution Technique and Euclidean Distance Metric, International Journal of Computer Applications, Vol. 45 No. 6, 2012. [13] A. Aparna and I. Muthumani, Optical Character Recognition for Handwritten Cursive English Characters, International Journal of Computer Science and Information Technology, Vol. 5 No. 1, pp. 847 848, 2010. [14]B. B. Choudhary and U. Bhattacharya, Handwritten Numeral Databases of Indian Scripts and Multi stage Recognition of Mixed Numerals, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 31 No. 3, pp. 444 457, 2009. [15]H. Bunke, E. Schukat and M. Roth, Offline Cursive Handwriting Recognition using Hidden Markov Model, Elsevier Science Limited, Vol. 28 No. 9, 1995. 2017, IRJET Impact Factor value: 5.181 ISO 9001:2008 Certified Journal Page 2206