Hand Written Character Recognition using VNP based Segmentation and Artificial Neural Network

Similar documents
An Efficient Character Segmentation Based on VNP Algorithm

Cursive Handwriting Recognition System Using Feature Extraction and Artificial Neural Network

HANDWRITTEN GURMUKHI CHARACTER RECOGNITION USING WAVELET TRANSFORMS

Handwritten Devanagari Character Recognition Model Using Neural Network

Segmentation of Characters of Devanagari Script Documents

OCR For Handwritten Marathi Script

NOVATEUR PUBLICATIONS INTERNATIONAL JOURNAL OF INNOVATIONS IN ENGINEERING RESEARCH AND TECHNOLOGY [IJIERT] ISSN: VOLUME 5, ISSUE

A Review on Handwritten Character Recognition

Segmentation of Kannada Handwritten Characters and Recognition Using Twelve Directional Feature Extraction Techniques

A Survey of Problems of Overlapped Handwritten Characters in Recognition process for Gurmukhi Script

International Journal of Advance Engineering and Research Development OPTICAL CHARACTER RECOGNITION USING ARTIFICIAL NEURAL NETWORK

Handwritten Character Recognition with Feedback Neural Network

Handwritten Hindi Numerals Recognition System

Isolated Curved Gurmukhi Character Recognition Using Projection of Gradient

Simulation of Zhang Suen Algorithm using Feed- Forward Neural Networks

Segmentation of Isolated and Touching characters in Handwritten Gurumukhi Word using Clustering approach

Character Recognition Using Matlab s Neural Network Toolbox

Handwritten Hindi Character Recognition System Using Edge detection & Neural Network

HCR Using K-Means Clustering Algorithm

A Document Image Analysis System on Parallel Processors

Neural Network Classifier for Isolated Character Recognition

Image Compression: An Artificial Neural Network Approach

FRAGMENTATION OF HANDWRITTEN TOUCHING CHARACTERS IN DEVANAGARI SCRIPT

A New Approach to Detect and Extract Characters from Off-Line Printed Images and Text

A System for Joining and Recognition of Broken Bangla Numerals for Indian Postal Automation

Preprocessing of Gurmukhi Strokes in Online Handwriting Recognition

Mobile Application with Optical Character Recognition Using Neural Network

Neural network based Numerical digits Recognization using NNT in Matlab

Character Recognition

A Technique for Classification of Printed & Handwritten text

Handwritten Script Recognition at Block Level

Handwritten Gurumukhi Character Recognition by using Recurrent Neural Network

Handwritten character and word recognition using their geometrical features through neural networks

SEVERAL METHODS OF FEATURE EXTRACTION TO HELP IN OPTICAL CHARACTER RECOGNITION

Optical Character Recognition (OCR) for Printed Devnagari Script Using Artificial Neural Network

Review on Optical Character Recognition and Signature Recognition and Verification Technique

Handwriting Recognition of Diverse Languages

Offline Signature verification and recognition using ART 1

A Review on Different Character Segmentation Techniques for Handwritten Gurmukhi Scripts

CHAPTER 8 COMPOUND CHARACTER RECOGNITION USING VARIOUS MODELS

AN EFFICIENT BINARIZATION TECHNIQUE FOR FINGERPRINT IMAGES S. B. SRIDEVI M.Tech., Department of ECE

FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USING BACKPROPAGATION NEURAL NETWORKS

1. Introduction. 2. Motivation and Problem Definition. Volume 8 Issue 2, February Susmita Mohapatra

CS231A Course Project Final Report Sign Language Recognition with Unsupervised Feature Learning

Keywords Binary Linked Object, Binary silhouette, Fingertip Detection, Hand Gesture Recognition, k-nn algorithm.

Automatic Recognition and Verification of Handwritten Legal and Courtesy Amounts in English Language Present on Bank Cheques

ISSN: [Mukund* et al., 6(4): April, 2017] Impact Factor: 4.116

LECTURE 6 TEXT PROCESSING

ABSTRACT I. INTRODUCTION. Dr. J P Patra 1, Ajay Singh Thakur 2, Amit Jain 2. Professor, Department of CSE SSIPMT, CSVTU, Raipur, Chhattisgarh, India

DEVANAGARI SCRIPT SEPARATION AND RECOGNITION USING MORPHOLOGICAL OPERATIONS AND OPTIMIZED FEATURE EXTRACTION METHODS

Handwritten Devanagari Character Recognition

II. WORKING OF PROJECT

Handwritten English Alphabet Recognition Using Bigram Cost Chengshu (Eric) Li Fall 2015, CS229, Stanford University

Multi-Layer Perceptron Network For Handwritting English Character Recoginition

Recognition of Handwritten Digits using Machine Learning Techniques

Structural Feature Extraction to recognize some of the Offline Isolated Handwritten Gujarati Characters using Decision Tree Classifier

In this assignment, we investigated the use of neural networks for supervised classification

CHAPTER 1 INTRODUCTION

LITERATURE REVIEW. For Indian languages most of research work is performed firstly on Devnagari script and secondly on Bangla script.

Universal Graphical User Interface for Online Handwritten Character Recognition

A New Algorithm for Detecting Text Line in Handwritten Documents

A New Technique for Segmentation of Handwritten Numerical Strings of Bangla Language

Classification and Regression using Linear Networks, Multilayer Perceptrons and Radial Basis Functions

Finger Print Enhancement Using Minutiae Based Algorithm

A Multimodal Framework for the Recognition of Ancient Tamil Handwritten Characters in Palm Manuscript Using Boolean Bitmap Pattern of Image Zoning

OFFLINE SIGNATURE VERIFICATION

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

Solving Word Jumbles

Text Extraction from Images

Slant Correction using Histograms

Comparing Tesseract results with and without Character localization for Smartphone application

Problems in Extraction of Date Field from Gurmukhi Documents

Recognition of Gurmukhi Text from Sign Board Images Captured from Mobile Camera

Motion Detection Algorithm

Argha Roy* Dept. of CSE Netaji Subhash Engg. College West Bengal, India.

Segmentation of Bangla Handwritten Text

A SURVEY ON WORD SPOTTING TECHNIQUES FOR DOCUMENT IMAGE RETRIEVAL

Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers

Text Recognition in Videos using a Recurrent Connectionist Approach

Automatically Algorithm for Physician s Handwritten Segmentation on Prescription

NOVATEUR PUBLICATIONS INTERNATIONAL JOURNAL OF INNOVATIONS IN ENGINEERING RESEARCH AND TECHNOLOGY [IJIERT] ISSN: VOLUME 2, ISSUE 1 JAN-2015

A two-stage approach for segmentation of handwritten Bangla word images

Classification of Printed Chinese Characters by Using Neural Network

A Neural Network Based Bank Cheque Recognition system for Malaysian Cheques

Unique Journal of Engineering and Advanced Sciences Available online: Research Article

International Journal of Advance Research in Engineering, Science & Technology

Layout Segmentation of Scanned Newspaper Documents

International Journal of Signal Processing, Image Processing and Pattern Recognition Vol.9, No.2 (2016) Figure 1. General Concept of Skeletonization

Extraction and Recognition of Alphanumeric Characters from Vehicle Number Plate

Handwritten Character Recognition System using Chain code and Correlation Coefficient

A Technique for Offline Handwritten Character Recognition

Khmer Character Recognition using Artificial Neural Network

Morphological Approach for Segmentation of Scanned Handwritten Devnagari Text

Simulation of Back Propagation Neural Network for Iris Flower Classification

Fine Classification of Unconstrained Handwritten Persian/Arabic Numerals by Removing Confusion amongst Similar Classes

Minutiae Based Fingerprint Authentication System

Anale. Seria Informatică. Vol. XVII fasc Annals. Computer Science Series. 17 th Tome 1 st Fasc. 2019

An Algorithm For Training Multilayer Perceptron (MLP) For Image Reconstruction Using Neural Network Without Overfitting.

OPTICAL CHARACTER RECOGNITION FOR VIETNAMESE SCANNED TEXT

Spotting Words in Latin, Devanagari and Arabic Scripts

Transcription:

International Journal of Emerging Engineering Research and Technology Volume 4, Issue 6, June 2016, PP 38-46 ISSN 2349-4395 (Print) & ISSN 2349-4409 (Online) Hand Written Character Recognition using VNP based Segmentation and Artificial Neural Network ABSTRACT *Address for correspondence: singla.parul33@gmail.com Parul Singla M.Tech (CSE) Scholar, Manav Rachna University (Formerly MRCE) Faridabad, India Sargam Munjal Assistant Professor, Dept. of CST, Manav Rachna University (Formerly MRCE) Faridabad, India The choice of techniques used to segment the word into character and extract the features are the main factors to judge the capability and the recognition accuracy of a Hand Written Character Recognition System. Character Recognition is the important stage in image processing applications such as OCR, License Plate Recognition, Electronic processing of checks in banks, form processing, and label and barcode recognition. The main focus of this work is to use VNP algorithm to segment the preprocessed hand written image into characters. Features are extracted by binarization. The recognition of hand written character image has been done by using multi-layered feed forward artificial neural network. Very promising results are achieved when VNP segmentation algorithm and binarization features and the multilayer feed forward neural network is used to recognize the handwritten characters. Keywords: Handwritten Character Recognition, Preprocessing, Segmentation, VNP, Feature Extraction, Recognition, BPNN. INTRODUCTION Humans are best at recognizing faces and complex patterns. Although it is clear that people are good at recognizing patterns, it is not at all obvious how patterns are coded or decoded by a human brain. Handwritten Character Recognition is an important area in image processing and pattern recognition field. It is a wide field that covers all sort of character recognition via machine in various application domains. The goal of this area of pattern recognition is to translate human readable characters to machine readable characters. Today, we have automatic character recognizers that help humans in variety of practical and commercial applications. Handwritten characters are non-uniform in nature, as a particular character can be written in different styles and sizes by different writers and even the same writer can write the same character in different styles at different times. Handwritten characters are also vague in nature as there may not be smooth curves or perfectly straight lines all the time. For character recognition the first step involves the preprocessing of the input images. This step is to enhance the characters and the text i.e. if there is any distortion or noise, reduction of the same in the image acquired will be done in this step. Either an input pattern is identified as a member of the predefined class or the pattern is assigned an unknown class if its class is not already stored in the database. The ability of training and identifying is converted into machine systems using the Artificial Neural Networks. The basic function for the hand written character recognition system is to compare the input text which is to be recognized with the characters already stored in the database and it recognizes the best matching character as output even at different writing styles and sizes. Hand written character recognition is a challenging task as text written by different people is varied in their styles. We have explained a neural network based approach for efficient hand written character recognition. Pre-Processing, Segmentation, Feature Extraction and Back Propagation Algorithm are the steps in the implementation. International Journal of Emerging Engineering Research and Technology V4 I6 June 2016 38

RELATED WORK The offline character recognition is an active area of research these days. As compared to machine printed character recognition, the work done by the researchers in the area of handwritten character recognition is very limited as mentioned by Apurva A.Desai [5]. In 2002, Kundu & Chen [6] used HMM to recognize 100 postal words and reported 88.2% accuracy. Binod Kumar [6] uses HMM model on the English character alphabets and gets 93.24% accuracy. For the letters A and W, the recognition rate is found to be very low, because of a lot of variations in writing style of these letters. Gunjan et. Al [1] uses the back propagation algorithm for Hindi character recognition. Each character is entered as an input vector of 49X1. Total 1000 (200 for each character) samples were used for classification. Out of these 1000 samples 60 % were used for training, 20% for validation and rest 20% were used for testing. Richard [4] uses the dissection technique of segmentation. By dissection is meant the decomposition of the image into a sequence of sub images using general features segmentation is a complex process, and there is a need for a term such as "dissection" to distinguish the image-cutting sub process from the overall segmentation, In their formulation the segmentation stage consisted of three steps: 1. Detection of the start of a character. 2. A decision to begin testing for the end of a character (called sectioning). 3. Detection of end-of-character. S. Chitrakala [2] proposed an algorithm called VNP (Visited Neighbor Pixel) algorithm, which is an improvement over vertical projection profile and it works by segmenting adjacent characters with minimum threshold based on connectedness of the pixels. PROPOSED METHODOLOGY We will discuss Segmentation, feature extraction and back propagation neural network in this implementation. Preprocessing relates to the removal of noise and variation in handwritten word patterns. Preprocessing may itself be broken down into smaller tasks such as noise removal, thinning, edge detection, resizing etc to enhance the quality of images and to correct distortion. Segmentation decomposes an image of sequence of characters into sub images of individual character. Fig1. Proposed Methodology 39 International Journal of Emerging Engineering Research and Technology V4 I6 June 2016

For segmenting the image of sequence of characters, we have used VNP (Visited Neighbor Pixel) algorithm which is an improvement over vertical projection profile, and classified under dissection based segmentation approaches, wherein the image is split up into meaningful components. Algorithm that follows in next section will explain visited neighbor pixel algorithm more clearly. VNP Algorithm We use the Visited Neighbor Pixel (VNP) algorithm in the segmentation. Segmentation It is an operation that decomposes an image of sequence of characters into sub images of individual character. Fig2. Segmentation Example S.Chitrakala [2] proposed an algorithm called VNP (Visited Neighbor Pixel) algorithm, which is an improvement over vertical projection profile and it works by segmenting adjacent characters with minimum threshold based on connectedness of the pixels. The proposed VNP method is classified under dissection based segmentation approaches, wherein the image is split up into meaningful components. Algorithm: Visited neighbor pixel algorithm Functions: STORE (x,y):stores the pixel corresponding to location(x,y) visited (x,y): Checks if the pixel (x,y) has been visited height (w): Returns the height of the word w value (x,y): Returns the value of the pixel (0-black or 1- white) Input: w: Segmented word Output: xmin: Left most point of the character xmax: Right most point of character ymin: Top most point of character ymax: Bottom most point of character Start For word w //Storing first pixel of character For x = 1 For-Each row y // black pixel encountered If value (x,y) == 0 // Start of the character Then STORE (x,y) xmin = x End If International Journal of Emerging Engineering Research and Technology V4 I6 June 2016 40

For column x = 2: n For-Each e (x,y) == 0 //Checking for neighbor black pixel If (!value (x!1, y)! value(x! 1, y ± 1)! value (x, y ± 1)) If (visited (x!1, y) visited (x, y ± 1)) = = 0 ) STORE (x, y) Else //Absence of neighbor pixels xmax = x End If // Computing boundaries of bounding box yn = height (w) For-Each row y from column If (x, y) = = 0 break STORE (x,y) ymin = y For y = yn; y>1; y-1 If ((x, y) = = 0&& (xmax + 1, y) 0) ymax = y End If End In the proposed algorithm the segmented a black pixel is detected. This pixel marks the beginning of the character as depicted in Fig. 3a and it is temporarily stored in an otherwise empty image of the same size and width as the segmented word. The scan progresses iteratively, checking for the presence of black pixels neighboring the already visited black pixels. The end of the character is determined as the column where no more neighboring black pixels exist. Fig. 3a Fig. 3b Fig. 3c Difficulty arises in identifying the bounds for separating the characters. The VNP algorithm proves to be advantageous in such cases, by checking for the condition of adjacency with the visited pixels. This row y If value gives a clear indication of segmentation as shown in Fig. 3b. After estimating the start and end point of a character, the columns corresponding to the starting and the ending point are 41 International Journal of Emerging Engineering Research and Technology V4 I6 June 2016

used as reference lines and scanning is done horizontally from the top to the bottom. When a black pixel is encountered, the corresponding scan line is marked as the top boundary of the character and then scanning continues horizontally from the bottom boundary of the word to the top. Every time a pixel is encountered, the presence of its adjacent pixels is checked. If it has adjacent pixels falling outside the right boundary of the character, the pixel is assumed to belong to another character and is ignored. This solves the problem of segmenting character when pixels of adjacent characters fall on the same scan line. The scanning proceeds till a pixel which has either no adjacent pixels or whose adjacent pixels fall within the character s boundary is encountered. The row corresponding to the pixel is marked as the bottom boundary as shown in Fig. 3c. Hence, the boundaries of the character are defined and the character is segmented along them. BPNN Back Propagation Neural Network (BPNN) is a multilayered, feed forward Neural Network (NN). BPNN consists of an input layer, one or more hidden layer and an output layer. The layers contain identical computing nodes called neurons which are connected in such way that the output neuron in one layer sends signal to the input layer of every neuron in the next layer. The input layer of the network serves as the signal receptor while the output layer passes out the result from the network. For Hand Written Character recognition we used a three layer BPNN as the classifier, the number of nodes in the input layer is equal to the dimension of the feature vector that characterizes the character image space. The number of the nodes in the hidden layer is set by trial and error method during training. The number of nodes in the output layer is equal to the number of the images in the database. During the training process, the BPNN learning algorithm adjusts the weights and the bias of each of the neurons in order to minimize the Mean Square Error (MSE) between the targets and predicted output. In the recognition phase, the features from the query character image that is to be tested is fed to the neural network without giving any target output. BPNN testing algorithm finds the closest pattern matching using the weights and the thresholds that have been stored and provides the corresponding recognized character. EXPERIMENTS AND RESULTS The image of handwritten text is preprocessed and segmented into individual characters and are recognized by a neural network. In the preprocessing phase, the input image of handwritten character in.bmp format from the local Database is converted to grayscale format by using rgb2gray function of MATLAB shown in Fig 4. Binarization is an important image processing step in which the pixel values are separated into two groups; white as background and black as foreground. Only two colors, white and black, can be present in a binary image. It is assumed that the intensity of the text is less than that of background i.e. the input image has black foreground pixels and white background pixels. The colors can be inverted if the input image has text intensity more than that of background. After the binarization of the text image, binarized image is used for segmenting the words into characters. Segmented characters are shown in the Fig 5. Fig4. Preprocessed Image International Journal of Emerging Engineering Research and Technology V4 I6 June 2016 42

Fig5. Segmented Word into Characters Now, binarized segmented character is resized into 7*5 matrix where 0 indicates the presence of white pixel and 1 represents the black pixel. This binary matrix of size 7*5 is then reshaped in a row first manner to a binary matrix of size 35*1 by using reshape function of MATLAB. Now, resulted matrix is used to train the neural network. TESTING RESULT OF HAND WRITTEN CHARACTER USING BPNN The size of the input layer depends on the size of the sample presented at the input and the size of the output layer is decided in accordance with the number of output classes in which each of the input patterns is to be classified. In the proposed experiment, the feature vector of each of the 26 character images is of size 35 1. Hence, 35 neurons are used in the input layer and 10 neurons are used in the output layer of the neural network. In case of back-propagation neural networks, universally accepted cost function to measure the generalization performance is MSE. The lower value of the cost function indicates that the neural network is capable of mapping the input and output in a right manner. The acceptable threshold for the MSE (cost function value) has been selected as 0.001 and the training of the neural network will come to an end when the error becomes less than or equal to this threshold value. The performance value indicates the extent of training of the network. A low performance value (36) indicates that the network has been trained properly. Fig6. Training performance of the Network The performance of the neural network classifier also depends on the number of training iterations required to train the network. Too less number of training epochs result in a poorly trained network due to under-fitting of the network. On the other hand, too many training epochs result in poor generalization due to over-fitting of the network. 43 International Journal of Emerging Engineering Research and Technology V4 I6 June 2016

Fig7. Performance Graph The network learning iterations must be selected in such a way that the network may converge properly with least generalization error. The maximum allowed epochs for the training process has been set to 100000 as shown in Fig 7. If the network could not converge within the maximum allowed epochs count, the training will stop. The result obtained by Segmentation algorithm is as shown in TABLE I. TABLEI. Segmentation Result Segmented alphabets Alphabets Percentages Correctly a, d, e, f, h, k, l, m, n, r, s, t, 0, 1, 3, 6, 8, 9 100% Incorrectly b, c, g, i, j, o, p, q, u, v, w, x, y, z, 2, 4, 5, 7 - In the proposed handwritten character recognition system, the neural network has been trained by each of the 26 characters image samples from the database have been involved in the learning process. Recognition rate between the various characters has been varied. Fig8. Hand Written Character Recognition System a is classified as e, 2 times and o, 5 times respectively. CONCLUSION Recognition of characters and learning of image processing techniques is done in this thesis. In this thesis we implemented Handwritten Character Recognition System using Back Propagation Neural Network. The handwritten character recognition is indeed a tough task which can be done with the methodology described here. Here the dataset taken is handwritten sentences and these are segmented International Journal of Emerging Engineering Research and Technology V4 I6 June 2016 44

into the characters using the VNP algorithm, through which we have to extract the characters. Extracted features are fed into the back propagation neural network and tested with the stored database to recognize the characters. ACKNOWLEDGMENT I would like to express my deep and sincere gratitude to Ms. Sargam Munjal, Assistant Professor, Department of Computer Science and Technology, MRU (formerly MRCE), Faridabad for her continual encouragement, motivation for literature and discussing the problems in area of Hand Written Character Recognition. REFERENCES [1] Gunjan Singh, Sushma Lehri, Recognition of Handwritten Hindi Characters using Back propagation Neural Network, International Journal of Computer Science and Information Technologies, Vol. 3,(2012):4892-4895. [2] S. Chitrakala, Srivardhini Mandipati, S. Preethi Raj and Gottumukkala Asisha, An Efficient Character Segmentation Based on VNP Algorithm, Research Journal of Applied Sciences, Engineering and Technology 4(24): 5438-5442, 2012 [3] Shalin A. Chopra, Amit A. Ghadge, Onkar A. Padwal, Karan S. Punjabi, Prof. Gandhali S. Gurjar, Optical Character Recognition, International Journal of Advanced Research in Computer and Communication Engineering, Vol. 3, (2014):4956-4958. [4] Richard G. Casey and Eric Lecolinet, A SURVEY OF METHODS AND STRATEGIES IN CHARACTER SEGMENTATION. [5] Desai, A.A., 2010, Gujrati handwritten numeral optical character recognition through Neural Network, Pattern Recognition, 43, pp no-2582-2589. [6] Kundu, Y.H., Chen, M.,2002, Alternatives to variable duration HMM in handwritten recognition, IEEE Trans Pattern Anal Mach Intell,20(11),pp no-1275-1280. [7] P. E. Ross, Flash of genius, Forbes, (1998):98-104. [8] J. Pradeepa,*, E. Srinivasana, S. Himavathi, Neural Network Based Recognition System Integrating Feature Extraction and Classification for English Handwritten, International Journal of Engineering, Vol. 25, (2012):99-106. [9] Majed Valad Beigi, Handwritten Character Recognition Using BP NN, LAMSTAR NN and SVM, Northwestern University, SPRING 2015. [10] Amit Choudhary, A Review of Various Character Segmentation Techniques for Cursive Handwritten Words Recognition, International Journal of Information & Computation Technology, Volume 4(2014):559-564. 45 International Journal of Emerging Engineering Research and Technology V4 I6 June 2016