II. WORKING OF PROJECT

Similar documents
Cursive Handwriting Recognition System Using Feature Extraction and Artificial Neural Network

NOVATEUR PUBLICATIONS INTERNATIONAL JOURNAL OF INNOVATIONS IN ENGINEERING RESEARCH AND TECHNOLOGY [IJIERT] ISSN: VOLUME 5, ISSUE

Handwritten Character Recognition with Feedback Neural Network

A two-stage approach for segmentation of handwritten Bangla word images

Indian Multi-Script Full Pin-code String Recognition for Postal Automation

A Survey of Problems of Overlapped Handwritten Characters in Recognition process for Gurmukhi Script

Fine Classification of Unconstrained Handwritten Persian/Arabic Numerals by Removing Confusion amongst Similar Classes

OCR For Handwritten Marathi Script

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

NOVATEUR PUBLICATIONS INTERNATIONAL JOURNAL OF INNOVATIONS IN ENGINEERING RESEARCH AND TECHNOLOGY [IJIERT] ISSN: VOLUME 2, ISSUE 1 JAN-2015

Handwritten Gurumukhi Character Recognition by using Recurrent Neural Network

ABJAD: AN OFF-LINE ARABIC HANDWRITTEN RECOGNITION SYSTEM

A Technique for Classification of Printed & Handwritten text

Multi-Layer Perceptron Network For Handwritting English Character Recoginition

Segmentation Based Optical Character Recognition for Handwritten Marathi characters

HCR Using K-Means Clustering Algorithm

Segmentation of Characters of Devanagari Script Documents

SEVERAL METHODS OF FEATURE EXTRACTION TO HELP IN OPTICAL CHARACTER RECOGNITION

IMPLEMENTING ON OPTICAL CHARACTER RECOGNITION USING MEDICAL TABLET FOR BLIND PEOPLE

NEW ALGORITHMS FOR SKEWING CORRECTION AND SLANT REMOVAL ON WORD-LEVEL

Strategic White Paper

Segmentation of Bangla Handwritten Text

An Efficient Character Segmentation Based on VNP Algorithm

Optical Character Recognition (OCR) for Printed Devnagari Script Using Artificial Neural Network

A Document Image Analysis System on Parallel Processors

Human Performance on the USPS Database

Hand Written Character Recognition using VNP based Segmentation and Artificial Neural Network

International Journal of Advance Research in Engineering, Science & Technology

Automatically Algorithm for Physician s Handwritten Segmentation on Prescription

Character Recognition of High Security Number Plates Using Morphological Operator

Handwritten Devanagari Character Recognition Model Using Neural Network

Segmentation of Kannada Handwritten Characters and Recognition Using Twelve Directional Feature Extraction Techniques

Handwriting Recognition of Diverse Languages

Hidden Loop Recovery for Handwriting Recognition

A New Technique for Segmentation of Handwritten Numerical Strings of Bangla Language

A System for Joining and Recognition of Broken Bangla Numerals for Indian Postal Automation

Preprocessing of Gurmukhi Strokes in Online Handwriting Recognition

A New Algorithm for Detecting Text Line in Handwritten Documents

Robust PDF Table Locator

Automatic Recognition and Verification of Handwritten Legal and Courtesy Amounts in English Language Present on Bank Cheques

A Review on Handwritten Character Recognition

Character Recognition Using Matlab s Neural Network Toolbox

INTERNATIONAL RESEARCH JOURNAL OF MULTIDISCIPLINARY STUDIES

CHAPTER 8 COMPOUND CHARACTER RECOGNITION USING VARIOUS MODELS

HANDWRITTEN GURMUKHI CHARACTER RECOGNITION USING WAVELET TRANSFORMS

CHAPTER 1 INTRODUCTION

Seminar. Topic: Object and character Recognition

Word Slant Estimation using Non-Horizontal Character Parts and Core-Region Information

Isolated Handwritten Words Segmentation Techniques in Gurmukhi Script

A New Approach to Detect and Extract Characters from Off-Line Printed Images and Text

Structural Feature Extraction to recognize some of the Offline Isolated Handwritten Gujarati Characters using Decision Tree Classifier

MOMENT AND DENSITY BASED HADWRITTEN MARATHI NUMERAL RECOGNITION

Devanagari Handwriting Recognition and Editing Using Neural Network

Classification of Printed Chinese Characters by Using Neural Network

Mobile Application with Optical Character Recognition Using Neural Network

A Neural Network Based Bank Cheque Recognition system for Malaysian Cheques

LECTURE 6 TEXT PROCESSING

Problems in Extraction of Date Field from Gurmukhi Documents

Handwritten Hindi Character Recognition System Using Edge detection & Neural Network

Slant Correction using Histograms

SEGMENTATION OF CHARACTERS WITHOUT MODIFIERS FROM A PRINTED BANGLA TEXT

Coarse-to-fine image registration

Hilditch s Algorithm Based Tamil Character Recognition

Recognition of Unconstrained Malayalam Handwritten Numeral

Neural Network Classifier for Isolated Character Recognition

Extraction and Recognition of Alphanumeric Characters from Vehicle Number Plate

Universal Graphical User Interface for Online Handwritten Character Recognition

Feature Sets Evaluation for Handwritten Word Recognition

Convolution Neural Network for Traditional Chinese Calligraphy Recognition

1. Introduction. 2. Motivation and Problem Definition. Volume 8 Issue 2, February Susmita Mohapatra

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor: 1.852

Character Recognition

Artifacts and Textured Region Detection

Simulation of Zhang Suen Algorithm using Feed- Forward Neural Networks

Tracing and Straightening the Baseline in Handwritten Persian/Arabic Text-line: A New Approach Based on Painting-technique

Off-line Signature Verification Using Neural Network

Cursive Handwriting Recognition

ISSN: [Mukund* et al., 6(4): April, 2017] Impact Factor: 4.116

Recognition of Gurmukhi Text from Sign Board Images Captured from Mobile Camera

Sinhala Handwriting Recognition Mechanism Using Zone Based Feature Extraction

Skew Detection and Correction of Document Image using Hough Transform Method

Morphological Approach for Segmentation of Scanned Handwritten Devnagari Text

Word-wise Hand-written Script Separation for Indian Postal automation

Instantaneously trained neural networks with complex inputs

A Simple Text-line segmentation Method for Handwritten Documents

Building Multi Script OCR for Brahmi Scripts: Selection of Efficient Features

Online Handwritten Devnagari Word Recognition using HMM based Technique

LEKHAK [MAL]: A System for Online Recognition of Handwritten Malayalam Characters

Keywords Connected Components, Text-Line Extraction, Trained Dataset.

Handwritten character and word recognition using their geometrical features through neural networks

Robust line segmentation for handwritten documents

Signature Recognition by Pixel Variance Analysis Using Multiple Morphological Dilations

Text Extraction from Images

Handwritten Script Recognition at Block Level

LITERATURE REVIEW. For Indian languages most of research work is performed firstly on Devnagari script and secondly on Bangla script.

Scene Text Detection Using Machine Learning Classifiers

Isolated Curved Gurmukhi Character Recognition Using Projection of Gradient

PCA-based Offline Handwritten Character Recognition System

A Hierarchical Pre-processing Model for Offline Handwritten Document Images

Handwriting segmentation of unconstrained Oriya text

Transcription:

Handwritten character Recognition and detection using histogram technique Tanmay Bahadure, Pranay Wekhande, Manish Gaur, Shubham Raikwar, Yogendra Gupta ABSTRACT : Cursive handwriting recognition is a challenging task for many real-world applications such as document authentication, form processing, postal address recognition, reading machines for the blind, bank check recognition, and interpretation of historical documents Therefore, in the last few decades, researchers have put an enormous effort into developing various techniques for handwriting recognition This chapter reviews existing handwriting recognition techniques and presents the current state of the art in cursive handwriting recognition The chapter also presents segmentation strategies and a segmentation-based approach for automated recognition of unconstrained cursive handwriting The chapter provides a comprehensive literature review with basic and advanced techniques and research results in handwriting recognition for graduate students as well as for advanced researchers Index terms - Cursive Script, Detection, Extraction, Histogram I INTRODUCTION In Current Scenario, Handwritten Character Detection is a hectic work To recognize the historical documents and notes, current approach is quite not evaluative and considerable Most of the Current Recognition Techniques does not work on Raw Images and Line Extraction Segmentation is the most crucial part of the cursive handwritten recognition problem, which is used for the recognition Here we are proposing the new System for Handwritten Character Detection In our Project; we are using Histogram Technique for Scanned Hand written Images This project in many Levels For this; we need to create the database which includes the set of different styles of handwritten characters ie alphabets Handwritten character detection, partition and Recognition implemented by using the Histogram technique This Technique is User Friendly to Use and Effective in Manner [1][2] II WORKING OF PROJECT First we will take the Scanned Image of the Document By using the Gray Scale Technique; we will reverse the Input Format of Image For Ex: if it is a Colourful Image, then we will convert it into Black and White Image Then we will reverse the Mode of Image ie if Background White and Fore Colour Black then after reversing ; Background = Black Fore Colour = White Now this is our Proper Input of Image for Further Line and Word Extraction [3][4][6][7] Working of proposed system can be divided into 6 sections- 1 Reversal of Image Mode 2 Creating A Data- Base 3 Line Extraction and Separation 4 Word Extraction and Separation 5 Character Extraction 6 Word Detection and Output 1 Reversal of Image Mode In this Module; we will change the Mode of scanned image as completely explain Earlier This will enhance the Visibility of the Image By doing this, back ground and Fore Colour of Image White and Black will be Interchanged Respectively [4] 63

2 Creating a Data-Base In this Module; we will create a set of Data base Data base of hand written characters ie alphabets a-z in various different Styles and manners like the handwriting of different persons Every Alphabets which is in database will have a ASCII value Fig1 Alphabets in different forms 3 Line Extraction and Separation For This Module; we will be using histogram Graph Technique This technique will be applied as a whole of the Image This will form Peak valley type Graph of each and every line of Scanned Image The peak Valleys will indicate number of Lines in Scanned Image [4] Fig2 Original scanned document Fig3 Line Extraction and Separation 4 Word Extraction and Separation After Line Extraction, we will perform the Word Extraction Here also we will use histogram Graph Technique It will be having 1 s and 0 s values The peak Valleys will indicate number of Words in Scanned Image 64

Fig4 Word Extraction and Separation 5 Character Extraction After Word Extraction; we will perform the Character Extraction By comparing the Characters of the Word with the own constructed Hand written Data base It will give us the recognized Extracted Character of the Input Word Fig5 Character Extraction 6 Word Detection and Output After performing Character Extraction and Detection; we will perform this In this module; we will get the corrected proper format or our output as a complete Word Proper Format Input will give us Proper Output [7] III METHODOLOGY A scanner is used to scan the handwritten English alphabets written on a piece of A4 size paper Each character is enclosed in a box of size 80 x 80 in jpeg format The image is converted into a binary matrix of size 80 x 80 Each matrix contains a large number of elements which conveys no information about the pattern In order to eliminate such elements 80 x 80 matrix is compressed into a matrix of size 10 x 10 The 10 x 10 matrix is segmented column-wise into 10 segments of size 10 each All the columns of a particular character are mapped into identical patterns used to recognize that particular character In this way standard weight matrix is obtained for each character Test characters are presented to the trained net A counter is used to find out the number of identical patterns produced by the test character Majority of the identical pattern present in a test character decides the existence of a particular character [5] A Compression of 80 x 80 matrix into 10 x 10 matrix A matrix can be compressed into a matrix of lower dimension in order to reduce the non significant elements of the matrix Matrix A can be compressed into matrix A_COM by using the function Ф Matrix A can be split into n uniform blocks The dimension of each block A_BLOCKi is m x m B Column-wise Segmentation of 10 x 10 Matrix The 10 x 10 matrix is segmented column-wise into 10 columns of size 10 each Each column is mapped into 10 identical patterns used to recognize a character Weight matrix is initialized to zero Perceptron learning rule is used to train the net 1 Perceptron Learning Rule In a simple perceptron one or more than one neurons are connected to a single neuron with the help of weights, (Fig7) Weights may be initialized to 0 or very small values An iterative procedure is used to find out the standard weights At standard weights the net generates correct outputs 65

x1i w1ij y1j x1i w1ij y1j xi wij yj x2i X2i w2ij w2ij y2j y2j Fig7 A simple perceptron having 3 inputs and one output INPUT VECTOR x10i w10ij y10j COMPRESSED INPUT VECTOR x10i w10ij y10j CSMI NEURAL NET Fig6 CSIM Neural Net, each block contains 10 inputs and 2 outputs STANDARD WEIGHT Fig8 Conceptual Diagram of CSIM training C Training of the net to obtain the weight matrix The training started with an input vector of size [80x 80 = 6400], compressed into a vector of size [10x 10 = 100], (Fig8) First four characters of the English alphabets are considered for training, (Fig8) The weights are initially set to zero The net, (Fig6) is trained using Perceptron Learning Rule After training the net for few epochs, the final weight that generates the correct output is obtained The final weight obtained remains the same for the next few epochs and is considered as the standard weight for testing Fig9 Alphabets considered for CSIM training 66

D Testing of the net The net is tested with first four characters of English alphabets, deviated to the extent, at which the deviated character can be identified, (Figure 10) The testing starts with a [80 x 80 = 6400] vector compressed into a [10 x 10 =100] vector on the net as shown in Fig1 TEST VECTOR COMPRESSED TEST VECTOR CSIM NEURAL NET STANDARD WEIGHT Fig10 Alphabets considered for CSIM testing OUTPUT PRODUCED COUNTER IDENTIFIED CHARACTER The compressed vector is presented to the net having the standard weight matrix Since, there are two outputs, input patterns are mapped as either [-1,-1], [-1, 1], [1, -1] or [1,1] to identify the characters A, B, C or D respectively A counter is used to count the number of identical pattern outputs from the ten different segments for confirmation A value greater than or equal to six is considered for the identification of a particular character, (Fig11) Example Let us consider four characters of English alphabet A, B, C and D A can be represented by a 10 x 10 matrix shown below * Fig11 Conceptual Diagram of CSIM testing 67

In the above matrix the blank space can be considered as -1 and * can be considered as 1 So, the vector representation of the matrix A in column major form is represented as: A={-1,-1,-1,-1,-1,1,1,1,1,1,-1,-1,-1,-1,1,-1,1,-1,-1,-1,-1,-1,-1,1,-1,-1,1,1,1,1,-1,-1,1,-1,-1,-1,1,-1,-1,-1,-1,1,-1,-1, -1,-1,1,-1,-1,-1,1,-1,-1,-1,-1,-1,1,-1,-1,-1,-1,1,-1,-1,-1,1,-1,-1,-1,-1,-1,1,-1,-1,-1,1,-1,-1,-1,-1,-1,-1,1,-1,-1,1,-1,-1, -1,-1,-1,-1,-1,1,1,1,1,1,1} A can be compressed to form a matrix of dimension 5 x 5 by applying compression algorithm * * A_COM={-1,-1,1,1,1,-1,1,1,-1,-1,1,-1,1,-1,-1,1,1,1,-1,-1,-1,1,1,1,1} Similarly, compression algorithm is applied on B, C and D to form B_COM, C_COM and D_COM IV APPLICATIONS This will be applicable where ever there is a need of any kind of Hand written or Cursive character Detection It will be very much useful and convenient for Historical Research and Files It will be used In Doctors Prescriptions or Scientists hand written Thesis, Journals and Research Papers V CONCLUSION The problem of the cursive handwriting is made complex by the fact that the writing is inherently ambiguous letters in a word are generally linked together, poorly written, and may even be missing As a consequence, cursive script recognition requires sophisticated techniques, which uses a large amount of shape information and which compensates for the ambiguity by the use of contextual information The recognition strategies heavily depend on the nature of the data to be recognized Small modifications and sensitive selection of the parameters in the laboratory environment provide high recognition rates for a given data set However, a successful system reported in the literature may fail in real-life problems due to unpredicted change in data and violation of the initial assumptions in real life environment In this study, we try to maximize the amount of information to be retained in the data to a certain extent The proposed method avoids most of the pre processing operations, which causes loss of important information However, the information, extracted in the global parameter estimation stage, is used on top of the original image as needed We used the slant and skew angle information, but we did not make slant or skew angle correction We used the binary information, but we did not rely only on the binary information One of the major contributions of this study is the development of a powerful segmentation algorithm Utilization of the character boundaries, local maxima and minima, slant angle, upper and lower baselines, stroke height and width improves the search algorithm of the optimal segmentation path, applied on a gray-scale image REFERENCES [1] Survey of the State of the Art in Human Language Technology http://cslucseogiedu/hltsurvey/hltsurveyhtml, November 1995 [2] T M Cover and P E Hart Nearest neighbor pattern classification IEEE Transactions on Information Theory, 13(1):21 27, Jan 1967 [3] Text Recognition using Image Processing - International Journal of Advanced Research in Computer Science, Available:https://wwwgooglecoin/url?sa=t&source=web&rct=j&url=http://ijarcsinfo/indexphp/Ijarcs/article/downlo ad/3416/3419&ved=2ahukewi3gycfuqzzahuhvy8khtifamqqfjabegqiexab&usg=aovvaw2pmgwv3a0ax mtq8yecrabe 68

[4] Color Image Segmentation Using Acceptable Histogram Segmentation, Available : https://wwwgooglecoin/url?sa=t&source=web&rct=j&url=http://wwwmath-infounivparis5fr/~desolneux/papers/ddlp_ibpria_05pdf&ved=2ahukewim2fv5vkzzahul148khrsacowqfjadegqie xab&usg=aovvaw36mwrusbl6ggs83yb9uzn7 [5] Hand Written English Character Recognition using Column-wise Segmentation of Image Matrix (CSIM), Available: https://wwwgooglecoin/url?sa=t&source=web&rct=j&url=http://wwwwseasorg/multimedia/journals/computers/201 2/59-953pdf&ved=2ahUKEwjtpq2GvqzZAhUBpo8KHWvCBbUQFjAAegQIERAB&usg=AOvVaw1eykLdxtPjSYmPQH Dx2jij [6] Sandhya Arora, Debotosh Bhattacharjee, MitaNasipuri, Dipak kumar Basu and MahantapasKundu, Combining Multiple Feature Extraction Techniques for Handwritten Devnagri Character Recognition, Available: http://arxivorg/ftp/arxiv/papers/1005/10054032pdf [7] Handwritten Character Recognition, Available: http://tctsfpmsacbe/rdf/hcrinukhtm AUTHOR BIOGRAPHY Tanmay Bahadure pursuing BE in Information Technology, Rajiv Gandhi College of engineering and Research, RTMNU, Nagpur Pranay Wekhande pursuing BE in Information Technology, Rajiv Gandhi College of engineering and Research, RTMNU, Nagpur Manish Gaur pursuing BE in Information Technology, Rajiv Gandhi College of engineering and Research, RTMNU, Nagpur Shubham Raikwar pursuing BE in Information Technology, Rajiv Gandhi College of engineering and Research, RTMNU, Nagpur Yogendra Gupta pursuing BE in Information Technology, Rajiv Gandhi College of engineering and Research, RTMNU, Nagpur 69