
LITERATURE REVIEW

For Indian languages, most research work has been performed on the Devanagari script, followed by the Bangla script.

Chavan S. et al. (2013) studied the recognition of handwritten Devanagari compound characters. Geometric and Zernike moment features are used to recognize the handwritten characters, and the system was tested on 27,000 handwritten Devanagari basic and compound characters. Each character image is partitioned into zones, and moment-based features are extracted from each zone. MLP and KNN methods are used for classification and recognition. The MLP consists of three layers (input, hidden and output) for each of four different feature sets and is trained with standard back propagation. The KNN classifier assigns a sample to the class of the most similar training samples. This work achieved 98.78% accuracy using MLP and 95.56% accuracy using KNN.

Khopkar M. (2012) used an MLP network with three layers (one input, one hidden and one output layer) to recognize Gujarati script. In this study, symbol image detection is an integral part of input set preparation in both the training and testing phases. During the first pass of feature extraction, the left, right and top extremes of all characters are detected; the bottom extreme is found in the second pass. Training the network depends on the complexity of the patterns, which are usually characterized by feature overlap and high data size. The proposed system identifies individual characters with an accuracy of 80%. The major limitation of this study is that it only recognizes single isolated Gujarati characters.

Pal A. and Singh D. (2010) proposed a Multilayer Perceptron (MLP) with one hidden layer to recognize handwritten English characters. They used a boundary detection feature extraction technique along with Fourier descriptors to extract information about the boundary of each handwritten character. A total of 500 samples are used to train the back propagation network. The results show that the recognition performance of the back propagation network on handwritten English characters increases as the number of hidden nodes increases. The algorithm provides 94% recognition accuracy with less training time.
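Pal and Singh describe each character boundary with Fourier descriptors. The sketch below is a minimal illustration of the general technique, assuming the boundary has already been traced into an ordered list of (x, y) points; the function name and the number of retained coefficients are hypothetical choices, not details from their paper. It treats each boundary point as a complex number, takes a discrete Fourier transform, and normalizes the coefficients so that they do not change when the character is shifted, scaled or rotated.

```python
import cmath
import math

# Illustrative sketch; fourier_descriptors() is a hypothetical helper,
# not code from the cited paper.
def fourier_descriptors(boundary, n_coeffs=8):
    """Invariant Fourier descriptors of a closed contour.

    boundary: ordered (x, y) points along the contour (tracing not shown).
    Returns |c_k| / |c_1| for k = 2 .. n_coeffs + 1.
    """
    z = [complex(x, y) for x, y in boundary]
    n = len(z)
    # Direct O(n^2) discrete Fourier transform of the complex contour.
    coeffs = [sum(z[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n)) / n
              for k in range(n)]
    # c_0 is the centroid, so skipping it removes translation; dividing by
    # |c_1| removes scale; keeping magnitudes removes rotation and the
    # choice of starting point on the contour.
    scale = abs(coeffs[1])
    return [abs(c) / scale for c in coeffs[2:2 + n_coeffs]]
```

Because only coefficient magnitudes are kept, the descriptor also ignores where tracing starts on the contour, which is useful when boundary tracing of a handwritten character may begin at any point.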

Solanki P. and Bhatt M. (2013) developed an OCR system for printed Gujarati characters using a Hopfield neural network. Principal Component Analysis (PCA) is used to extract features of the Gujarati characters; PCA reduces the dimensionality of a data set consisting of a large number of interrelated variables. The Hopfield classifier, a special kind of recurrent neural network, classifies characters based on these features. They used 748 images as the training data set and obtained an accuracy of 93.25%.

Patil M. et al. (2013) introduced a method that segments a text document and then recognizes the handwritten Devanagari characters in it. They divided the process into two steps: segmentation into lines, words and characters, followed by recognition of the characters. A feed-forward neural network with one hidden layer is used to recognize the segmented characters. They achieved 100% success for segmentation, but because of similarly shaped and connected characters, recognition of handwritten Devanagari text reached only 60%.

Patel C. and Desai A. (2013) discussed a hybrid approach based on a binary tree classifier and k-nearest neighbor for recognizing Gujarati handwritten characters. They used structural and statistical features for classification and identification. Characters are first classified on primary features, which are derived by studying the formation of each character. Once a character is classified into a particular subset, secondary features such as averaging, moment-based and centroid-distance-based features are used for final identification. The authors collected 200 samples from different age groups and achieved a recognition accuracy of 63.1%.

Kumar V. and Rao R. (2013) presented multilayer perceptron (MLP) networks for recognition of handwritten Telugu characters. MLP classifiers are first constructed and then integrated based on some features. Each MLP classifier is trained with the back propagation algorithm. They collected 195 samples, of which 95 were used for training and the other 100 for testing. The results show that increasing the number of hidden neurons does not improve the recognition accuracy for handwritten characters.
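The PCA step in Solanki and Bhatt's system can be illustrated with a small standard-library sketch. It centers the feature vectors, forms their covariance matrix, finds the leading eigenvectors by power iteration with deflation, and projects each sample onto them. Power iteration is used here only to keep the example dependency-free; it is an assumption, not the authors' stated method, and a linear-algebra library's eigen-solver would normally be used instead. The `pca` function is hypothetical.

```python
import random

# Illustrative sketch: pca() is a hypothetical helper, not the authors' code.
# Power iteration with deflation stands in for a library eigen-solver.
def pca(samples, k, iters=200, seed=0):
    """Project samples (equal-length feature lists) onto the top-k
    principal components. Returns (components, projections)."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in samples]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            # Power iteration: apply the covariance matrix and renormalize.
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue of the converged vector.
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation removes this component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    projections = [[sum(r[j] * c[j] for j in range(d)) for c in components]
                   for r in centered]
    return components, projections
```

The projections, rather than the raw pixel or feature values, would then be passed to the classifier, which is how PCA reduces the size of the input each character presents.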

Singh S. et al. (2011) used a Support Vector Machine (SVM) classifier with a Radial Basis Function (RBF) kernel for classification. They used zoning density (ZD) and background directional distribution (BDD) features in their experiment; combining both types gives a total of 144 features for classification. An SVM takes a set of input data and classifies each sample into one of two distinct classes, and its performance depends on the kernel used. The SVM trained on the whole data set achieved 95.04% 5-fold cross-validation accuracy.

J. P. et al. (2011) presented a diagonal feature extraction method for a handwritten alphabet recognition system. In this approach, the character image is divided into 54 equal zones, each of size 10×10 pixels. Features are extracted from the pixels of each zone by moving along its diagonals. These features are used to train a feed-forward back propagation neural network with two hidden layers, which performs the classification. Accuracies of 97.8% with 54 features and 98.5% with 69 features show that the system provides good recognition accuracy with less training time.

Wagh T. and Badgujar C. (2013) proposed an ANN to recognize handwritten Devanagari characters and English numerals. They created a character matrix for each letter, along with the network structure. The neural network requires 900 inputs and 49 neurons in its output layer to identify the characters. Classification uses a Multilayer Perceptron (MLP) trained with the Error Back Propagation (EBP) algorithm. The results show that the success of the method depends on the size of the database: with a larger database the probability of success is higher, but recognition is slower.

Kaur P. and Singh B. (2012) used a Support Vector Machine (SVM) classifier with features such as zonal density, distance profiles, projection histograms and background directional distribution (BDD) to recognize Gurmukhi text in signboard images. The data set was collected from 15 different persons, each providing 10 samples of each of the 10 numerals. In their work they recognized Gurmukhi characters and numerals separately.
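Zoning density, used by Singh S. et al. and by Kaur P. and Singh B., can be sketched as the fraction of foreground pixels in each cell of a grid laid over the character. The sketch assumes a binary image stored as a list of rows with 1 for foreground and dimensions divisible by the grid; the grid size is a free parameter here, not the configuration used in either paper, and the function name is hypothetical.

```python
# Illustrative sketch; zoning_density() is a hypothetical helper.
def zoning_density(image, zones_y, zones_x):
    """Foreground-pixel density of each zone, in row-major zone order.

    image: list of rows of 0/1 values (1 = foreground).
    """
    height, width = len(image), len(image[0])
    if height % zones_y or width % zones_x:
        raise ValueError("image size must be divisible by the zone grid")
    zh, zw = height // zones_y, width // zones_x
    features = []
    for zy in range(zones_y):
        for zx in range(zones_x):
            # Count foreground pixels inside this zone.
            count = sum(
                image[y][x]
                for y in range(zy * zh, (zy + 1) * zh)
                for x in range(zx * zw, (zx + 1) * zw)
            )
            features.append(count / (zh * zw))
    return features
```

For example, a 32×32 character image with a 4×4 grid yields 16 density features, which can then be concatenated with another feature set such as BDD before classification.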

Choudhary A. et al. (2013) proposed a new vertical segmentation algorithm for off-line cursive handwritten words. The technique targets over-segmentation of the word image and begins by thinning the word image to a stroke width of a single pixel. The thinned image is then scanned vertically, column by column, and the number of foreground pixels in each column is counted. Columns where this sum is 0 or 1 are Potential Segmentation Columns (PSC). PSCs occur in groups in the word image, and this situation is termed over-segmentation. They selected 200 handwritten word samples to evaluate the proposed approach and obtained an accuracy of 83.5%.

Yang Y. et al. (2011) developed an OCR system that combines features to train a Back Propagation (BP) network for English character recognition. Two kinds of features are extracted: statistical features and structural features. Statistical features are extracted from the character matrix through extensive statistics; they are characterized by strong anti-interference ability and suit matching and classification algorithms. Structural features describe the character structure in terms of points, circles, lines and other basic strokes, and they allow characters to be identified correctly. The BP network consists of three layers: an input layer, one hidden layer and an output layer. The structural and statistical features are fed to the BP network, which overcomes external noise interference and performs the recognition. The results show that the BP network with combined features achieved good recognition capability and converged within only 184 epochs, compared with BP networks trained on other features.

Pirlo G. and Impedovo D. (2011) proposed zoning-based classification using fuzzy membership functions (FMF) to recognize handwritten characters. To address the zoning-based classification problem, they presented a real-coded genetic algorithm that finds, in a single procedure, the optimal FMF together with the optimal zoning design described by a Voronoi tessellation. The membership function defines how a feature influences the different zones. The results show that fuzzy membership functions provide better classification performance than standard membership functions based on abstract-level, ranked-level and measurement-level weighting models.
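The column scan in Choudhary A. et al.'s vertical segmentation can be sketched as follows. The code assumes the word image has already been thinned to single-pixel strokes (thinning is not shown) and marks every column whose foreground count is 0 or 1 as a PSC. Because PSCs occur in groups, the hypothetical `merge_runs` helper collapses each run of adjacent PSCs into one cut at its middle column; this merging rule is an illustrative assumption, not necessarily the rule the authors used to handle over-segmentation.

```python
# Illustrative sketch; both helpers are hypothetical, not the authors' code.
def potential_segmentation_columns(image):
    """Indices of columns with 0 or 1 foreground pixels.

    image: thinned binary word image as a list of rows (1 = foreground).
    """
    width = len(image[0])
    counts = [sum(row[x] for row in image) for x in range(width)]
    return [x for x, c in enumerate(counts) if c <= 1]


def merge_runs(columns):
    """Collapse each run of consecutive PSC indices into its middle column."""
    cuts, run = [], []
    for x in columns:
        if run and x != run[-1] + 1:
            # The current run ended; keep one representative cut.
            cuts.append(run[len(run) // 2])
            run = []
        run.append(x)
    if run:
        cuts.append(run[len(run) // 2])
    return cuts
```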

Patil S. et al. (2010) discussed the characteristics of Support Vector Machine (SVM) and Artificial Neural Network (ANN) classification methods for handwritten Devanagari character recognition. Extracted features such as the chain code histogram, shadow, view-based and longest-run features are used with SVM and Multilayer Perceptron (MLP) classifiers. The shadow feature is the length of the projections onto the sides of the character. The chain code describes points by their positions relative to one another, independent of the coordinate system. View-based features examine different views of each character and describe the character from them. The same MLP is designed for all four feature sets and trained with the standard back propagation algorithm. The results for Devanagari character recognition show that SVMs provide more reliable classification than ANNs.

John J. et al. (2012) presented a wavelet transform and Support Vector Machine (SVM) approach for recognition of unconstrained handwritten Malayalam characters. The wavelet transform is used for feature extraction, and only the analysis filters are used for decomposition. This decomposition splits the image of a character into a set of sub-images at different resolutions, which helps in extracting relevant features. The SVM is used for linear and non-linear classification of characters: through a radial basis function kernel, the data is mapped into a high-dimensional space in which the feature space is linearly separable. The SVM is a discriminative classifier with good generalization and convergence properties. With this method they achieved an accuracy of 90.25%.

Rahman A. and Saddik A. (2007) proposed a curve-fitting algorithm to recognize Bengali handwritten characters. Their Modified Syntactic Method (MSM) consists of four major components: a reference database character component, a stroke generation component, a curvature analysis and string generation component, and a string matching and character recognition component. This method does not require any image segmentation.
The method recognizes various strokes of different patterns with high accuracy and also provides a structural description of the characters. A problem may arise if some strokes in a character are of the same magnitude as noise. The proposed algorithm gives 95% accuracy.

Choudhary A. et al. (2013) developed an OCR system for handwritten English characters using features extracted with a binarization technique. In this method, pixel values are separated into two groups: black as foreground and white as background. The main idea is to minimize the unwanted information present in the image and eliminate the background noise associated with it. Global gray-scale intensity thresholding is used in this technique. A multilayer feed-forward neural network classifier is used to classify the offline cursive handwritten characters. The binarization features combined with the back propagation algorithm give an accuracy of 85.62%. At most 100,000 epochs are allowed for training in this study, and training stops if the network does not converge within this maximum.

George A. and Gafoor F. (2014) proposed another feature extraction method, based on the contourlet transform, for handwritten Malayalam character recognition. In this method, the original image is decomposed into a low-pass image and a band-pass image using Laplacian Pyramid (LP) decomposition. Each band-pass image is further decomposed by Directional Filter Banks (DFB). They used 16 statistical features, such as ratios of grid values in the horizontal and vertical directions, and a 4-level contourlet decomposition. A feed-forward back propagation network with three hidden layers is used for classification. The system provides an average accuracy of 97.3%.

Mukarambi G. et al. (2012) proposed an OCR system that recognizes a mixture of Kannada and English characters with a single algorithm. To extract features, the character image is divided into zones, and these features are fed to an SVM classifier. The algorithm does not depend on character slant or on thinning. However, the method does not provide high recognition accuracy for a large database with less time and a minimum number of features. With 2-fold cross-validation, the SVM classifier provides average recognition accuracies of 73.33% for Kannada consonants and 96.13% for English lowercase letters.

Desai A. (2010) worked on Gujarati numeral recognition and collected the digits 0-9 from 300 different people. In the scanned images, contrast is adjusted with an adaptive histogram equalization algorithm, image boundaries are smoothed with a median filter, and nearest-neighbor interpolation brings all handwritten digits to a uniform size. For skew correction, each digit is rotated in the clockwise and anticlockwise directions, up to 100 fine patterns per digit, with a difference of 20 between successive patterns. He suggested four different profiles (horizontal, vertical and two diagonal) for feature extraction, and the vector of these four profiles is used to identify a digit. He used a feed-forward back propagation neural network for classification, a multilayer network with three layers of 94, 50 and 10 neurons respectively, and achieved an accuracy of 81.66%.

Rahman S. et al. (2008) used the Canny method for edge detection in the preprocessing phase and normalized the numerals using thinning and dilation algorithms. They extract four directional local feature vectors with Kirsch masks and one global feature vector; the Kirsch masks detect edges in the horizontal, vertical, right-diagonal and left-diagonal directions. PCA and SVM are used to improve accuracy: PCA reduces the dimensionality and extracts the more significant features, and its output is passed to an SVM that determines the appropriate class. They achieved an accuracy of 92.5%.

Singh R. and Kaur M. (2010) used an adaptive sampling algorithm for normalization, Otsu's threshold algorithm for image binarization, and the Hilditch algorithm and its variants for thinning the binarized image. In the feature extraction stage, they represent each character as a feature vector. The features used for classification are the character height, the character width, the number of horizontal lines (long and short), the number of vertical lines (long and short), the number of slope lines and special dots. To classify the Telugu characters, they used the back propagation algorithm, which is based on supervised learning. The network consists of three layers (input, hidden and output), and training has two phases: a forward phase and a backward phase.

Aggarwal A. et al. (2012) used a threshold value to convert the image into a binary image. Median filtering is used to remove noise, and after segmentation each character is normalized to a size of 90×90. For feature extraction, they proposed a gradient feature.
This feature measures the gradient magnitude and the direction of greatest change in intensity in a small neighborhood of each pixel; Sobel templates are used to compute the gradients. They used an SVM with an RBF kernel as the classifier. An SVM is basically a two-class classifier whose optimization criterion is the width of the margin between the classes, that is, the empty area around the decision boundary defined by the distance to the nearest training patterns. These patterns, called support vectors, define the classification function. They achieved 94% recognition accuracy.
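The gradient feature of Aggarwal A. et al. can be illustrated with Sobel templates. This sketch computes the horizontal and vertical Sobel responses at each interior pixel, converts them to a gradient magnitude and direction, and accumulates the magnitudes into a histogram of quantized directions. The eight direction bins and the single whole-image histogram are assumptions for illustration; the text does not specify how the authors quantize directions or pool them over the 90×90 image, and the function name is hypothetical.

```python
import math

# Standard 3x3 Sobel templates for horizontal and vertical derivatives.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


# Illustrative sketch; gradient_histogram() is a hypothetical helper.
def gradient_histogram(image, bins=8):
    """Magnitude-weighted histogram of Sobel gradient directions.

    image: list of rows of intensities; border pixels are skipped.
    """
    height, width = len(image), len(image[0])
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = sum(SOBEL_X[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat region: no gradient direction
            # Direction in [0, 2*pi), quantized into equal angular bins.
            angle = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += magnitude
    return hist
```

In practice the histogram would typically be computed per zone and the zone histograms concatenated into the SVM input vector, but that pooling choice is likewise an assumption here.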

Niranjan S. et al. (2009) used Fisher Linear Discriminant analysis (FLD), 2D-FLD and diagonal FLD based methods for feature extraction to recognize unconstrained Kannada handwritten characters. They calculated the between-class and within-class scatter matrices, solved for the generalized eigenvectors and eigenvalues, sorted the eigenvectors by their associated eigenvalues from high to low, and extracted features from each sample of the training set. For classification they used different distance measures: Minkowski, Manhattan, Euclidean, squared Euclidean, mean square error, angle, correlation coefficient, Mahalanobis between normed vectors, weighted Manhattan, weighted SSE, weighted angle, Canberra, modified Manhattan, modified SSE, weighted modified SSE and weighted modified Manhattan. They report that 2D-FLD combined with the angle and correlation measures recognizes Kannada handwritten vowels and consonants better than the other methods and distance metrics.

Patil N. et al. (2011) used Moment Invariants (MIs), Affine Moment Invariants (AMIs), image thinning and structuring of the image in box format for feature extraction. The MIs are derived by means of the theory of algebraic invariants, whereas AMIs are invariant under general affine transformations. They used a fuzzy Gaussian membership function to classify Marathi handwritten characters; the template consists of the mean and standard deviation of each feature.

Agnihotri V. (2012) used a threshold value and the Sobel technique for binarization and edge detection in the preprocessing stage. After binarization and edge detection, the image was dilated and its holes were filled. In segmentation, he separated the preprocessed image into isolated characters using a labeling process; the labels give the number of characters in the image. He used diagonal feature extraction: each individual character is resized to 90×60 pixels and divided into 54 equal zones of 10×10 pixels. Features are extracted from each zone by moving along its diagonals, and the process is repeated for all zones to obtain 54 features per character. He used a feed-forward back propagation neural network for classification, consisting of an input layer with 54 or 69 inputs, two hidden layers with 100 neurons and an output layer with 44 neurons. He achieved 97% recognition accuracy with 54 features and 98% with 69 features.
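Diagonal feature extraction as described for Agnihotri V. (and J. P. et al.) can be sketched for a 90×60 binary image split into 54 zones of 10×10 pixels. The text says only that features are extracted by moving along each zone's diagonals; this sketch assumes the commonly described variant in which the foreground pixels on each of a zone's 19 diagonals are summed and the 19 sums are averaged into one feature per zone. The 69-feature variant is not shown, and the function name is hypothetical.

```python
# Illustrative sketch; diagonal_features() is a hypothetical helper and the
# averaging of diagonal sums is an assumed variant of the method.
def diagonal_features(image, zone=10):
    """One feature per zone: the mean of the foreground-pixel sums along
    the zone's 2 * zone - 1 diagonals.

    image: binary image as a list of rows (1 = foreground).
    """
    height, width = len(image), len(image[0])
    if height % zone or width % zone:
        raise ValueError("image size must be a multiple of the zone size")
    features = []
    for top in range(0, height, zone):
        for left in range(0, width, zone):
            diag_sums = [0] * (2 * zone - 1)
            for y in range(zone):
                for x in range(zone):
                    # Pixels with the same x + y lie on the same diagonal.
                    diag_sums[x + y] += image[top + y][left + x]
            features.append(sum(diag_sums) / len(diag_sums))
    return features
```

Under this averaging variant every pixel lies on exactly one diagonal, so each zone's feature equals its foreground-pixel count divided by 19, which shows how closely the feature is related to zoning density.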

Shrivastava S. and Gharde S. (2010) worked on recognition of handwritten Devanagari numerals. They collected 2000 samples from different age groups. Moment Invariants (MIs) and Affine Moment Invariants (AMIs) are used for feature extraction: MIs evaluate seven distributed parameters of the numeral image, and AMIs are derived by means of the theory of algebraic invariants. Together these methods extract 18 features from the image. An SVM, a supervised machine learning technique, is used in the classification phase, and the SVM with a Radial Basis Function (RBF) kernel gives 99.48% recognition accuracy.

Shanthi N. and Duraiswamy K. (2007) compared different image sizes for Tamil character recognition. Preprocessing operations such as thresholding, skeletonization, line segmentation, character segmentation and normalization are performed using Otsu's histogram-based global thresholding, Hilditch's algorithm, the horizontal histogram profile, the vertical histogram profile and bilinear interpolation, respectively. For feature extraction, pixel densities are calculated for different zones of the image and used as character features. These features are used to train and test an SVM. Results were obtained for 32×32, 48×48 and 64×64 images, and they achieved a recognition rate of 87.4% for unconstrained Tamil characters.

Patil S. et al. (2012) used Fourier descriptors and an HMM to recognize handwritten Devanagari characters. They collected 500 samples from 50 people of different age groups. Before feature extraction, the image is normalized using the moment normalization method. Fourier descriptors are used to resize the image, which is then translated to the center of the image frame. The extracted features are used to train the HMM (hidden Markov model), a finite set of states, each associated with a probability distribution. The results depend on the training data set and on the number of states per model.

Dixit S. and Suresh H. (2013) proposed a line segmentation technique with a sliding window and skewing operation for handwritten Tamil characters. Binarization is done using the Adaptive Histogram Equalization technique. To overcome inappropriate segmentation caused by skewed and overlapping lines, they performed sliding-window-based line segmentation and a skewing operation. This segmentation method gives an average recognition accuracy of 87%.
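Otsu's global thresholding, used for binarization by Shanthi N. and Duraiswamy K. (and by several other studies in this review), chooses the gray level that best separates the histogram into two classes by maximizing the between-class variance. The sketch below assumes 8-bit gray levels given as a flat list of integers and treats dark pixels as ink; the helper names are hypothetical.

```python
# Illustrative sketch; otsu_threshold() and binarize() are hypothetical helpers.
def otsu_threshold(pixels, levels=256):
    """Gray level t that maximizes between-class variance when pixels are
    split into {value <= t} and {value > t}."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_between = 0, -1.0
    weight0, sum0 = 0, 0
    for t in range(levels):
        weight0 += hist[t]
        sum0 += t * hist[t]
        weight1 = total - weight0
        if weight0 == 0 or weight1 == 0:
            continue  # one class is empty, so this split is meaningless
        mean0 = sum0 / weight0
        mean1 = (sum_all - sum0) / weight1
        # Between-class variance, up to the constant factor 1 / total**2.
        between = weight0 * weight1 * (mean0 - mean1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(pixels, t):
    """Dark ink becomes foreground (1); lighter paper becomes background (0)."""
    return [1 if p <= t else 0 for p in pixels]
```

Because the threshold is computed once from the histogram of the whole image, this is a global method; the adaptive histogram equalization mentioned by Dixit S. and Suresh H. is a different, locally adaptive operation.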

Vikram C. et al. (2013) worked on handwritten Telugu character recognition using a Multilayer Perceptron (MLP). Features are extracted from different zones, and a feature vector is computed from the training set. Classification with the MLP model gives an 85% recognition rate.

Sureshkumar C. and Ravichandran T. (2010) used a neural network with RCS to recognize handwritten Tamil characters. In their work, after the image is scanned, paragraph segmentation is done using a vertical histogram, line segmentation using a vertical histogram, and word segmentation and character image glyph extraction using the horizontal histogram method. They extracted the invariant Fourier descriptor feature, which is independent of position, size and orientation. Classifying with RCS and back propagation, they achieved 97% recognition accuracy.

Choudhury A. and Mukherjee J. (2013) proposed a method for recognizing handwritten Bangla numerals. They applied median filtering for noise reduction and Otsu's method for binarization. Line segmentation is used to extract the features, and the characters are matched against template images. The correlation coefficient is used to find the best match between test data and training data.
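Template matching with the correlation coefficient, as used by Choudhury A. and Mukherjee J., can be sketched as computing the Pearson correlation between a test image and each stored template and keeping the best-scoring label. The sketch assumes the templates and the test image are binary images of the same size; the template dictionary and helper names are hypothetical, not taken from the paper.

```python
import math


# Illustrative sketch; all helpers here are hypothetical.
def correlation_coefficient(a, b):
    """Pearson correlation between two equal-length pixel sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant image carries no shape information
    return cov / math.sqrt(var_a * var_b)


def flatten(image):
    """Turn a list of rows into a single pixel sequence."""
    return [p for row in image for p in row]


def match_template(image, templates):
    """Return (label, score) of the template best correlated with image.

    templates: dict mapping a label to a same-sized binary image.
    """
    pixels = flatten(image)
    scores = {label: correlation_coefficient(pixels, flatten(t))
              for label, t in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

A score of 1.0 means the test image matches the template exactly up to a linear change of intensity, while values near 0 indicate no linear relationship between the two pixel patterns.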