arxiv: v2 [cs.cv] 17 Nov 2014

WINDOW-BASED DESCRIPTORS FOR ARABIC HANDWRITTEN ALPHABET RECOGNITION: A COMPARATIVE STUDY ON A NOVEL DATASET

Marwan Torki, Mohamed E. Hussein, Ahmed Elsallamy, Mahmoud Fayyaz, Shehab Yaser

Computer and Systems Engineering Department, Faculty of Engineering, Alexandria University

arXiv:1411.3519v2 [cs.CV] 17 Nov 2014

Contact author's email: mtorki@alexu.edu.eg

Mohamed E. Hussein is currently an Assistant Professor at Egypt-Japan University of Science and Technology, New Borg El-Arab City, Alexandria, Egypt; on leave from his position at Alexandria University, where most of the work associated with this project was performed.

ABSTRACT

This paper presents a comparative study of window-based descriptors applied to Arabic handwritten alphabet recognition. We give a detailed experimental evaluation of different descriptors with several classifiers. Our experiments show that most of these descriptors perform well, with the best configuration reaching 94.28% accuracy. Moreover, we introduce a novel spatial pyramid partitioning scheme that enhances the recognition accuracy for most descriptors. In addition, we introduce a novel dataset of Arabic handwritten isolated alphabet letters, which can serve as a benchmark for future research.

Index Terms: Arabic Handwritten, Alphabet Recognition, Arabic letters dataset, window-based descriptors

1. INTRODUCTION

Interest in Arabic handwriting recognition has increased during the last few years [1]. Successful recognition of the Arabic handwritten alphabet is an important milestone towards a successful Optical Handwriting Recognition (OHR) system. Window-based descriptors have shown great success in recent computer vision research on object recognition and detection [2]. Many window-based descriptors depend either on the gradients or on the texture in the window of interest. The intuition behind using window-based descriptors for character recognition is that the shapes of the letters can be represented using lines, circles, semicircular curves, and angles.

The composition of character shapes from such primitive shapes suggests that gradient and texture features could be used to represent character shapes for the purpose of character recognition. We show in this paper that existing window-based descriptors (which are based on gradient and texture features) can be used effectively for letter recognition in Arabic handwriting.

The contribution of this paper is multifold. First, the paper introduces a compact novel dataset of about 9K samples in 28 classes that represent the isolated Arabic handwritten alphabet. The dataset's ground truth annotation includes the writers' genders. Second, the paper presents a comparative evaluation of window-based descriptors that are common in the computer vision literature, namely HOG, SIFT, SURF, LBP, and GIST. Third, the paper presents a comparative evaluation of four common classifiers on the chosen descriptors, namely Logistic Regression, Linear SVM, Nonlinear SVM, and Artificial Neural Networks.

The remainder of this paper is organized as follows. Section 2 gives an overview of gradient-based and texture-based window descriptors. Our novel dataset of isolated Arabic alphabet letters is introduced in Section 3. The experimental evaluation is presented in Section 4. The conclusion is given in Section 5.

2. BACKGROUND

The Arabic alphabet is widely used in many countries, including all Arabic-speaking countries. Other languages that use the same alphabet are Persian, Pashto, and Urdu (see http://en.wikipedia.org/wiki/Arabic_script). The wide use of these languages adds importance to the Arabic alphabet recognition problem.

The difficulty of Arabic alphabet recognition is due to the fact that many characters have similar shapes, outlines, and dots. Figure 1.a shows four different classes that have almost the same shape and outline; the only differences are in the locations and numbers of the dots. The number of classes in the Arabic alphabet is 28. As Figure 1 shows, confusion between certain classes is very common: almost half of the letters have one or more confusing classes that differ only in the number and/or location of dots.

Much research has been conducted on feature extraction methods like []. None of these methods used the window-based descriptors that are common in the object/scene recognition literature, such as HOG [3], SIFT [4], LBP [5], GIST [6], and SURF [7].

2.1. Window-Based Descriptors

An example of a successful window-based descriptor is the Histogram of Oriented Gradients (HOG) [3]. The Scale Invariant Feature Transform (SIFT) [4] and Speeded-Up Robust Features (SURF) [7] are considered patch-based descriptors, but since we treat the whole image as a single patch, we can consider them window-based descriptors as well. The HOG descriptor bins the oriented gradients within overlapping cells. The SIFT descriptor consists of a histogram of oriented image gradients captured within grid cells within a local region. The SURF descriptor is an efficient alternative to SIFT that uses simple 2D box filters to approximate derivatives and uses summary statistics instead of histogram counts.

Another set of descriptors describes the texture pattern in a window instead of gradient orientations. One of these is the GIST descriptor [6]. GIST applies a set of filters on a 4x4 cell grid. Within each cell, it records orientation histograms computed from Gabor filter responses. Another texture-based descriptor is the Local Binary Pattern (LBP) descriptor [5].

2.2. Spatial Pyramid

Spatial pyramids of descriptors have improved on the original descriptors in recognition problems [8, 9]. Inspired by the spatial pyramid, a temporal pyramid was introduced for activity recognition [10], where the pyramid describes the temporal dependencies. Moreover, allowing the temporal windows to overlap further improved the recognition accuracy [10]. The success of the overlapping temporal pyramid in activity recognition, together with the structure of the Arabic alphabet, led us to introduce a spatially overlapping version of the window-based descriptors. The effect of adding the overlapping spatial partitioning is illustrated in Figure 2: the different parts of the resulting descriptor discriminate the confusing classes much better.
We propose to use three overlapping halves of the character image in each direction, in addition to the original image. Thus every image is represented by a concatenation of 7 descriptors: one for the original image, three for the overlapping halves in the vertical direction, and three for the overlapping halves in the horizontal direction. We call the new descriptors HOG7, GIST7, LBP7, SIFT7, and SURF7. The experimental evaluation in Section 4 shows that these spatial pyramids can significantly improve the recognition accuracy over the original descriptors.

Fig. 1. (a) Sample confusing classes (top): confusing classes share layout and shape. (b) Arabic alphabet (bottom): the full alphabet contains many confusing classes.

Fig. 2. Role of overlapped spatial partitioning in discriminating confusing classes in letter recognition.

3. AIA9K DATASET

The AlexU Isolated Alphabet (AIA9K) dataset was collected from 107 volunteer writers, aged 18 to 25, who are B.Sc. or M.Sc. students in the Faculty of Engineering at Alexandria University. The dataset is available for download by contacting the first author at mtorki@alexu.edu.eg. The 107 writers included 62 females and 45 males. Each writer wrote all the Arabic letters 3 times. The total number of collected characters is 8,988. Figure 3 shows the form that was used to collect the data.

Fig. 3. Form used in the collection: (a) empty form, (b) filled form.

To obtain the images of the isolated characters from the scanned form image, some pre-processing steps were performed. These steps included Harris corner detection and finding the geometric transformation that corrects the skew in the scanned forms. Afterwards, border removal and cell cropping were applied. Each writer was asked either to leave his/her name (from which the gender was inferred) or to write Male/Female on the form. Forms without such information (only two) were excluded.

A verification process for the ground truth was executed, and three types of errors were identified, as follows.

(1) Cropping Error: Some letters are wrongly cropped, producing more connected components in the image, as in Figure 4.a, or only a part of the letter, as in Figure 4.b.
(2) Writer Mistake: This occurs when a writer writes a letter in the wrong place, producing a wrong label. Missing dots are a very common case, as in Figure 4.c. Some writing is not a letter at all, as in Figure 4.d.
(3) Unclear Letters: This usually happens due to scanning errors or writing errors, as in Figure 4.e. If the unclear part covered most of the main features of the letter, the letter was removed.

Fig. 4. Three types of errors: cropping (a, b), writer mistakes (c, d), and unclear letters (e).

The verification process reduced the data to 8737 valid samples.

The data were split into 70% for training, 15% for validation, and 15% for testing. The validation set is used to find the best parameters for each classifier. Afterwards, the training set is combined with the validation set to train the final classifier, which is evaluated on the testing set. Table 1 shows the exact split, which takes the writers' genders into account.

Table 1. Data splits

                  Female    Male   Total
Total               5089    3648    8737
70% Train           3562    2553    6115
15% Validation       763     547    1310
15% Test             764     548    1312

4. EXPERIMENTS

In this section we present our experiments on the AIA9K dataset. The experiments are designed to recognize the letters with different descriptors and classifiers.

4.1. Classifier Tuning on Validation Set

We used the validation set to adjust the parameters of the different classifiers: the regularization parameter (λ) of the Logistic Regression (LR) classifier; the regularization parameter (λ) of the Artificial Neural Network (ANN), for which a single hidden layer of 25 neurons was used in all experiments; the C parameter of the Support Vector Machine (SVM) with a linear kernel; and the γ and C parameters of the SVM with an RBF kernel.

4.2. Descriptors

We used five different descriptors: three gradient-based descriptors (HOG, SIFT, and SURF) and two texture-based descriptors (GIST and LBP). To benefit from the spatial layout of the characters, we used the modified spatial partitioning that allows overlapping, presented in Section 2.2, and we append the number 7 to the name of each descriptor to differentiate it from the original descriptor.
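The overlapping partitioning of Section 2.2 can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the paper does not give the exact offsets of the three halves, so the sketch assumes they start at the beginning, the centre, and the end of each axis, and it reads "halves in the vertical direction" as top, middle, and bottom bands. The names `overlapping_halves`, `seven_regions`, and `describe7` are hypothetical, and `descriptor` stands for any function that maps an image crop (a list of pixel rows) to a feature list.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (top, bottom, left, right), end-exclusive


def overlapping_halves(extent: int) -> List[Tuple[int, int]]:
    """Three half-length intervals along one axis: start, centre, end.

    Assumption: the paper only says "three overlapping halves";
    these particular offsets are illustrative.
    """
    half = extent // 2
    starts = [0, (extent - half) // 2, extent - half]
    return [(s, s + half) for s in starts]


def seven_regions(height: int, width: int) -> List[Box]:
    """Full image, then three horizontal bands, then three vertical strips."""
    regions: List[Box] = [(0, height, 0, width)]
    regions += [(t, b, 0, width) for t, b in overlapping_halves(height)]
    regions += [(0, height, l, r) for l, r in overlapping_halves(width)]
    return regions


def describe7(image: Sequence[Sequence[float]],
              descriptor: Callable[[list], list]) -> list:
    """Concatenate the descriptor of each of the 7 regions."""
    height, width = len(image), len(image[0])
    features: list = []
    for top, bottom, left, right in seven_regions(height, width):
        crop = [list(row[left:right]) for row in image[top:bottom]]
        features.extend(descriptor(crop))
    return features
```

With a base descriptor of length d, `describe7` returns a vector of length 7d, which matches the concatenation of 7 descriptors described in Section 2.2. Any of the five descriptors could in principle be passed as `descriptor`.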

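As a quick consistency check on the evaluation protocol, the split sizes in Table 1 can be reproduced per gender group. The sketch below assumes that the train and validation sizes are rounded down and that the remainder goes to the test set; the paper does not state the rounding rule or how individual samples were assigned, so this only checks the counts. The names `split_counts`, `table1_counts`, and `GROUP_SIZES` are hypothetical.

```python
from typing import Dict, Tuple

# Valid samples per gender after verification (Table 1, "Total" row).
GROUP_SIZES: Dict[str, int] = {"Female": 5089, "Male": 3648}


def split_counts(n: int, train_pct: int = 70, val_pct: int = 15) -> Tuple[int, int, int]:
    """Return (train, validation, test) sizes for a group of n samples.

    Assumption: train and validation are floored, test takes the rest.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    train = n * train_pct // 100
    val = n * val_pct // 100
    return train, val, n - train - val


def table1_counts(group_sizes: Dict[str, int]):
    """Split every group separately, then add the groups up."""
    per_group = {g: split_counts(n) for g, n in group_sizes.items()}
    totals = tuple(sum(c[i] for c in per_group.values()) for i in range(3))
    return per_group, totals


if __name__ == "__main__":
    per_group, totals = table1_counts(GROUP_SIZES)
    for group, counts in per_group.items():
        print(group, counts)
    print("Total", totals)
```

Splitting each gender group separately keeps the female-to-male ratio nearly identical across the three sets, which is one way to read "takes writers' genders into account".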
4.3. Results: Character Recognition

Table 2 shows the recognition rates for the different classifiers and descriptors. SIFT performs best both with and without the overlapping spatial partitioning. As Figure 5 shows, the overlapping spatial partitioning improves the results in most configurations. Interestingly, LBP performs the worst, yet the overlapping spatial partitioning increased its recognition rate the most (for example, from 52.97% to 79.73% with LR). Figure 6 shows the 75 letters misclassified by our best configuration, the SIFT descriptor with SVM (RBF).

Table 2. Character recognition accuracy, using the Train+Validation sets for training and the Test set for testing

           LR       ANN      SVM (Linear)   SVM (RBF)
GIST     88.26%   91.01%      90.63%         92.61%
HOG      86.51%   86.66%      87.25%         90.46%
LBP      52.97%   57.09%      55.03%         57.32%
SIFT     92.07%   92.99%      92.45%         94.28%
SURF     66.39%   72.64%      70.05%         77.21%
GIST7    90.70%   91.54%      91.31%         93.22%
HOG7     87.27%   83.61%      89.71%         90.47%
LBP7     79.73%   44.89%      79.65%         75.30%
SIFT7    92.76%   93.29%      92.84%         94.13%
SURF7    85.21%   85.90%      84.68%         87.58%

Fig. 5. Improvements using the overlapped spatial partitioning on letter recognition.

Fig. 6. The 75 misclassified letters using the SIFT descriptor and SVM (RBF).

5. CONCLUSION

In this paper, we introduced a novel dataset (AIA9K) of isolated Arabic alphabet characters. We also performed a quantitative evaluation of window-based descriptors on the novel dataset. The chosen descriptors showed competitive recognition rates, except for LBP and SURF. SIFT produced the best accuracy: only 75 of the 1312 test samples were misclassified, which corresponds to 94.28%. A remarkable improvement for several descriptors was obtained using overlapped spatial partitioning of the character image.

Acknowledgment

The authors would like to thank the Center of Excellence for Smart Critical Infrastructure (SmartCI) at Virginia Tech - Middle East and North Africa (VT-MENA) for sponsoring this work. The authors would also like to thank May Shoushan, Eiman El Gamel, Asmaa Shoala, Noha Ali, and Ayat Abdel-Fattah for their effort in data collection.

6. REFERENCES

[1] Mohammad Tanvir Parvez and Sabri A. Mahmoud, Offline Arabic handwritten text recognition: A survey, ACM Comput. Surv., vol. 45, no. 2, pp. 23:1-23:35, Mar. 2013.

[2] Richard Szeliski, Computer vision: algorithms and applications, Springer, 2010.

[3] N. Dalal and B. Triggs, Histograms of oriented gradients for human detection, in Conference on Computer Vision and Pattern Recognition. IEEE, 2005.

[4] D. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, vol. 60, pp. 91-110, 2004.

[5] T. Ojala, M. Pietikäinen, and T. Mäenpää, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24(7), pp. 971-987, 2002.

[6] A. Torralba, Contextual priming for object detection, International Journal of Computer Vision, vol. 53(2), pp. 169-191, 2003.

[7] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, SURF: Speeded-up robust features, Computer Vision and Image Understanding, vol. 110(3), pp. 346-359, 2008.

[8] S. Lazebnik, C. Schmid, and J. Ponce, Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, in Conference on Computer Vision and Pattern Recognition. IEEE, 2006.

[9] A. Bosch, A. Zisserman, and X. Munoz, Representing shape with a spatial pyramid kernel, in International Conference on Image and Video Retrieval. ACM, 2007.

[10] Mohamed E. Hussein, Marwan Torki, Mohammad A. Gowayyed, and Motaz El-Saban, Human action recognition using a temporal hierarchy of covariance descriptors on 3D joint locations, in Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence. AAAI Press, 2013.