Recognizing Characters in Natural Scenes: A Feature Study



Onur Tekdas (tekda001@umn.edu)
Nikhil Karnad (karna015@umn.edu)

Submitted in partial fulfillment of the course CSci 5521 Pattern Recognition
Professor Paul Schrater
Department of Computer Science and Engineering, University of Minnesota, Twin-Cities
Fall 2009

Contents
1 Introduction
2 Related work
3 Features
  3.1 Preprocessing
  3.2 Raw intensities
  3.3 Shape Contexts
  3.4 Wavelet features
  3.5 Principal Component Analysis (PCA)
4 Multiple Kernel Learning
5 Results
6 Conclusion

List of Figures
1.1 Examples of characters in natural scenes that are difficult to recognize because of the amount of variation
1.2 Flowchart for a typical character recognition system that uses natural scene images as input
1.3 Characters from the English characters dataset
3.1 A log-polar histogram for a grayscale image of the digit 0
3.2 Histogram for the 0 image shown previously
3.3 Second-level wavelet decomposition with approximation and diagonal coefficients for digit 8
5.1 The best performance (~85%) for WAVE+SC112 features with SVM RBF was obtained at kernel width


Chapter 1 Introduction

A number of factors introduce challenges to the task of recognizing characters in natural scenes. To list a few:
1. Clutter and placement: where exactly in a natural scene is the text, and how much of the scene is irrelevant to character recognition?
2. Different font styles: outlines vs. solid, thick vs. thin lines, colors and textures, etc.
3. Variation in lighting conditions.

Figure 1.1: Examples of characters in natural scenes that are difficult to recognize because of the amount of variation.

Figure 1.1 shows how difficult it is to recognize characters in such images. Even if we remove the task of text segmentation, the remaining problems are still formidable. Figure 1.2 shows the typical steps of a character recognition system. In this project, we focus on the character recognition step, without getting into the details of text segmentation, context, language, etc. We study the effect of different feature vectors on classification performance for character recognition in natural scenes. The features we tried were raw grayscale pixel intensities, shape context descriptors, and wavelet features; details are given in Chapter 3.

Figure 1.2: Flowchart for a typical character recognition system that uses natural scene images as input: Text Segmentation → Character Segmentation → Feature Extraction → Classification.

For classification we use the recent SVM Multiple Kernel Learning (MKL) method [16], because it optimizes the weights of a linear combination of different kernels to find the best classifier. Our goal in this project was to explore various feature extraction schemes for character recognition, and to learn and implement a state-of-the-art classification scheme.

Figure 1.3: Characters from the English characters dataset.

The dataset was gathered from natural scenes in India for both English and Kannada, but we chose to use only the English characters, as shown in Figure 1.3.
Available at

The authors of [6] also acquired a database of hand-printed characters and another of characters generated by computer fonts, to serve as complementary training data. Upper- and lowercase letters are treated as separate classes, so together with the digits there are 62 classes in total. The dataset has 7705 natural character images. To develop intuition about the features and to work with a manageable dataset size, we used only the first 593 data points, which correspond to the digits 0 to 9.

Chapter 2 Related work

Digit and character recognition is an active area of research in OCR applications as well as in automatic pattern recognition from natural scenes; this project concerns the latter. The performance of character recognition largely depends on two main decisions: the feature extraction approach and the classification scheme.

First, we study the feature extraction problem by trying out different features that have worked well for researchers in the past. In particular, we borrow features from the object recognition literature that are not commonly used in character recognition. Within character recognition itself, stroke direction is known to be a good feature for numeral recognition [20, 8], and many local statistical features can be used as well. For a detailed survey, the interested reader is referred to [9].

For the classification problem, initial work started with statistical techniques [7] and neural networks [3]. Examples of statistical techniques are linear discriminants (LDF), quadratic discriminants (QDF), nearest neighbors (k-NN), and Parzen windows. Neural-network approaches to character recognition include the multi-layer perceptron (MLP), the radial basis function (RBF) network, and the polynomial classifier. Details can be found in [3, 11, 18, 13]. In recent years, support vector machine (SVM) classifiers have gained prominence [4]. SVMs are based on Vapnik's statistical learning theory [14]. An SVM is inherently a binary classifier, and multiple SVMs can be combined to build a multi-class classification system. The strong performance of SVMs has been demonstrated in numerous experiments, particularly on high-dimensional data with small sample sizes.

In the first part of this project, we choose and implement two feature extraction schemes: shape contexts [2] and wavelets [19, 5, 12]. We study their performance with SVM classifiers using the Spider MATLAB toolbox [17]. In the second part, we implement multiple kernel learning techniques [16, 15, 1] and compare the results with those of the first part.

Chapter 3 Features

3.1 Preprocessing
We converted all digit images to grayscale and resized them to 30x30 pixels, using the MATLAB commands rgb2gray and imresize.

3.2 Raw intensities
Without any further processing, we use the resulting 593x900 matrix of pixel values as a raw dataset, henceforth referred to as RAW.

3.3 Shape Contexts
Belongie and Malik [2] describe a method that captures the relative positions of pixels in an edge image (we use Canny edges). First, we pick a 3x3 grid of fixed pixel locations in the image (8 along the border and one in the center). For each location, we impose a log-polar grid and bin the pixels of the edge image into a histogram (see Figures 3.1 and 3.2). We used 16 angular bins (θ) and 8 radial bins (r). This generates a dataset of size 593x1152, denoted SC1152. We observed that using one fixed pixel location instead of nine caused only a 1% loss in performance, so we opted instead to use a 593x112 dataset that we denote SC112. We implemented this feature extraction scheme in MATLAB.

Figure 3.1: A log-polar histogram for a grayscale image of the digit 0.

3.4 Wavelet features
Wavelet transforms have been used for texture representation, image compression and character recognition [19, 5, 12]. We resized the images to ( ) and applied a 5-level biorthogonal spline wavelet decomposition. We then retained only the level-2 approximation coefficients (top-left part of Figure 3.3). We denote this dataset WAVE. We used the MATLAB Wavelet Toolbox for this, specifically the functions wavedec2, appcoef2, detcoef2, wcodemat, wkeep and wdencmp.

3.5 Principal Component Analysis (PCA)
With the multiple kernel learning and multi-class SVM toolboxes, we frequently encountered an OUT OF MEMORY error, even on a computer with 4 GB of RAM. In these cases, we used PCA for dimensionality reduction, projecting the data points onto the top 15 eigenvectors of the covariance matrix. For the raw dataset, these 15 dimensions capture 66.18% of the variance in the data. We used MATLAB's princomp command for this.
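To show what the PCA step computes, here is a minimal pure-Python sketch. It is a hypothetical helper named `pca`, not MATLAB's princomp. It centers the data, forms the sample covariance matrix, and extracts the top-k eigenvectors by power iteration with deflation. Power iteration is our choice for a dependency-free example and is not how princomp works internally. The returned `explained` value is the variance fraction, which is the quantity reported above as 66.18% for RAW with k = 15.

```python
import math
import random


def pca(data, k, iters=500, seed=0):
    """Project the rows of `data` onto their top-k principal components.

    Returns (projected_rows, explained_fraction). Eigenvectors of the sample
    covariance matrix are found by power iteration with deflation.
    """
    n, dim = len(data), len(data[0])
    # Center every feature (column) at zero mean.
    means = [sum(row[j] for row in data) / n for j in range(dim)]
    centered = [[row[j] - means[j] for j in range(dim)] for row in data]
    # Sample covariance matrix (dim x dim).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(dim)]
           for a in range(dim)]
    total_var = sum(cov[j][j] for j in range(dim))

    rng = random.Random(seed)
    components, eigvals = [], []
    for _ in range(k):
        # Random unit start vector.
        v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue (variance along v).
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim))
        components.append(v)
        eigvals.append(lam)
        # Deflate so that the next pass finds the next largest component.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(dim)] for a in range(dim)]

    projected = [[sum(r[j] * c[j] for j in range(dim)) for c in components]
                 for r in centered]
    return projected, sum(eigvals) / total_var
```

On the RAW data this would be called as `pca(rows, 15)` on the 593 x 900 matrix. The covariance and power-iteration loops both scale with the square of the dimension, so at 900 dimensions this pure-Python version is very slow. It is meant only to make the computation explicit.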

Figure 3.2: Histogram for the 0 image shown previously (axes: log(r) and θ).

Figure 3.3: Second-level wavelet decomposition with approximation and diagonal coefficients for digit 8 (panel titles: Compressed Image; Global Threshold).
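The log-polar binning described in Section 3.3 can be sketched as follows. This is not the authors' MATLAB code. It assumes a binary edge map is already available, for example from a Canny detector, which is not reimplemented here. Two further details are our own assumptions because the report does not specify them: radii are log-spaced between 1 pixel and the image diagonal, and the nine fixed locations are taken to be the corners, the edge midpoints and the center. With 16 angular and 8 radial bins, the nine locations give 9 x 16 x 8 = 1152 values, which matches the size of SC1152.

```python
import math


def log_polar_histogram(edges, ref, n_theta=16, n_r=8):
    """Shape-context-style log-polar histogram of edge pixels around `ref`.

    edges: 2-D list of 0/1 values (e.g. a Canny edge map).
    ref:   (row, col) reference location.
    Returns a flat list of n_theta * n_r counts, angle-major order.
    """
    rows, cols = len(edges), len(edges[0])
    r_min, r_max = 1.0, math.hypot(rows, cols)  # assumed radial range in pixels
    log_span = math.log(r_max / r_min)
    hist = [0] * (n_theta * n_r)
    ry, rx = ref
    for y in range(rows):
        for x in range(cols):
            if not edges[y][x] or (y == ry and x == rx):
                continue  # skip non-edge pixels and the reference pixel itself
            dy, dx = y - ry, x - rx
            r = math.hypot(dx, dy)
            theta = math.atan2(dy, dx) % (2.0 * math.pi)
            t_bin = min(int(theta / (2.0 * math.pi) * n_theta), n_theta - 1)
            # Equal-width bins in log(r) give the log-polar layout.
            r_bin = int(math.log(r / r_min) / log_span * n_r)
            r_bin = min(max(r_bin, 0), n_r - 1)
            hist[t_bin * n_r + r_bin] += 1
    return hist


def grid_shape_context(edges, n_theta=16, n_r=8):
    """Concatenate histograms at a 3x3 grid of fixed locations (8 border + center)."""
    rows, cols = len(edges), len(edges[0])
    ys = [0, (rows - 1) // 2, rows - 1]
    xs = [0, (cols - 1) // 2, cols - 1]
    feature = []
    for y in ys:
        for x in xs:
            feature.extend(log_polar_histogram(edges, (y, x), n_theta, n_r))
    return feature
```

For one 30x30 edge map, `grid_shape_context(edges)` returns a 1152-value vector. Calling `log_polar_histogram` at a single location gives the one-location variant.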

Chapter 4 Multiple Kernel Learning

In this chapter, we investigate the problem of learning the optimal kernel for an SVM. In the previous chapter, we looked at the problem of choosing the right descriptors for natural scene characters. Two properties determine the performance of a descriptor: discriminative power and invariance. The tradeoff between them depends on the specific problem at hand. Here we explore state-of-the-art kernel learning algorithms that make this tradeoff automatically.

Let $N_k$ be a base descriptor and $f_k$ the associated distance function. Each descriptor induces a base kernel $K_k$; for example, we can simply set $K_k(x, x') = \exp(-\gamma_k f_k(x, x'))$. Given the base kernels, the kernel of the optimal descriptor can be approximated as a linear combination of the base kernels, $K_{opt} = \sum_k d_k K_k$, where the weights $d_k$ encode the tradeoff. Multiple Kernel Learning (MKL) then amounts to finding the weights that yield the optimal kernel. The MKL formulation is very similar to the standard soft-margin SVM formulation we learned in class:

$$\min_{w,d,\eta}\ \frac{1}{2} w^T w + C\,\mathbf{1}^T \eta + \sigma^T d \tag{4.1}$$

$$\text{subject to}\quad y_i \left( w^T \phi(x_i) + b \right) \ge 1 - \eta_i \tag{4.2}$$

$$\eta \ge 0, \quad d \ge 0, \quad A d \ge p \tag{4.3}$$

$$\text{where}\quad \phi^T(x_i)\,\phi(x_j) = \sum_k d_k\, \phi_k^T(x_i)\,\phi_k(x_j) \tag{4.4}$$

The only addition to the standard SVM formulation is the $l_1$ regularization of the weights $d$ (the term $\sigma^T d$), which induces sparsity in $d$. Setting some of the weights to 0 is desirable because it removes poor descriptors and helps prevent overfitting [16]. The constraint $A d \ge p$ encodes prior knowledge about the problem. Using the same Lagrangian duality argument as for the standard SVM, we can formulate the dual problem as follows:

$$\begin{aligned} \max_{\alpha,\delta}\quad & \mathbf{1}^T \alpha + p^T \delta \\ \text{subject to}\quad & \tfrac{1}{2}\, \alpha^T Y K_k Y \alpha \le \sigma_k - \delta^T A_k \quad \forall k, \\ & \delta \ge 0, \quad 0 \le \alpha \le C, \quad \mathbf{1}^T Y \alpha = 0 \end{aligned}$$
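To make the combined kernel $K_{opt} = \sum_k d_k K_k$ concrete, here is a minimal sketch that evaluates it for fixed weights. It assumes each distance function $f_k$ is the squared Euclidean distance between the k-th descriptors. The function names are hypothetical, and the values of $\gamma_k$ and $d_k$ are illustrative inputs. Learning $d$, which is the actual MKL optimization performed by the toolboxes [15, 1], is not shown.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance, assumed here as the distance function f_k."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def base_kernel(xs, gamma):
    """Gram matrix K_k[i][j] = exp(-gamma * f_k(x_i, x_j)) for one descriptor."""
    return [[math.exp(-gamma * sq_dist(a, b)) for b in xs] for a in xs]


def combined_kernel(descriptors, gammas, weights):
    """K_opt = sum_k d_k * K_k with non-negative weights d_k.

    descriptors: list over k of per-sample descriptor vectors (same sample order).
    """
    if any(d < 0 for d in weights):
        raise ValueError("kernel weights d_k must be non-negative")
    n = len(descriptors[0])
    K = [[0.0] * n for _ in range(n)]
    for xs, gamma, d in zip(descriptors, gammas, weights):
        if d == 0:  # a zero weight (l1 sparsity) removes the descriptor entirely
            continue
        Kk = base_kernel(xs, gamma)
        for i in range(n):
            for j in range(n):
                K[i][j] += d * Kk[i][j]
    return K
```

For example, with a WAVE descriptor and an SC112 descriptor per image, `combined_kernel([wave_rows, sc_rows], [g1, g2], [d1, d2])` returns the Gram matrix that an SVM would use once MKL has chosen `d1` and `d2`.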

In this dual, the non-zero $\alpha_i$ correspond to the support vectors, $Y$ is a diagonal matrix of labels, and $A_k$ is the $k$-th column of $A$. The dual is convex and has second-order cone constraints, so it can be solved as a Second Order Cone Program (SOCP). In [16], Varma and Ray show how to convert the dual problem into a min-max optimization problem, which they claim can be solved more efficiently while giving the same result. Varma and Babu published their code [15] on their web site, and we used this MATLAB toolbox to find the optimal kernel for our feature set. Since the toolbox does not support multi-class SVMs, we trained one-vs-all binary classifiers and used a voting scheme to extend them to the multi-class problem. The results are listed in Table 4.1.

Table 4.1: Attempts that we have made with the GMKL (Generalized Multiple Kernel Learning) toolbox.

Benchmark   Result
A vs B      91%
B vs C      94%
B vs D      67%
B vs P      Error: no support vectors found
B vs K      Error: no support vectors found

As seen in Table 4.1, the results from the GMKL toolbox are unpredictable: sometimes accuracy is above 90%, sometimes it is below 70%, and occasionally no support vectors are found. By running other algorithms on the same data, we verified that the dataset was not the cause, so we abandoned this toolbox.

We then found another toolbox called SimpleMKL [1], which implements the SimpleMKL algorithm presented in [10] and also provides a multi-class SVM solution. We tried combinations of Gaussian and polynomial kernels; the kernels and results are listed in Table 4.2. Since training takes quite a while (15-30 minutes), we were not able to search for the optimal parameters systematically. Instead, we tried several parameter settings and chose a good set manually.

Table 4.2: Performance of SimpleMKL.

Benchmark   Gaussian kernels (σ²)   Polynomial kernels (degree)   Result
            [ ]                     [1 2 3]                       80%
            [ ]                     [1 2 3]                       75%
            [ ]                     [1 2 3]                       77%
            [ ]                     [1 2 3]                       75%
            [ ]                     [1 2 3]                       80%
Finally, we concluded that the best parameters were those in the first row of Table 4.2, and we used them to compare MKL with the standard SVM in Chapter 5.
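The one-vs-all extension used with the GMKL toolbox can be sketched as follows. The report does not spell out its voting rule, so the rule here is an assumption. Each "c vs. rest" classifier votes for class c when its output is positive, and ties or abstentions are resolved by the largest output. That rule reduces to taking the argmax of the binary outputs. The trained binary classifiers are not reproduced here; their real-valued outputs are taken as given, and all function names are hypothetical.

```python
def one_vs_all_targets(labels, classes):
    """Binary targets for each "c vs. rest" problem: +1 for class c, -1 otherwise."""
    return {c: [1 if y == c else -1 for y in labels] for c in classes}


def vote(decision_values):
    """Combine one-vs-all outputs for one sample.

    decision_values maps each class c to f_c(x), the real-valued output of the
    binary "c vs. rest" classifier. Choosing the largest output picks the single
    positive voter when there is one, and otherwise breaks ties (several positive
    votes) or abstentions (no positive vote) by confidence.
    """
    return max(decision_values, key=decision_values.get)


def correct_rate(predictions, labels):
    """Fraction of samples whose predicted class equals the true label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)
```

For ten digit classes, `one_vs_all_targets(labels, range(10))` yields the ten binary training problems. At test time, `vote({0: f0, ..., 9: f9})` turns the ten classifier outputs for one image into a single digit.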

Chapter 5 Results

We used the Spider machine learning toolbox in MATLAB to measure the 10-fold cross-validation performance of an SVM classifier with an RBF kernel on each of our feature sets. The results are shown in the following table.

Features      # dims   Classifier   Correct rate
RAW           900      SVM (RBF)
SC112         112      SVM (RBF)
WAVE          196      SVM (RBF)
WAVE+SC112    308      SVM (RBF)
RAW           900      MKL          80%
SC112         112      MKL
WAVE          196      MKL
WAVE+SC112    308      MKL

These results show that using the wavelet and shape context features together gives better performance than using either of them individually, and comes closer to the performance on the raw data. Even together, these features use only about one-third of the dimensions of the raw data (196 + 112 = 308 vs. 900). This fraction becomes increasingly favorable for larger images, because we can select an appropriate level of the wavelet approximation coefficients to fit the number of dimensions we would like to have.

For the WAVE+SC112 feature space, we optimized the width of the RBF kernel in the SVM; the best correct rate was about 85% (see Figure 5.1).

Figure 5.1: Correct rate vs. RBF kernel width. The best performance (~85%) for WAVE+SC112 features with SVM RBF was obtained at kernel width

We expected the Multiple Kernel Learning (MKL) technique to combine these features optimally and achieve better performance, but we frequently encountered an OUT OF MEMORY error and therefore had to use PCA to reduce the number of dimensions to 15. We also tried Fisher's Linear Discriminant Analysis (LDA) using the fisher command in Spider. Every component received the same ranking, so we concluded that LDA cannot be used to reduce the dimensionality here.
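The evaluation protocol in this chapter (10-fold cross-validation plus the kernel-width sweep of Figure 5.1) can be sketched in pure Python. Training an SVM does not fit in a short stdlib example, so a Parzen-window classifier with the same Gaussian kernel stands in for the Spider SVM; this is one of the statistical techniques listed in Chapter 2. Several details are our assumptions and are not taken from Spider: the kernel form exp(-||a - b||^2 / (2 * width^2)), the seeded random fold assignment, and reporting ± as the population standard deviation across folds. All function names are hypothetical.

```python
import math
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds (needs n >= k)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def rbf(a, b, width):
    """Gaussian RBF kernel exp(-||a - b||^2 / (2 * width^2))."""
    return math.exp(-sum((u - v) ** 2 for u, v in zip(a, b)) / (2.0 * width ** 2))


def parzen_predict(train_x, train_y, x, width):
    """Stand-in classifier: the class with the largest summed RBF similarity to x."""
    scores = {}
    for xi, yi in zip(train_x, train_y):
        scores[yi] = scores.get(yi, 0.0) + rbf(xi, x, width)
    return max(scores, key=scores.get)


def cv_correct_rate(xs, ys, width, k=10, seed=0):
    """Mean and standard deviation of the per-fold correct rate."""
    rates = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        tr_x = [x for i, x in enumerate(xs) if i not in held_out]
        tr_y = [y for i, y in enumerate(ys) if i not in held_out]
        hits = sum(parzen_predict(tr_x, tr_y, xs[i], width) == ys[i] for i in fold)
        rates.append(hits / len(fold))
    mean = sum(rates) / len(rates)
    std = math.sqrt(sum((r - mean) ** 2 for r in rates) / len(rates))
    return mean, std


def best_width(xs, ys, widths, k=10):
    """Sweep candidate kernel widths and keep the one with the best mean CV rate."""
    results = {w: cv_correct_rate(xs, ys, w, k) for w in widths}
    return max(results, key=lambda w: results[w][0]), results
```

Plotting `results[w][0]` against `w` gives a curve of the same kind as Figure 5.1. Swapping `parzen_predict` for an SVM leaves the cross-validation loop unchanged.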

Chapter 6 Conclusion

In this project, we considered the problem of digit recognition in natural scenes. We started with a literature search and implemented two common descriptors: shape contexts and wavelets. We found that the recognition performance of shape contexts is poor, while the performance of wavelets is slightly lower than that of the raw data descriptor. Since the raw data descriptor has a much higher dimension, this result was expected. We used Multiple Kernel Learning to find an optimal kernel for the SVM. However, in our experiments these algorithms performed poorly on high-dimensional feature sets, contrary to the claims in the literature: we repeatedly ran out of memory and had to reduce the data to 15 dimensions with PCA.

Bibliography
[1] Simple multiple kernel learning code. arakotom/code/mklindex.html.
[2] S. Belongie and J. Malik. Matching with shape contexts. In IEEE Workshop on Content-based Access of Image and Video Libraries, volume 12. Springer.
[3] C. Bishop. Neural networks for pattern recognition. Oxford Univ Press.
[4] C. Burges. A tutorial on support vector machines for pattern recognition. Data mining and knowledge discovery, 2(2).
[5] G. Chen and T. Bui. Invariant Fourier-wavelet descriptor for pattern recognition. Pattern recognition, 32(7).
[6] T. de Campos, B. Babu, and M. Varma. Character Recognition in Natural Images. In Proceedings of the International Conference on Computer Vision Theory and Applications, Lisbon, Portugal, February.
[7] K. Fukunaga. Introduction to statistical pattern recognition. Academic Press, Boston.
[8] F. Kimura, S. Nishikawa, T. Wakabayashi, Y. Miyake, and T. Tsutsumida. Evaluation and synthesis of feature vectors for handwritten numeral recognition. IEICE Transactions on Information and Systems, 79(5).
[9] C. Liu, K. Nakashima, H. Sako, and H. Fujisawa. Handwritten digit recognition: benchmarking of state-of-the-art techniques. Pattern Recognition, 36(10).
[10] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet. SimpleMKL.
[11] D. Rumelhart, G. Hinton, and R. Williams. Learning representations by back-propagating errors. Cognitive modeling, page 213.
[12] D. Shen and H. Ip. Discriminative wavelet shape descriptors for recognition of 2-D patterns. Pattern Recognition, 32(2).
[13] L. Tarassenko and S. Roberts. Supervised and unsupervised learning in radial basis function classifiers. IEE Proceedings-Vision, Image, and Signal Processing, 141:210.
[14] V. Vapnik. The nature of statistical learning theory. Springer Verlag.
[15] M. Varma and B. R. Babu. Generalized multiple kernel learning code.
[16] M. Varma and D. Ray. Learning the discriminative power-invariance trade-off. In Proc. ICCV.
[17] J. Weston, A. Elisseeff, G. Bakır, and F. Sinz. The Spider machine learning toolbox.
[18] D. Wettschereck and T. Dietterich. Improving the performance of radial basis function networks by learning center locations. Advances in neural information processing systems.
[19] P. Wunsch and A. Laine. Wavelet descriptors for multiresolution recognition of handprinted characters. Pattern Recognition, 28(8).
[20] M. Yasuda and H. Fujisawa. An improvement of correlation method for character recognition. Trans. IEICE Japan, J62.
