Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction

Stefan Müller, Gerhard Rigoll, Andreas Kosmala and Denis Mazurenok
Department of Computer Science, Faculty of Electrical Engineering
Gerhard-Mercator-University Duisburg, Germany
e-mail: {stm,rigoll,kosmala,dmazur}@fb9-ti.uni-duisburg.de

In this paper, we present a new feature extraction technique and a novel Hidden Markov Model (HMM) based classifier for the rotation, translation and scale invariant recognition of hand-drawn pictograms. The feature extraction is performed by taking a fixed-dimensional vector along the radius of a circle surrounding the pictogram. Within the HMM framework these features are used to classify the pictogram and to estimate the rotation angle of the pattern using the segmentation power of the Markov models. Three variations of the classifier design are presented, giving the option to choose between recognition with preferred rotation angles and fully rotation invariant recognition. The proposed techniques show high recognition rates of up to 99.5% on two large pictogram databases consisting of 20 classes, where significant shape variations occur within each class due to differences in how each element is drawn. In order to obtain a detailed evaluation of our methods, experimental results for conventional approaches utilizing moments and neural nets are given in comparison. The techniques can easily be adapted to handle grey-scale or colour images, and we demonstrate this by showing some results of our experimental image-retrieval-by-user-sketch system, which also serves as an example for future applications.

1 Introduction

Invariant recognition of patterns is considered a highly complex and difficult task and has many applications, such as optical character recognition (OCR), target identification and industrial part identification.
Many suggestions to solve this task have been made (an overview is given in [1]), ranging from invariant feature extraction methods like moments and integral transforms to invariant classifiers such as special neural networks. This paper deals with the invariant recognition of hand-written pictograms or sketches, a task which involves additional challenges due to the well-known variations in the handwriting of even a single person. Two publications which addressed this task before share with us a subset of the classes in our database as well as the use of HMMs. He and Kundu [2] described a method to recognize closed shapes with a combined HMM and autoregressive (AR) model. They used a database consisting of eight classes, which are part of our 20 class database shown in Fig. 1. Because their proposed features consist of radii from the centre of gravity to the shape boundary, they are limited to classifying shapes rather than unconstrained pictograms such as the classes 10, 11 or 13 in Fig. 1. He and Kundu use HMMs to classify the patterns. The desired rotation invariance is implemented as a preprocessing step prior to the feature extraction, by rotating all shapes to the same orientation. In a first step, the pattern is rotated by aligning the so-called elongation axis to a given axis. As stated in their paper, this method does not always yield the most similar radii sequence for the shapes of the same class. Thus, a second processing step based on a minimum radius point quantity, especially for classes with symmetry properties (e.g. Class 8

in Fig. 1) has been introduced. After the above steps, a third preprocessing step is needed whenever the shape has a two-way ambiguity like Class 1 in Fig. 1. Hence, this is a rather complicated procedure.

[Figure 1: Sample pictograms and sketches from our 20 class database (classes 1-20).]

Lee and Lovell [3] carried out experiments on the same eight class closed shape task. Like He and Kundu, they use the preprocessing steps to rotate the patterns prior to the feature extraction. The features themselves are again the radii from the centre of gravity to the shape boundary. The major difference of their approach is that the classification technique is based on a vector quantizer (VQ) instead of HMMs.

In our approach, we propose a novel feature extraction method combined with an HMM classifier. Instead of using a complicated procedure based on rules and three different processing steps, we make use of the HMM segmentation abilities to perform rotation invariant recognition.

2 Introduction to Hidden Markov Models (HMMs)

Hidden Markov Models are finite stochastic automata which have been successfully applied to continuous speech and online handwriting recognition. Fig. 2 shows a continuous three-state HMM with transition probabilities $a_{ij}$ and output probability density functions (pdfs) $b_j$, where $q_t$ denotes the actual state at time $t$, $s_j$ is a distinct state and $\mathbf{o}_t$ denotes an observation vector. The pdf of state $j$ is usually given by a finite Gaussian mixture of the form

$$b_j(\mathbf{o}_t) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(\mathbf{o}_t; \boldsymbol{\mu}_{jm}, \boldsymbol{\Sigma}_{jm}) \qquad (1)$$

where $c_{jm}$ is the mixture coefficient of the $m$-th mixture and $\mathcal{N}(\mathbf{o}_t; \boldsymbol{\mu}_{jm}, \boldsymbol{\Sigma}_{jm})$ is a multivariate Gaussian density with mean vector $\boldsymbol{\mu}_{jm}$ and covariance matrix $\boldsymbol{\Sigma}_{jm}$.
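For concreteness, the mixture emission density of Eq. (1) can be sketched in a few lines. This is a minimal illustration under our own assumptions (the function names and the use of NumPy are not from the paper):

```python
import numpy as np

def gaussian_pdf(x, mu, cov):
    """Multivariate Gaussian density N(x; mu, Sigma)."""
    d = len(mu)
    diff = np.asarray(x) - np.asarray(mu)
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

def emission_pdf(x, weights, means, covs):
    """State emission density b_j(x) as a finite Gaussian mixture, Eq. (1).
    weights are the mixture coefficients c_jm and should sum to one."""
    return sum(c * gaussian_pdf(x, mu, cov)
               for c, mu, cov in zip(weights, means, covs))
```

In practice the covariance inverse and determinant would be cached per mixture component rather than recomputed for every observation.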
An HMM $\lambda = (A, B, \boldsymbol{\pi})$ with $N$ states is fully described by the $N \times N$ transition matrix $A$, the $N$-dimensional output pdf vector $B$ and the initial state distribution vector $\boldsymbol{\pi}$, which consists of the probabilities $\pi_i = P(q_1 = s_i)$. After the model $\lambda$ has been trained using the Baum-Welch algorithm, feature sequences $O = \mathbf{o}_1, \ldots, \mathbf{o}_T$ can be scored according to

$$P(O \mid \lambda) = \sum_{q_1, \ldots, q_T} \pi_{q_1}\, b_{q_1}(\mathbf{o}_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(\mathbf{o}_t) \qquad (2)$$
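Evaluating Eq. (2) by enumerating all $N^T$ state sequences is intractable; the standard forward algorithm computes the same quantity in $O(T N^2)$. A hedged sketch (precomputing the emission values into a matrix is our simplification, not the paper's setup):

```python
import numpy as np

def hmm_likelihood(pi, A, B):
    """P(O | lambda) from Eq. (2), computed with the forward algorithm.
    pi: (N,)   initial state distribution
    A:  (N, N) transition matrix, A[i, j] = a_ij
    B:  (T, N) precomputed emissions, B[t, j] = b_j(o_t)"""
    alpha = pi * B[0]                  # alpha_1(j) = pi_j * b_j(o_1)
    for t in range(1, len(B)):
        alpha = (alpha @ A) * B[t]     # induction over time steps
    return float(alpha.sum())
```

For long sequences the recursion is normally carried out with per-step scaling (or in the log domain) to avoid numerical underflow.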

[Figure 2: Continuous linear Hidden Markov Model with states $s_1, s_2, s_3$, transition probabilities $a_{11}, a_{22}, a_{33}, a_{12}, a_{23}$ and output pdfs $b_1, b_2, b_3$.]

Usually the likelihood $P(O \mid \lambda)$ is estimated by the Viterbi algorithm, which is an approximation based on the most likely state sequence $q_1^*, \ldots, q_T^*$. During the recognition step, unknown patterns can be classified by the following Maximum-Likelihood (ML) decision:

$$p^* = \arg\max_{p} P(O \mid \lambda_p) \qquad (3)$$

This classifies a given observation sequence $O$ to class $p^*$. A very detailed explanation of the HMM framework is given by Rabiner in [4]. For our rotation invariant recognition, it is important to note that HMMs can be connected to form larger units from single models (e.g. in continuous speech, word models can be built from phones which are represented by single HMMs). The detailed description of our system is given in the next sections, starting with the feature extraction.

3 New Feature Extraction Method

The new feature extraction is illustrated in Fig. 3. Our method is applicable to rather unconstrained pictograms or sketches and is not limited to closed contour shapes. The pattern is transformed into a vector sequence $\mathbf{o}_1, \ldots, \mathbf{o}_T$ by dividing a surrounding circle into $T$ rectangular stripes and each stripe into $N$ blocks. For each block the percentage of black pixels is calculated. Thus, the sequence has a length of $T$ observations and each observation consists of an $N$-dimensional vector. The first vector is always the one corresponding to the stripe at the twelve o'clock position; further feature vectors are taken clockwise. The centre of the circle is placed at the so-called centre of gravity (COG) and the length of the stripes is equal to the distance between the COG and the maximum radius point. This kind of feature extraction has been proven to be translation and scale invariant by our experiments.
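The stripe-and-block extraction can be approximated in a few lines. This is a hedged sketch under our own assumptions: the paper does not specify how block boundaries or the black-pixel percentage are computed exactly, so here each black pixel is binned by its angle and radius relative to the COG, and the counts are normalized by the total number of black pixels as a proxy:

```python
import numpy as np

def rotating_features(img, T=36, N=5):
    """Sketch of the rotating feature extraction (Sec. 3, our assumptions).
    img: 2-D array, nonzero = black pixel. Returns a (T, N) matrix: one
    N-dim vector per angular stripe, taken clockwise from 12 o'clock."""
    ys, xs = np.nonzero(img)
    cy, cx = ys.mean(), xs.mean()              # centre of gravity (COG)
    dy, dx = ys - cy, xs - cx
    r = np.hypot(dy, dx)
    rmax = r.max() if r.max() > 0 else 1.0     # COG-to-maximum-radius point
    # clockwise angle measured from 12 o'clock ("up" = negative row direction)
    theta = np.mod(np.arctan2(dx, -dy), 2 * np.pi)
    stripe = np.minimum((theta / (2 * np.pi) * T).astype(int), T - 1)
    block = np.minimum((r / rmax * N).astype(int), N - 1)
    feats = np.zeros((T, N))
    np.add.at(feats, (stripe, block), 1.0)     # accumulate pixels per bin
    return feats / max(len(xs), 1)             # normalized pixel counts
```

Because angles and radii are measured from the COG and normalized by the maximum radius, the resulting sequence is translation and scale invariant by construction, matching the property reported above.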
Additionally, this technique can be adapted to the use of grey-scale or colour images and can be applied together with the rotation invariant classifier to the task of rotation invariant image retrieval. Note that for the embedded training and for the recognition the feature sequence is presented twice, as described in the next section.

4 Novel Rotation Invariant Modeling

The basic idea of the novel rotation invariant HMM classifier is to present the feature sequence, generated as described in Sec. 3, twice, and to use the combined segmentation and classification abilities of the HMM recognizer to align the sequence $(\mathbf{o}_1, \ldots, \mathbf{o}_T, \mathbf{o}_1, \ldots, \mathbf{o}_T)$ to models describing a part of the pictogram, then the pictogram itself, and finally the remaining part of the pictogram. The two incomplete parts of the pictogram can be modeled either by a so-called filler model or by a duplicated and modified pictogram HMM. This idea is illustrated in Fig. 4, where a pictogram from Class 13 is given and the segmentation into samples belonging to the filler HMMs and the Class 13 HMM is shown.

[Figure 3: Proposed rotating feature extraction: $T$ stripes per 360°, $N$ blocks per stripe within the surrounding circle; an example feature vector is [0.00, 0.08, 0.11, 0.00, 0.10].]

[Figure 4: Alignment of the pictogram and filler HMMs to the twice-presented feature sequence.]

During decoding, the features extracted in circular direction along the dashed line are expected to be emitted by the first filler model, whereas the dotted line indicates the features of the complete pictogram, while the features extracted along the dash-dotted line will be aligned against the final filler model. It can be seen in the figure that, once the alignment to the Markov models has been found, the rotation angle is given by the number of frames which have been aligned to the first or second filler model. This combination of angle estimation and classification during the recognition process is much more elegant than the complicated preprocessing steps presented in [2]. The structure of the Markov models, starting with the original idea and followed by variations thereof, is presented in the next sections.

4.1 Filler Modeling

The first approach for modeling rotated pictograms is to surround the unrotated pictogram HMM by filler HMMs which have been initially trained on the first half of the feature sequence $(\mathbf{o}_1, \ldots, \mathbf{o}_T)$ of all classes. The Baum-Welch reestimation scheme can be used for the embedded training of the concatenated HMMs, shown in

Fig. 5. In the figure, there is an additional model, consisting of two HMMs trained on Class p, which is used for the unrotated pattern. Hence, if one of these two models produces the Maximum Likelihood output for a given feature sequence, the pictogram is classified to Class p.

[Figure 5: Two models representing a single pictogram class. The upper HMM models unrotated pictograms; the model below (filler HMM - Class p HMM - filler HMM) models rotated ones.]

In our experiments we used quite small filler HMMs (3-5 states), which leads to a short decoding time. Due to the presentation of a finite training set, the filler models prefer the rotation angles seen during training. This is desired in some cases, for example to recognize slightly rotated versions of handwritten digits like 6 and 9. To build a totally invariant recognition scheme, we invented the modeling based on a modification of the initial state and the exit distribution.

4.2 Modeling with Modified Initial State and Exit Distributions

An alternative approach for rotation invariant Markov modeling is to combine the model for unrotated pictograms with two basically identical models but modified initial state and exit distributions. This is shown in Fig. 6. The components of the initial state distribution vector $\boldsymbol{\pi}$ are all set to $1/n$, where $n$ denotes the dimension of $\boldsymbol{\pi}$ as well as the number of HMM states. This guarantees invariant recognition without preferring any rotation angles. The exit transitions are modified in a similar way, which means that every state in the third HMM can be the final one when the end of a feature sequence is reached. This is different from the HMMs introduced in Sec. 4.1, where the Markov model has to be in the final state once the end of the feature sequence is reached. The proposed modeling technique can be used as a postprocessing step after the training of the HMMs according to Sec. 4.1 has been performed.
[Figure 6: Method 2 uses models with three identical HMMs but modified initial state and exit distributions.]

In that case, the filler models are removed after training and the structure in Fig. 6 is built from them. Alternatively, the structure can be trained directly, but in this case the initial state distribution vector would be changed and preferred rotation angles would occur. The decoding using this modeling is slower compared to the one which uses the filler HMMs, because the higher number of states is computationally more expensive than the use of more Markov models. Again, the rotation angle can be derived from the alignment of the three HMMs (see also Fig. 4).

4.3 Cyclic Permutation of States

The third proposed modeling technique uses a much higher number of models per class than the ones described above. For each class, the trained model is duplicated n times, where n denotes the number of states. Thereafter, the states of the models are permuted in such a way that each state becomes the entry state once and also the last state in a different model (cyclic permutation). The other states are arranged as indicated in Fig. 7. Thus we get n models for each class, and we only have to present the feature sequence once. Therefore, the rotation angle cannot be estimated as shown previously, but is indicated by the configuration of the Markov model yielding the Maximum Likelihood for the observation probability (i.e. the model at the top of Fig. 7 corresponds to the unrotated pattern, the model below it to a rotation by 360°/n, where n denotes the number of states, etc.). Hence, the angle estimation of this approach is less accurate compared to those of the methods introduced above. This is due to the angle quantization of 360°/n, which is larger than the quantization based on the frame alignment. We used this method only as a postprocessing step after the training procedure described in Sec. 4.1 has been applied. This results in an invariant recognition without any preferred rotation angles, like the proposed method in Sec. 4.2. It is also possible to train these models directly, but a more complex labeling is required compared to the other proposed classifiers, because one has to label on the HMM level.

[Figure 7: Cyclic permutation of HMM states: state orders (1, 2, ..., n-1, n), (n, 1, ..., n-2, n-1), ..., (2, 3, ..., n, 1).]

[Figure 8: Five unrotated and five rotated pictograms, taken from Class 9 of the stm database.]

Due to the large

number of models in this approach, the Viterbi decoding is much slower compared to the other methods. To speed up this decoding process, the states in the models of the same class can be tied, so that the time needed for calculating output probabilities is reduced drastically.

5 Experiments and Results

To verify the proposed rotation invariant recognition approaches, we carried out a number of experiments on two large pictogram databases. After a short description of these databases, recognition results are given and compared with recognition accuracies achieved using invariant features such as geometric and Zernike moments together with a neural net classifier. Finally, we present our experimental image-retrieval-by-user-sketch system, which serves as an example for future applications.

5.1 Database

Our databases have been built by two different persons, denoted as stm and dib in the following. Both consist of 10 unrotated and 20 rotated hand-drawn pictograms for each of the 20 classes introduced in Fig. 1. These drawings were taken as two-level bitmap files from a digitizer board. The 20 rotated pictograms are split into a test and a training set, consisting of 10 samples each. To demonstrate the large intraclass variance and the amount of contour perturbation, ten examples of Class 9 from the stm database are shown in Fig. 8. The results in the following section are given for both databases.

5.2 Results and Comparison with Moment Invariants

In the experiments, we used 30-state HMMs and five-state filler models; each state probability density function consisted of four mixtures. Features were extracted using 36 stripes and five blocks per stripe. The recognition results are given in Tab. 1, separated for the stm and the dib database. In the first row, the results for the modeling technique presented in Sec. 4.1 are shown.
The next two rows present the recognition rates for the HMMs with modified entry state distributions (Sec. 4.2), separated into results for using this technique as a postprocessing step and for embedded training of these modified models. Thereafter, the recognition rates for method 3 are given in row 4. The last two rows show results for the approaches presented in Sec. 4.2 and Sec. 4.3, respectively, this time for experiments done without an embedded training step. This means that no rotated pictogram patterns are presented during training. The models are simply trained using unrotated patterns and modified according to the proposed methods thereafter. Overall, the experiments with the proposed techniques show very good results with high recognition rates, comparable to those presented in [2, 3], but performed on databases with more than twice as many classes. When using the modified entry state distribution approach during embedded training, the recognition performance is weak compared to the other proposed methods. This is due to an overfitting problem resulting from a finite number of training patterns, causing the

entry state distribution vector to have components other than $1/n$, as already mentioned in Sec. 4.2.

Table 1: Recognition accuracy achieved in the experiments.

method                                | see also | stm   | dib   | average
filler modeling only                  | Sec. 4.1 | 99.5% | 98.5% | 99.0%
filler modeling + modified            | Sec. 4.2 | 99.5% | 98.5% | 99.0%
modified only                         | Sec. 4.2 | 97.5% | 92.5% | 95.0%
filler modeling + permutation         | Sec. 4.3 | 99.5% | 99.5% | 99.5%
modified, without embedded training   | Sec. 4.2 | 96.5% | 95.0% | 95.8%
permutation, without embedded training| Sec. 4.3 | 97.5% | 98.5% | 98.0%

The other approaches perform very well, with the permutation method performing slightly better. However, this is the computationally most expensive method, having the additional disadvantage of a less accurate estimation of the rotation angle. As mentioned in Sec. 4.3, state tying can be used to reduce the computational costs, but this technique can also be combined with the other approaches (e.g. many states in HMMs representing the classes 2, 3, 5 or 6 can be tied due to symmetry).

In order to obtain a detailed evaluation of our proposed methods, we also used conventional approaches utilizing invariant features and neural networks to classify the pictograms of our databases. Translation, rotation and scale invariant features based on geometric moments have been introduced by Hu in [5]. Discrete geometric moments $m_{pq}$ of a pattern $f(x, y)$, of order $p + q$, are given by the equation

$$m_{pq} = \sum_{x} \sum_{y} x^p y^q f(x, y) \qquad (4)$$

However, these moments are not translation invariant, and thus central moments were defined by

$$\mu_{pq} = \sum_{x} \sum_{y} (x - \bar{x})^p (y - \bar{y})^q f(x, y) \qquad (5)$$

where $(\bar{x}, \bar{y})$ are the coordinates of the COG. So-called normalized moments can be defined as

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \qquad \gamma = \frac{p + q}{2} + 1 \qquad (6)$$

which are also scale invariant. From a nonlinear combination of these functions up to order three, Hu derived seven moment invariants which are translation, scale and also rotation invariant. These Hu invariants are listed e.g. in [7]. Later, Li [6] derived 52 moment invariants using normalized moments up to order nine. Other sets of moments have been used, including Legendre, complex and Zernike moments.
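Equations (4)-(6) translate directly into code. The sketch below (our own illustrative NumPy version, not the comparison system used in the experiments) computes the first two of Hu's seven invariants:

```python
import numpy as np

def hu_invariants(img):
    """First two of Hu's seven moment invariants from Eqs. (4)-(6).
    img: 2-D array of pixel intensities f(x, y)."""
    ys, xs = np.indices(img.shape)
    m = lambda p, q: np.sum((xs ** p) * (ys ** q) * img)            # Eq. (4)
    xbar, ybar = m(1, 0) / m(0, 0), m(0, 1) / m(0, 0)               # COG
    mu = lambda p, q: np.sum(((xs - xbar) ** p)
                             * ((ys - ybar) ** q) * img)            # Eq. (5)
    eta = lambda p, q: mu(p, q) / m(0, 0) ** ((p + q) / 2 + 1)      # Eq. (6)
    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4.0 * eta(1, 1) ** 2
    return phi1, phi2
```

Translation invariance follows from the central moments, scale invariance from the normalization, and rotation invariance from the particular combinations phi1 and phi2 (exact on a pixel grid for 90° rotations).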
In [7] it is shown that Zernike moments and geometric moments are related by

$$A_{nl} = \frac{n+1}{\pi} \sum_{\substack{k=l \\ n-k \ \mathrm{even}}}^{n} \sum_{j=0}^{(k-l)/2} \sum_{m=0}^{l} (-i)^m \binom{l}{m} \binom{(k-l)/2}{j} B_{n|l|k}\, m_{k-2j-m,\, 2j+m} \qquad (7)$$

where $A_{nl}$ are the complex Zernike moments, $i = \sqrt{-1}$, and $B_{n|l|k}$ is given by

$$B_{n|l|k} = \frac{(-1)^{(n-k)/2} \left(\frac{n+k}{2}\right)!}{\left(\frac{n-k}{2}\right)! \left(\frac{k+l}{2}\right)! \left(\frac{k-l}{2}\right)!} \qquad (8)$$

We carried out experiments on our pictogram databases using the seven Hu moments, the 52 Li moments and Zernike moments as invariant features, together with a neural network classifier (MLP with a single hidden layer). The recognition accuracies are presented in Tab. 2. Overall, the results achieved using the moment invariants are slightly worse compared to those presented in Tab. 1, and additionally no angle estimation is given during the classification step.

Table 2: Recognition accuracies achieved using moment invariants and a neural network classifier.

method                    | stm   | dib   | average
7 moment invariants (Hu)  | 52.0% | 46.5% | 49.25%
52 moment invariants (Li) | 99.0% | 99.0% | 99.0%
Zernike moments           | 99.5% | 96.5% | 98.0%

5.3 Experimental Image Retrieval by User Sketch System

The recognition rates presented in rows five and six of Tab. 1 show that the use of the proposed methods for image retrieval tasks, with no rotated training data being available, is feasible. To demonstrate that the proposed methods can be adapted easily to natural images, we built an experimental image retrieval system where queries can be formulated by user sketches. The feature extraction of the sketches and the binarized images is performed as described in Sec. 3. Hidden Markov Models with modified initial state and exit distributions, as shown in Fig. 6, have been built for every element of the image database. After presenting the features of the sketch to the HMMs, the scores calculated according to Eq. 2 are used as a measure of the similarity between the database element and the sketch. Fig. 9 shows some results achieved with our system, where in every row the query sketch is shown first, followed by the three images with the highest similarity score. However, for improved results on a recognition or retrieval task on natural images, the features should include colour information.
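The retrieval step described above amounts to scoring the sketch's feature sequence against every database model and sorting by likelihood. A minimal sketch of that ranking, assuming the per-model HMM scores (Eq. 2) are provided by callables; the names and this API are hypothetical:

```python
def retrieve_by_sketch(sketch_features, models, top=3):
    """Rank database elements by the similarity score of a query sketch.
    models: dict mapping element name -> callable returning the HMM
    likelihood P(sketch_features | model) for that element."""
    ranked = sorted(models,
                    key=lambda name: models[name](sketch_features),
                    reverse=True)
    return ranked[:top]
```

Returning the top-3 matches corresponds to the three-image result rows shown in Fig. 9.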
6 Summary and Conclusions

We presented a new feature extraction method and a novel HMM classifier applicable to the invariant recognition of rotated hand-drawn pictograms or sketches. Features are extracted by dividing a circle which surrounds the pictogram into rectangular stripes and by calculating for each stripe a vector of grey values. Three different approaches for the invariant classification using HMMs are presented. These three methods have in common that the rotation angle can be derived from the alignment

of the features to the Markov models. Furthermore, depending on the configuration of the HMMs, preferred angles or unconstrained rotation invariance can be achieved. Recognition rates up to 99.5% on two separate databases of hand-drawn pictograms with 20 classes are presented. We believe that our approach has several advantages compared to standard algorithms used for invariant recognition. Among these are the possibility of a simple feature extraction (as compared to the computation of most invariant features) and an efficient decoding, leading to real-time capabilities even for large databases. The proposed recognition approach can easily be adapted to recognition or retrieval tasks on natural images. In this case, the use of colour fits naturally into the HMM framework, where the relation between observation vector components derived from different colours can be expressed by the covariance matrix of the Gaussian mixture.

[Figure 9: Query sketches and retrieved images.]

References

1. J. Wood, "Invariant Pattern Recognition: A Review", Pattern Recognition, Vol. 29, No. 1, 1996, pp. 1-17.
2. Y. He, A. Kundu, "2-D Shape Classification Using Hidden Markov Model", IEEE Transactions on PAMI, Vol. 13, No. 11, 1991, pp. 1172-1184.
3. S. Lee, B. Lovell, "Modelling and Classification of Shapes in Two-Dimensions Using Vector Quantisation", Proc. IEEE-ICASSP, 1994, pp. V11-V14.
4. L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition", Proc. IEEE, Vol. 77, No. 2, Feb. 1989, pp. 257-286.
5. M. K. Hu, "Visual Pattern Recognition by Moment Invariants", IEEE Transactions on Information Theory, Vol. 8, 1962, pp. 179-187.
6. Y. Li, "Reforming the Theory of Invariant Moments for Pattern Recognition", Pattern Recognition, Vol. 25, No. 7, 1992, pp. 723-730.
7. C.-H. Teh, R. T. Chin, "On Image Analysis by the Methods of Moments", IEEE Transactions on PAMI, Vol. 10, No. 4, 1988, pp. 496-512.