Continuous Chinese Handwriting Recognition with Language Model


Yanming Zou, Kun Yu, Kongqiao Wang
Nokia Research Centre, Beijing, BDA, P.R. China

Abstract

In this paper, we propose a method to recognize the handwriting of several Chinese characters, or even a full sentence, at once. With a common single-character recognition engine and a language model, the whole recognition process is divided into two optimization processes. The first finds the best grouping scheme to segment the written strokes into characters, and the second finds the best character sequence corresponding to those stroke groups. Some measures are taken to speed up both optimization processes so that the method is applicable on portable devices. Based on our test on over characters or 2200 sentences, the overall performance is quite promising, and the positive feedback from the testers has confirmed the validity of the proposed method.

Keywords: segmentation scheme, language model, HWR.

1. Introduction

Chinese characters are composed of multiple strokes arranged compactly, bearing some heritage from pictographs. This makes it somewhat difficult to input those characters into digital devices from a keyboard. Although there are keyboard-based methods such as Pinyin, many people have come to realize that writing is the most natural and efficient way of text entry. As handwriting recognition (HWR) technology has attracted intensive attention, most research has focused on improving recognition accuracy and input speed [1][2].

Following the pen-and-paper metaphor, people are used to writing a complete sentence without stopping to check whether the previous character was correctly recognized. Here new challenges appear: if the whole sentence is written continuously, the characters need to be correctly segmented before each of them is recognized. In practice this has been a persistent problem, since the computer has to cope with a wide variety of handwriting traces [3][4] before giving plausible segmentation results.

After segmentation, the individual characters should be recognized with a single-character HWR engine. To ensure efficient text entry, it is desirable that the first candidate of each character is correct, because users would not like to check the recognition result of every character to complete the sentence. Many efforts have been made to improve the first-hit rate of single-character HWR engines, but no satisfactory result has been achieved, especially on portable devices with many hardware restrictions.

Naturally, the context can be used to group handwritten strokes into characters and then to select the result from the candidate list. In most publications a lexicon is assumed to be available [5][6]. A lexicon does help to improve recognition accuracy if only words are input, but for casual input, especially mixed text entry with punctuation, its effect is quite limited. The problem becomes even worse for Chinese, since there is no clear definition of words in Chinese and no clear space between words in writing.

A language model is another way to utilize context information [7]. Generally speaking, a language model assigns a probability to a sequence of characters, and this probability can be used to adjust the output of a common HWR engine. There have already been some publications on applications of language models to western languages [8][9], but not yet to Chinese. In this paper we propose a continuous handwriting segmentation and recognition method based on a bigram language model and a common single-character HWR engine (Figure 1).
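To make the role of the language model concrete, the following is a minimal sketch, not the authors' implementation, of how bigram probabilities can re-rank the candidate lists returned by a single-character engine. The candidate scores, the bigram table, the fallback probability of 0.01, and the names `rescore`, `TOY_CANDIDATES` and `toy_bigram_logp` are all hypothetical and chosen only for illustration.

```python
import math


def rescore(candidates, bigram_logp, start="<s>"):
    """Pick the character string with the best combined log score.

    candidates: one list per written character, each holding
                (character, log-likelihood from the engine) pairs.
    bigram_logp(prev, char): log transition probability (assumed interface).
    """
    # best[c] = (score, path) of the best string that ends in character c
    best = {start: (0.0, [])}
    for position in candidates:
        new_best = {}
        for char, log_like in position:
            # best predecessor for this candidate under the bigram model
            score, path = max(
                ((prev_score + bigram_logp(prev, char), prev_path)
                 for prev, (prev_score, prev_path) in best.items()),
                key=lambda t: t[0],
            )
            total = score + log_like
            if char not in new_best or total > new_best[char][0]:
                new_best[char] = (total, path + [char])
        best = new_best
    score, path = max(best.values(), key=lambda t: t[0])
    return "".join(path), score


# Hypothetical engine output: the engine slightly prefers the wrong
# second character when each character is judged in isolation.
TOY_CANDIDATES = [
    [("天", math.log(0.6)), ("夫", math.log(0.4))],
    [("汽", math.log(0.55)), ("气", math.log(0.45))],
]

# Hypothetical bigram table; unseen pairs fall back to a small constant.
TOY_BIGRAMS = {("天", "气"): 0.5}


def toy_bigram_logp(prev, char):
    return math.log(TOY_BIGRAMS.get((prev, char), 0.01))


if __name__ == "__main__":
    print(rescore(TOY_CANDIDATES, toy_bigram_logp))
```

In this toy case the engine's first candidates would give 天汽, while the bigram term favors 天气, so the context overrides the per-character ranking.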
In the proposed scheme, the raw pen traces are first segmented according to spatio-temporal information, and then the language model is used to find some optimized segmentation schemes. In our method, the language model is also used to fine-tune the recognition results. As there is no close coupling between the language model and the single-character recognition engine, it is feasible to use different single-character HWR engines in this solution.

In our method the whole optimization process is divided into two stages: handwriting segmentation and recognition result adjustment. The aim of this division is to speed up the whole search process and make it applicable to portable devices. Actually, the computing complexity of our method as a common HWR engine is much higher than that of an engine for some special applications, e.g. mail address recognition [10]. It can be implemented in one step if enough computing resources are available.

Figure 1. Recognition process of the proposed method.

In the rest of the paper, the proposed approach is introduced with emphasis on the language model for character segmentation and recognition. In section 2, the continuous HWR problem is separated into two optimization tasks. Details of solving them are introduced in section 3. In section 4, experiments and evaluations of the proposed method are presented. Section 5 concludes the paper.

2. Proposed Method

As the purpose of handwriting recognition can be described as extracting the most probable text string from a sequence of written strokes, the optimization problem can be expressed as:

$$C_{\max} = \arg\max_{C_i \in C} P(C_i \mid S) \qquad (1)$$

where $S = \{s_1, s_2, \ldots, s_i, \ldots, s_n\}$ is a collection of written strokes, $s_i$ is the $i$th stroke in the writing process, and $C_i$ represents one possible character string as a recognition result for the whole stroke sequence. If $G = \{G_1, G_2, \ldots, G_i, \ldots, G_p\}$ and $G_i$ represents the $i$th scheme of grouping the written strokes, the optimization target becomes

$$P(C_i \mid S) = \max_{G_j \in G} P(C_i \mid G_j) \qquad (2)$$

With formula (2), the optimization process can be written as

$$\max P(C_i \mid S) = \max_{C_i \in C} \max_{G_j \in G} P(C_i \mid G_j) \qquad (3)$$

Since changing the order of the two maximum operators does not affect the maximum, formula (3) can be rewritten as

$$\max P(C_i \mid S) = \max_{G_j \in G} \max_{C_i \in C} P(C_i \mid G_j) \qquad (4)$$

If $P(G_j)$ is defined as

$$P(G_j) = \max_{C_i \in C} P(C_i \mid G_j) \qquad (5)$$

formula (4) becomes

$$G_{\max} = \arg\max_{G_i \in G} P(G_i) \qquad (6)$$

Since there is an efficient method to calculate a rough value of $P(G_j)$, we propose a two-stage method to solve the whole optimization problem (1): first find the best segmentation scheme defined in (6), and then get the most probable text string with the best scheme $G_{\max}$.

Assuming a segment $g^k$ is a collection of strokes, a segmentation scheme $G$ can be defined as $G = \{g^0, \ldots, g^k, \ldots, g^{N-1}\}$. As one possible recognition result of the scheme, a text string $C_i$ can be described as $C_i = \{c_i^0, \ldots, c_i^k, \ldots, c_i^{N-1}\}$, where $c_i^k$ is the recognized character in the string corresponding to the segment $g^k$. With the definitions above, the probability, based on a bigram language model, that the given segmentation $G$ is recognized as the text string $C_i$ becomes

$$P(C_i \mid G) = \prod_{k=0}^{N-1} P(c_i^k \mid g^k)\, P(c_i^k \mid c_i^{k-1}) \qquad (7)$$

where $P(c_i^k \mid c_i^{k-1})$ is the transition probability from character $c_i^{k-1}$ to $c_i^k$, and $P(c_i^k \mid g^k)$ is the probability that stroke group $g^k$ is recognized as character $c_i^k$. Thus $P(G)$ can be written as

$$P(G) = \max_{C_i \in C} \prod_{k=0}^{N-1} P(c_i^k \mid g^k)\, P(c_i^k \mid c_i^{k-1}) \qquad (8)$$

To speed up the search, we do not want to go through all the possible text strings. Instead, we calculate a reasonable maximum of each factor in (8) and use the product of those maxima as an approximate value of $P(G)$. Thus $P(G)$ can be calculated as

$$P(G) \approx \prod_{k=0}^{N-1} P(g^k)\, P(g^k \mid g^{k-1}) \qquad (9)$$

where

$$P(g^k) = \max_i P(c_i^k \mid g^k) \qquad (10)$$

$$P(g^k \mid g^{k-1}) = P(c_n^k \mid c_n^{k-1}) \qquad (11)$$

and here

$$n = \arg\max_{i:\, C_i \in C} P(c_i^k, c_i^{k-1} \mid g^k, g^{k-1}) \qquad (12)$$

Here $P(g^k)$ can be regarded as the likelihood that a segment is a character, and $P(g^k \mid g^{k-1})$ as the plausibility of forming a meaningful sentence with the segmentation scheme. We do not use the maximum value of $P(c_i^k \mid c_i^{k-1})$ as an approximate value of $P(g^k \mid g^{k-1})$, since the value of $\max_i P(c_i^k \mid c_i^{k-1})$ may be meaningless when the corresponding $P(c_i^k)$ and $P(c_i^{k-1})$ are very small.

With the approximation (9), the computing complexity of evaluating one segmentation scheme is reduced dramatically. Assume that the segmentation scheme $G$ has $N$ segments and each segment has $M$ recognition candidates. To calculate the probability of all the strings, we need $O(M^N)$ multiplications. With our approximate method, the complexity is decreased to the level of $O(M \times M \times (N-1))$.

In summary, our proposed method can be written as:

$$G_{\max} = \arg\max_{G_i \in G} \prod_k P(g_i^k)\, P(g_i^k \mid g_i^{k-1}) \qquad (13)$$

$$C_{\max} = \arg\max_{C_i \in C} \prod_k P(c_i^k \mid g_{\max}^k)\, P(c_i^k \mid c_i^{k-1}) \qquad (14)$$

3. Algorithm Implementation

Figure 2 illustrates the recognition flow of the proposed scheme. The key concept underlying this approach is the language model built over the single-character recognition engine. As the model can fine-tune the recognition process to reduce the requirements on memory usage and efficiency, this architecture is applicable on portable devices.

Figure 2. The workflow of the proposed method.

Presegmentation

As preparation for the segmentation task, presegmentation is adopted to get rough segments from the raw pen traces. In the original algorithm as defined in equation (9), the basic element of a grouping scheme is one single stroke. The number of possible schemes for $n$ written strokes is $2^{n-1}$, which leads to a high computational cost in the grouping process. To improve efficiency, obviously unreasonable schemes should be excluded from the search space as early as possible.

There are some strokes which, based on their spatio-temporal relationship, can only be allocated to one character [11]. It is unreasonable to allocate them to different characters, so these strokes can be taken as a whole in the segmentation process to avoid further segmentation. Huge groups with too many strokes are also unreasonable, and thus if a threshold is set for the maximum number of strokes a character can have, the search space for segmentation can be much smaller than $2^{n-1}$.

In fact, the maximum number of presegmented groups in one character can be much smaller than the number of strokes. Usually, a simplified Chinese character has no more than 40 strokes and a traditional character fewer than 65 strokes. However, with our method, most traditional and simplified characters can be presegmented into at most 5 groups. So if the threshold is set on the maximum number of presegmented groups, the search space is decreased drastically.

Ideally, one presegmented group is exactly one character. However, when characters are written continuously, there is no clear space between adjacent characters. Moreover, some Chinese characters are composed of more than one component horizontally (Figure 3), and thus the presegmentation result may contain two types of errors [6]: (1) the group is only a part of one character; (2) the group contains strokes from more than one character. To keep the correct scheme in the search space, we should prevent the second type of error while reducing the first type as much as possible.
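As a rough illustration of the search-space argument and of the approximate score in (9), the sketch below first counts the grouping schemes with and without a cap on the number of units per character, and then searches for the best scheme by dynamic programming over log scores. It is a sketch under stated assumptions rather than the authors' implementation: the function names, the toy scores, and the treatment of the first segment (no context term) are hypothetical. In a real system the `unary` and `pair` callbacks would wrap the single-character engine and the bigram model.

```python
import math


def count_schemes(n, max_len=None):
    """Count ways to cut n ordered units into consecutive segments
    of at most max_len units each (no cap when max_len is None)."""
    limit = n if max_len is None else max_len
    ways = [1] + [0] * n  # ways[i]: number of schemes for the first i units
    for i in range(1, n + 1):
        ways[i] = sum(ways[i - k] for k in range(1, min(limit, i) + 1))
    return ways[n]


def best_segmentation(n, unary, pair, max_len=5):
    """Maximise the sum over segments of log P(g^k) + log P(g^k | g^{k-1}).

    unary(j, k): log P(g) for the segment made of units j..k-1.
    pair(i, j, k): log P(g^k | g^{k-1}) for previous segment [i, j)
                   and current segment [j, k).
    Assumption: the first segment has no context term.
    """
    best, back = {}, {}  # keyed by (start, end) of the last segment
    for k in range(1, n + 1):
        for j in range(max(0, k - max_len), k):
            if j == 0:
                best[(j, k)], back[(j, k)] = unary(0, k), None
                continue
            score, i = max(
                (best[(i, j)] + pair(i, j, k), i)
                for i in range(max(0, j - max_len), j)
            )
            best[(j, k)], back[(j, k)] = score + unary(j, k), i
    # pick the best final segment, then walk the back-pointers
    j = max(range(max(0, n - max_len), n), key=lambda start: best[(start, n)])
    total = best[(j, n)]
    segments, k = [], n
    while j is not None:
        segments.append((j, k))
        j, k = back[(j, k)], j
    segments.reverse()
    return segments, total


# Hypothetical scores: units 0-1 and units 2-3 each look like one character.
TOY_UNARY = {(0, 2): math.log(0.9), (2, 4): math.log(0.8)}


def toy_unary(j, k):
    return TOY_UNARY.get((j, k), math.log(0.05))


def toy_pair(i, j, k):
    return math.log(0.5)


if __name__ == "__main__":
    print(count_schemes(10), count_schemes(10, 5))
    print(best_segmentation(4, toy_unary, toy_pair, max_len=3))
```

The count for 10 units drops only from 512 to 464 under a cap of 5, which suggests that most of the saving comes from presegmentation reducing the number of units rather than from the cap alone. The dynamic program keeps one score per (start, end) pair of the last segment, so its cost grows linearly in the number of units for a fixed cap.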


Figure 3. Incorrect segmentations and combinations.

Segmentation and recognition

The whole optimization process depends on two kinds of probabilities. One is the transition probability between characters, which can be retrieved from the language model directly. The other is the probability that a group of strokes is one specific character. Generally speaking, a common single-character recognition engine can only give conditional probabilities which assume that the input stroke group really corresponds to one character in the range of the engine. But this assumption is not always satisfied in our case. To get the probability our method requires, the probability given by the single-character engine should be multiplied by a coefficient. This coefficient can be estimated from geometric information such as the height and width of the segment, the height of the text line, and the space between the strokes, as shown in Figure 4.

Figure 4. Geometric information from the strokes.

In the optimization process, the standard dynamic programming method is utilized. Since the output of the language model and the likelihood evaluation from the HWR engine are both in logarithmic form, all the other likelihoods are transformed to logarithmic measurement as well, and thus all the multiplications in the aforementioned formulas are changed to additions to reduce the computational complexity. It is not necessary to search through the whole recognition range for each stroke group in equation (12); evaluating the top ten candidates with the highest probability is enough. This further speeds up the optimization process.

Since an approximate probability is used for searching for the best segmentation scheme, it cannot be guaranteed that the optimized scheme corresponds to the best text string. As a consequence, the best 10 segmentation schemes are retained for the overall evaluation of the recognition result before the final text string is decided.

4. Experiments and Evaluation

To test the performance of the proposed method, we have collected over 200 continuous handwriting samples from 170 people. The samples were collected on the touch-sensitive screen of a PDA. Each sample includes 6-10 pages, and each page contains 3-15 Chinese characters including punctuation. Samples from 80 people were taken as reference to fine-tune the parameters of the algorithm, and the remaining samples from 90 people were used for testing. In the writing process, the writers were advised to avoid intersection between adjacent characters to avoid ambiguities. Figure 5 gives some examples of the collected handwriting samples.

Figure 5. Samples in the dataset.

In the evaluation tests, the common single-character engine is licensed from an outside company and its recognition range was set to the GB2312 character set. The language model is trained with text from prevalent Chinese newspapers, and it covers the same character set.

For the purpose of comparison, the HWR engine for single characters and the engine for continuous recognition (CHWR) were tested respectively. As shown in Table 1, for the proposed method, the overall recognition rate is 89.7% for the mixed input of Chinese characters and punctuation, which is much better than the single HWR engine alone (76.4%). Even in the examples of Figure 5(b) and Figure 5(d), where there is little spacing between adjacent characters, the proposed method could recognize the characters correctly. A higher recognition rate is achieved for pure Chinese characters (92.1%) than for the mixed input of Chinese and punctuation, because most punctuation marks are written with a single stroke, and it is easy to merge them into adjacent characters. We

have also obtained promising results for the punctuation between the characters (from 16.7% to 71.1%), because the proposed method has drastically improved the accuracy by utilizing the context and spatial information.

Table 1. Comparison of performance (recognition rate, %).

              Single HWR   CHWR
Mixed         76.4         89.7
Character                  92.1
Punctuation   16.7         71.1

Besides the objective algorithm evaluation, a series of subjective tests was carried out to measure the level of user acceptance. Over 50 users were asked to write a collection of sentences, including characters and punctuation. During the writing process, they were kept aware of the recognition result. When the writing process was finished, the user was asked to rate the performance of the system with a score ranging from 1 to 7, where 1 refers to the most unacceptable level and 7 to the most desirable level. The average score was 5.15; according to the experiential criterion, a score of no less than 5 indicates that the system is expected to be accepted by the major user group. Moreover, from the perspective of users, the speed of text entry has been notably improved, because they could write as fluently as they do on paper.

5. Conclusion

In this paper, an efficient scheme for continuous handwriting recognition is proposed to improve the accuracy of recognition while giving users an intuitive and efficient writing experience. As a practical method for online entry of Chinese sentences by casual users, the most significant features of this approach include: high recognition accuracy for the written input of multiple characters; and reliable recognition speed to ensure application on mobile devices. Moreover, as there is no close coupling between the language model and the single-character HWR engine, most other single-character HWR engines could be utilized as well. These advantages have not only proved the effectiveness of the proposed method, but also created new opportunities for text entry by means of natural writing. According to the comparison test, both the average recognition rate and the recognition speed are much better than the performance of single-character recognition algorithms.

Acknowledgements

The authors owe special thanks to the colleagues in the Visual Systems team and Dr. Guohong Ding of Nokia Research Centre, who have facilitated the research with generous help and advice. We also express our appreciation to the Centre for Intelligent Image and Document Information Processing at Tsinghua University for their technical discussions.

References

[1] Chenglin Liu, S. Jaeger and M. Nakagawa, Online recognition of Chinese characters: the state-of-the-art, IEEE Trans. PAMI, vol. 26, no. 2, 2004, pp.
[2] Hiromichi Fujisawa, A view on the past and future of character and document recognition, Proceedings of the Ninth International Conference on Document Analysis and Recognition, Curitiba, Brasil, 2007, pp.
[3] S. Wesolkowski, Cursive script recognition: a survey, Handwriting and Drawing Research: Basic and Applied Issues, IOS Press, Amsterdam, pp.
[4] Zhao S, Chi Z, Shi P, et al., Handwritten Chinese character segmentation using a two-stage approach, Proc. 6th Int. Conf. Document Analysis and Recognition, Seattle, USA, Sep. 2001, IEEE Computer Society Press, pp.
[5] G. Kim, V. Govindaraju, A lexicon driven approach to handwritten word recognition for real-time applications, IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, 1997, pp.
[6] Zhengbin Yao, Xiaoqing Ding, Changsong Liu, On-line handwritten Chinese word recognition based on lexicon, Proc. 18th Int. Conf. Pattern Recognition, Hong Kong, China, Aug. 2006, pp.
[7] John F. Pitrelli, Amit Roy, Creating word-level language models for large-vocabulary handwriting recognition, International Journal on Document Analysis and Recognition, vol. 5, Numbers 2-3, Apr. 2003, pp.
[8] Freddy Perraud, Christian Viard-Gaudin, Emmanuel Morin and Pierre-Michel Lallican, N-Gram and N-Class models for online handwriting recognition, Proceedings of the Seventh International Conference on Document Analysis and Recognition, Washington, USA, 2003, pp.
[9] F. Pitrelli, Jayashree, Subrahmonia and P. Perrone, Confidence modeling for handwriting recognition: algorithms and applications, International Journal on Document Analysis and Recognition, vol. 8, Issue 1, Mar. 2006, pp.
[10] Chenglin Liu, M. Koga and H. Fujisawa, Lexicon-Driven Segmentation and Recognition of Handwritten Character Strings for Japanese Address Reading, IEEE Trans. PAMI, vol. 24, no. 11, 2002, pp.
[11] Naohiro Furukawa, Junko Tokuno and Hisashi Ikeda, Online character segmentation method for unconstrained handwriting strings using off-stroke features, Proceedings of the International Workshop on Frontiers in Handwriting Recognition, Rennes, France,
