Pattern Recognition 45 (2012) 3661-3675. Contents lists available at SciVerse ScienceDirect.

An approach for real-time recognition of online Chinese handwritten sentences

Da-Han Wang (a), Cheng-Lin Liu (a,*), Xiang-Dong Zhou (b)

(a) National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95 Zhongguan East Road, Beijing 100190, PR China
(b) Intelligence Engineering Lab & Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, P.O. Box 8718, Beijing, PR China

Article info
Article history: Received 10 January 2012; received in revised form 24 March 2012; accepted 18 April 2012; available online 30 April 2012.
Keywords: Online Chinese handwritten sentence recognition; Real-time recognition; Dynamic text line segmentation; Dynamic over-segmentation; Dynamic candidate lattice; Path search

Abstract
With the advances in handwriting capturing devices and in the computing power of mobile computers, pen-based Chinese text input is moving from character-based input to sentence-based input. This paper proposes a real-time recognition approach for sentence-based input of Chinese handwriting. The main feature of the approach is a dynamically maintained segmentation-recognition candidate lattice that integrates multiple contexts, including character classification, linguistic context and geometric context. Whenever a new stroke is produced, dynamic text line segmentation and character over-segmentation are performed to locate the position of the stroke in the text lines and to update the primitive segment sequence of the page. Candidate characters are then generated and recognized to assign candidate classes, and the linguistic context and geometric context involving the newly generated candidate characters are computed. The candidate lattice is updated while the writing process continues. When the pen lift time exceeds a threshold, the system searches the candidate lattice for the sentence recognition result.
Since the computation of multiple contexts consumes most of the computing time and is performed during the writing process, the recognition result is obtained immediately after the writing of a sentence is finished. Experiments on the large database CASIA-OLHWDB of unconstrained online Chinese handwriting demonstrate the robustness and effectiveness of the proposed approach.

© 2012 Elsevier Ltd. All rights reserved.

* Corresponding author. E-mail addresses: dhwang1983@yahoo.com.cn, dhwang@nlpr.ia.ac.cn (D.-H. Wang), liucl@nlpr.ia.ac.cn (C.-L. Liu), xiangdongzhou@foxmail.com (X.-D. Zhou).

1. Introduction

With the proliferation of pen-based and touch-based mobile computers, online handwriting recognition has many potential applications [1-4], including text input, recording of handwritten notes and diagrams, signature verification, and mathematical expression recognition [5]. Character recognition-based Chinese text input has been widely applied in the Chinese market. However, as handwriting capturing devices and the computing power of mobile computers advance, sentence-based text input becomes possible. Compared with character-based input, sentence-based input is more natural, and it enables faster and more accurate input via handwritten sentence recognition that incorporates contexts.

Handwritten sentence (character string) recognition is a difficult contextual classification problem involving character segmentation and recognition [2,3]. There have been many efforts towards improving handwritten character string recognition [6-11]. Most methods adopt the integrated segmentation-recognition strategy to overcome the ambiguity of character segmentation. In the segmentation-recognition framework, handwritten text is first over-segmented into primitive segments, each of which can be a character or a part of a character. Then candidate character patterns are generated by concatenating consecutive segments, and are recognized by a character classifier to assign candidate classes.
The candidate character sequence and the assigned candidate classes are represented in a segmentation-recognition candidate lattice, which contains many segmentation-recognition paths, each corresponding to one recognition result. The optimal segmentation-recognition path is searched from the candidate lattice via path evaluation combining character classification scores and contexts. Fig. 1 shows a typical handwritten text recognition system (Fig. 1(a)) and an illustrative example of over-segmentation and the segmentation-recognition candidate lattice (Fig. 1(b)).

Fig. 1. (a) A typical handwritten text recognition system. (b) An illustrative example of over-segmentation and the segmentation-recognition candidate lattice. Each box contains the candidate character (upper) and its candidate classes (lower). The optimal path is denoted by the thick line, with the red characters (left one in each box) being the correct result. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)

The above methods, though promising, perform character segmentation and recognition after the writing of the sentence is finished. To achieve real-time recognition, character segmentation and recognition should be performed during the writing process, so that the result can be obtained immediately after the writing is completed. In recent years, some real-time handwriting input products (dynamic recognition during writing) have been developed, but we have not seen an academic study addressing this problem theoretically or experimentally.

Besides character string recognition, real-time recognition of handwritten sentences also involves text line segmentation, since sentences are often written in multiple lines due to the limited space of the writing area. Text line segmentation in real-time recognition is difficult because the lines are short, the strokes are produced dynamically, and there are often delayed strokes, which are inserted into previous characters or even previous lines. Unlike previous text line segmentation methods, which mostly group strokes into lines after all strokes are produced, dynamic segmentation during writing can only utilize the information of the strokes written so far.

In this paper, we propose an approach to real-time recognition of Chinese handwritten sentences using a dynamically maintained segmentation-recognition candidate lattice. Whenever a new stroke is produced, dynamic line segmentation and character over-segmentation are performed on the stroke to update the primitive segment sequence and to locate the position of the segment in the text lines. Then candidate characters involving the new stroke are generated and recognized to assign candidate classes. Meanwhile, multiple contexts, including the linguistic context and geometric context involving the newly generated candidate characters, are computed using a language model and geometric models. The candidate lattice is updated constantly while the writing process continues. When the pen lift time exceeds a threshold, the system searches the candidate lattice for the sentence recognition result using the path search algorithm, as in conventional character string recognition.
Since updating the candidate lattice consumes most of the computation and is performed during the writing process, the sentence recognition result is obtained immediately after a long pen lift. Based on automatic recognition, editing functions can be developed for manually correcting segmentation and recognition errors to facilitate user applications.

For dynamic text line segmentation in real-time recognition, we propose to adopt a statistical classifier to model the geometric relationship between the ongoing stroke and the existing text lines. By classifying the features extracted from a line-stroke pair, the classifier judges whether the stroke belongs to a previous line or starts a new line. The method can deal with delayed strokes by grouping them into previous lines, and therefore makes the real-time recognition system more robust.

For dynamic character over-segmentation, we also use a statistical classifier to model the geometric relationship between the ongoing stroke and the existing primitive segments that belong to the same line as the stroke. The classifier output on the features extracted from a segment-stroke pair is transformed into a posterior probability by confidence transformation [12], which indicates the probability of the stroke belonging to the segment. The stroke is considered to belong to the segment if this probability is greater than a threshold. By testing each segment-stroke pair, the stroke is assigned to one existing segment or starts a new segment. The position of the segment in the segment sequence is located according to the left boundaries of the segments. Similar to dynamic text line segmentation, the over-segmentation can also deal with delayed strokes.

For path search after candidate lattice construction, we propose a real-time beam search algorithm. The beam search algorithm accelerates the dynamic programming (DP) algorithm by pruning the partial paths at intermediate nodes.
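The beam search idea can be illustrated with a minimal Python sketch. It is not the algorithm of Section 6, which is not reproduced here. It assumes a hypothetical lattice format in which `candidates[(s, e)]` lists (class, score) pairs for the candidate character made of segments s to e-1, the score already combining the classification and unary geometric terms, and a hypothetical `bigram(prev, cur)` callback returning the weighted bi-gram term. Binary geometric terms are omitted, because they would require tracking the previous candidate's segmentation as well. Partial paths are recombined per last class (exact DP for a bi-gram model) and pruned to `beam_width` per segment boundary; beams ending before the updated segment are reused rather than recomputed.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical lattice format: candidates[(s, e)] lists (class, score) pairs for the
# candidate character made of segments s..e-1; score = classification + unary terms.
Candidates = Dict[Tuple[int, int], List[Tuple[str, float]]]
Hyp = Tuple[float, str, Tuple[str, ...]]  # (accumulated score, last class, class path)


class IncrementalBeamSearch:
    """Beam-pruned DP over segment boundaries, reusing beams before an update."""

    def __init__(self, bigram: Callable[[str, str], float],
                 beam_width: int = 10, max_span: int = 4):
        self.bigram = bigram          # weighted bi-gram term; "<s>" is an assumed start symbol
        self.beam_width = beam_width  # partial paths kept per boundary
        self.max_span = max_span      # assumed max number of segments per candidate character
        self.beams: List[List[Hyp]] = [[(0.0, "<s>", ())]]  # beams[0]: the empty path

    def update(self, candidates: Candidates, num_segments: int,
               first_changed: int) -> Optional[Hyp]:
        """Recompute beams for boundaries after segment `first_changed`."""
        first_changed = min(first_changed, len(self.beams) - 1)
        del self.beams[first_changed + 1:]  # beams[0..first_changed] are unaffected
        for e in range(first_changed + 1, num_segments + 1):
            best_per_class: Dict[str, Hyp] = {}
            for s in range(max(0, e - self.max_span), e):
                for cls, score in candidates.get((s, e), []):
                    for acc, last, path in self.beams[s]:
                        new = acc + score + self.bigram(last, cls)
                        # DP recombination: keep the best partial path per last class
                        if cls not in best_per_class or new > best_per_class[cls][0]:
                            best_per_class[cls] = (new, cls, path + (cls,))
            # Beam pruning: keep only the top `beam_width` partial paths at boundary e
            self.beams.append(heapq.nlargest(self.beam_width,
                                             best_per_class.values(),
                                             key=lambda h: h[0]))
        return max(self.beams[-1], key=lambda h: h[0], default=None)
```

When a new stroke only appends a segment at the end, `update` recomputes a single boundary, which is what keeps the per-stroke cost small in this sketch.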
By retaining the partial optimal paths ending at each segment, we perform the search from the updated segment rather than from the start segment.

We evaluated the performance of the proposed approach in terms of recognition accuracy and recognition speed on a large database, CASIA-OLHWDB [13], of unconstrained online Chinese handwritten characters and texts, and the results demonstrate the robustness and effectiveness of the proposed approach.

The rest of this paper is organized as follows. Section 2 reviews related works. Section 3 describes the baseline character string recognition method that we customize for real-time recognition. An overview of the real-time recognition system is provided in Section 4. Section 5 presents the methods for dynamic text line segmentation, dynamic character over-segmentation, and candidate lattice updating. The real-time path search algorithm is described in Section 6. Section 7 presents the experimental results, and Section 8 offers concluding remarks.

This paper extends our previous conference paper [14] by elaborating the procedures of dynamic line segmentation, dynamic character over-segmentation, and candidate character generation, incorporating geometric context into the path evaluation criterion, optimizing the combining weights, and evaluating the system quantitatively on a large database of online handwriting.

2. Related works

Chinese handwritten character string recognition is a challenging problem due to the large character set, the diversity of writing styles, the difficulty of character segmentation, and the unconstrained language domain. In particular, due to the variability of character size and position, and to character touching and overlapping, the characters cannot be reliably segmented prior to character recognition. To cope with the large number of character classes and the infinite number of sentence classes of Chinese texts, over-segmentation-based character string recognition approaches are commonly used [1].

Under the integrated segmentation-recognition framework, much effort has been devoted to the key techniques of Chinese/Japanese handwritten character string recognition. In this framework, the criterion for evaluating candidate segmentation-recognition paths usually integrates multiple contexts, including character classification, linguistic context and geometric context. Among previous works, some integrated incomplete contexts [15-17], and some combined the contexts heuristically without optimizing the combining weights [8,9,18,19]. Zhou et al. optimized the combining weights using the conditional random field (CRF) model [10], which makes it hard to incorporate language models of higher order than the bi-gram, while Zhu et al. adopted a genetic algorithm (GA) [11] to optimize the combining weights, which is computationally expensive and sensitive to some manually set parameters. Recently, Wang et al. proposed to integrate the character classification scores and linguistic context by transforming the output of the character classifier into posterior probabilities via confidence transformation [20], which improves recognition performance. Furthermore, they investigated parameter optimization for path evaluation and efficient path search, and achieved significant improvements on unconstrained handwritten Chinese texts [21]. They reported a character-level correct rate of 91.39% on the offline Chinese handwriting database CASIA-HWDB [13]. On another offline Chinese handwriting dataset, HIT-MW, they achieved a character-level correct rate of 92.72%, which is much higher than the results previously reported in [15,22].
For online character string recognition, many works on Japanese handwritten text databases have reported higher accuracies [8-11], because online handwriting has the advantage that the stroke sequences are available for better segmenting and discriminating characters. For online Chinese character string recognition, however, few works have been reported, except that in the ICDAR 2011 competition [23], Vision Objects achieved a correct rate of 94.33% on a competition dataset.

Real-time recognition of handwritten sentences is closely connected with online handwritten character string recognition, which uses path evaluation and search techniques similar to those of offline character string recognition. Our real-time recognition system is customized from a high-performance online handwritten character string recognition system by developing robust and efficient techniques for dynamic text line segmentation, character over-segmentation, updating of the candidate lattice, and real-time path search.

Among previous methods for text line segmentation in online handwritten documents, some segment text lines using heuristics or simple features such as horizontal projection [24,25] and off-stroke distances [8]. Methods based on optimizing line-fitting objectives [26-28] yield more reliable line partitioning. They usually take a hypothesis-and-test strategy to generate candidate line partitions and seek the optimal partition by heuristic search. To generate text line hypotheses, however, these methods require that all strokes have been written. For real-time recognition, in contrast, line segmentation is performed on each stroke rather than on the whole page.

Character over-segmentation in online handwritten character string recognition is often performed using off-stroke (pen lift) distances, and delayed strokes are re-arranged according to heuristic rules [9].
For over-segmentation in real-time recognition, the rules must be designed more carefully, because only part of the strokes is available at dynamic segmentation.

Recognition speed is another important factor in real-time recognition of handwritten sentences; character recognition is a crucial part of it and consumes most of the computation. With over 5000 classes of frequently used characters, Chinese character recognition is a difficult classification problem. The most popular classifiers are the modified quadratic discriminant function (MQDF) [29] and the nearest prototype classifier (NPC) [30]. The MQDF provides higher accuracy than the NPC but suffers from high storage and computation costs. In this paper, we evaluate the performance of both the MQDF classifier and the NPC, investigating the tradeoff between recognition accuracy and speed.

3. Online handwritten character string recognition

We customize a high-performance online handwritten character string recognition system for real-time recognition. Before describing the real-time recognition approach, we describe the online handwritten character string recognition approach below.

For the character string recognition system, we apply the integrated segmentation-recognition strategy, using the same framework as illustrated in Fig. 1. In the system, the input string sample (a sequence of strokes) is over-segmented and composed into sequences of candidate characters, each denoted by $X = x_1 \cdots x_n$. Each candidate character is assigned candidate classes (denoted as $c_i$) by a character classifier, and the result of character string recognition is a character string $C = c_1 \cdots c_n$. In the segmentation-recognition candidate lattice, each path $(X, C)$ is evaluated by the path evaluation criterion.
In our system, we adopt the path evaluation criterion presented in [21], which is formulated from a Bayesian decision viewpoint, integrates multiple contexts including character classification, linguistic context and geometric context, and shows fairly good performance. To save space, we give the criterion directly without its derivation; more details can be found in [21].

Denote the score of classifying character $x$ into class $c$, given by the character classifier, as $P(c|x)$. The linguistic context is given by a bi-gram language model, which gives the 2-gram probability $P(c_i|c_{i-1})$ from character class $c_{i-1}$ to $c_i$. The unary class-dependent (uc for short), unary class-independent (ui), binary class-dependent (bc) and binary class-independent (bi) geometric scores are denoted as $P(c|g^{uc})$, $P(z^p=1|g^{ui})$, $P(c_{i-1},c_i|g^{bc})$ and $P(z^g=1|g^{bi})$, respectively, where $g$ denotes the corresponding geometric feature vector and the scores are output by geometric models classifying the extracted features. For the ui geometric model, $P(z^p=1|g^{ui})$ indicates the probability that the candidate pattern is a valid character. For the bi geometric model, $P(z^g=1|g^{bi})$ indicates the probability that the gap between two successive candidate characters is a between-character gap. The path evaluation criterion combines the multiple contexts:

$$
f(X,C) = \sum_{i=1}^{n} \Big\{ k_i \log P(c_i|x_i) + \lambda_1 \log P(c_i|c_{i-1}) + \lambda_2 \log P(c_i|g^{uc}_i) + \lambda_3 \log P(z^p_i=1|g^{ui}_i) + \lambda_4 \log P(c_{i-1},c_i|g^{bc}_i) + \lambda_5 \log P(z^g_i=1|g^{bi}_i) \Big\}, \tag{1}
$$

where $\{\lambda_1,\lambda_2,\lambda_3,\lambda_4,\lambda_5\}$ are five combining weights that balance the contributions of the different models, and $k_i$ is the number of primitive segments composing the candidate character $x_i$. The idea of weighting the character classification score with the multiplier $k_i$ follows the variable-length HMM of [31]. This makes the sum of classification scores insensitive to the path length (number of candidate characters) and enables optimal path search by DP.
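Eq. (1) can be written directly as a scoring function. The following is a minimal sketch, assuming each candidate character on a path carries precomputed log-probabilities for the six terms; the `CandidateScores` record and its field names are hypothetical, and for the first character $c_0$ is assumed to be a sentence-start symbol handled by whoever computes `log_lm` and `log_bc`.

```python
import math
from dataclasses import dataclass


@dataclass
class CandidateScores:
    """Hypothetical per-candidate record; each field is one log-term of Eq. (1)."""
    k: int          # number of primitive segments composing x_i
    log_cls: float  # log P(c_i | x_i)
    log_lm: float   # log P(c_i | c_{i-1})
    log_uc: float   # log P(c_i | g_i^uc)
    log_ui: float   # log P(z_i^p = 1 | g_i^ui)
    log_bc: float   # log P(c_{i-1}, c_i | g_i^bc)
    log_bi: float   # log P(z_i^g = 1 | g_i^bi)


def path_score(path, lambdas):
    """f(X, C) of Eq. (1): k_i-weighted classification plus weighted contexts."""
    l1, l2, l3, l4, l5 = lambdas
    return sum(s.k * s.log_cls + l1 * s.log_lm + l2 * s.log_uc
               + l3 * s.log_ui + l4 * s.log_bc + l5 * s.log_bi
               for s in path)


# Usage: one candidate spanning two segments vs. two single-segment candidates.
merged = [CandidateScores(2, math.log(0.6), math.log(0.05), -0.2, -0.1, 0.0, 0.0)]
split = [CandidateScores(1, math.log(0.7), math.log(0.05), -0.3, -0.2, 0.0, 0.0),
         CandidateScores(1, math.log(0.4), math.log(0.01), -0.3, -0.4, -0.5, -0.2)]
weights = (0.6, 0.3, 0.3, 0.2, 0.2)  # hypothetical combining weights
print(path_score(merged, weights), path_score(split, weights))
```

Both paths cover the same two segments, so thanks to the $k_i$ multiplier each contributes two segment-level classification factors, which is what makes paths with different numbers of characters comparable.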

In [21], Wang et al. propose to convert the outputs of the character classifier and the geometric context models into posterior probabilities by confidence transformation [12,20]. In this paper, we apply confidence transformation for integrating multiple contexts. Specifically, for character classification, we use the Dempster-Shafer (D-S) theory of evidence [32] to combine sigmoidal two-class probabilities into multi-class probabilities; this considers the outlier class and hence is suitable for character string recognition [20]. For the geometric context models, which have a small number of classes, we use the sigmoidal confidence transformation. The confidence parameters are estimated on a validation dataset (preferably different from the dataset used for training the classifiers) by minimizing the cross-entropy (CE) loss function, which is commonly used in logistic regression and neural network training [12]. In the following, we briefly introduce the character classifier, geometric context modeling, and estimation of the combining parameters.

3.1. Character classifier

Though a large number of classifiers are available in pattern recognition, only a few of them are effective for the large category set problem of Chinese character recognition [33]. We use the MQDF and the NPC because they are among the most popular and effective ones, and the main aim of this paper is to propose and demonstrate a real-time handwritten sentence recognition approach rather than to compare classifiers comprehensively. The MQDF classifier is a modified version of the quadratic discriminant function (QDF), which is rooted in the Bayesian classifier under the assumption that the probability distribution of each class is multivariate Gaussian [29]. In the MQDF, the minor eigenvalues of each class are replaced by a constant, so that only the principal eigenvectors are used in the discriminant function.
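The MQDF idea can be made concrete with a small NumPy sketch of the standard Kimura-style MQDF distance: keep the k principal eigenpairs of each class covariance and replace the remaining eigenvalues by one constant delta. This is illustrative only; setting delta to the mean of the minor eigenvalues and the `floor` safeguard are assumptions of the sketch, not the settings used in the paper, and no dimensionality reduction or other preprocessing is shown.

```python
import numpy as np


class MQDF:
    """Minimal MQDF sketch: k principal eigenpairs per class, minor eigenvalues -> delta."""

    def __init__(self, k, floor=1e-3):
        self.k = k          # number of principal eigenvectors kept per class
        self.floor = floor  # hypothetical lower bound on eigenvalues and delta

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            w, v = np.linalg.eigh(np.cov(Xc, rowvar=False))  # ascending eigenvalues
            w, v = w[::-1], v[:, ::-1]                          # descending order
            k = min(self.k, len(w))
            lam = np.maximum(w[:k], self.floor)                 # principal eigenvalues
            phi = v[:, :k]                                      # principal eigenvectors
            minor = w[k:]
            # Assumption: delta is the mean of the minor eigenvalues (one common choice).
            delta = max(minor.mean(), self.floor) if minor.size else self.floor
            self.params_.append((mu, lam, phi, delta))
        return self

    def distances(self, x):
        """MQDF distance of x to every class; smaller means more likely."""
        d = x.shape[0]
        out = np.empty(len(self.params_))
        for i, (mu, lam, phi, delta) in enumerate(self.params_):
            diff = x - mu
            proj = phi.T @ diff                   # coordinates on the principal axes
            residual = diff @ diff - proj @ proj  # squared norm outside the principal subspace
            out[i] = (np.sum(proj ** 2 / lam) + residual / delta
                      + np.sum(np.log(lam)) + (d - lam.size) * np.log(delta))
        return out

    def predict(self, x):
        return self.classes_[np.argmin(self.distances(x))]
```

Only k projections per class are computed, plus one residual norm, which is where the saving over a full QDF comes from; the storage of k eigenvectors per class is still the dominant cost that the text attributes to the MQDF.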
Replacing the minor eigenvalues by a constant reduces the computational complexity and also benefits generalization performance. For the NPC, we test two variants that differ in the prototype learning algorithm: one trained by the log-likelihood of margin criterion (NPC-LOGM) [34], and one trained by the one-vs-all criterion (NPC-OVA) [35]. The training objective of NPC-LOGM is the negative conditional log-likelihood loss (CLL), where the posterior probability is approximated by the logistic (sigmoidal) function of the hypothesis margin. For the NPC-OVA classifier, the training objective is the multi-class cross-entropy (CE) loss, where the binary posterior probability is also approximated by the sigmoidal function. More details of the MQDF classifier, NPC-LOGM and NPC-OVA can be found in [29,34,35], respectively.

3.2. Geometric context modeling

Geometric context has proven effective in character string recognition [9,10] and in transcript mapping of handwritten documents [36]. Similar to the geometric modeling in [36], we design four geometric models: unary and binary class-dependent models, and unary and binary class-independent models. To build the geometric models, we extract unary and binary geometric features from the bounding box of a candidate character pattern and from two adjacent character patterns, respectively [36]. Due to the large number of Chinese characters and the fact that many different characters have similar geometric features, we cluster the character classes into six super-classes using the EM algorithm. After clustering, we use a 6-class quadratic discriminant function (QDF) for the unary class-dependent model and a 36-class QDF for the binary class-dependent model. For the class-independent geometric models, which are in essence two-class classification models, we use a linear support vector machine (SVM) [37] trained with character and non-character samples for the unary class-independent model, and similarly a linear SVM for the binary class-independent model.
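The two confidence transformations mentioned at the start of this section can be sketched in a few lines. The sketch assumes that the parameters `alpha` and `beta` have already been estimated (the text does so by minimizing the CE loss on validation data) and that the scores are oriented so that larger means more confident; for a distance-based classifier such as the MQDF one would pass negated distances, which is an assumption of the sketch. With two-class confidences $f_i = 1/(1+e^{-\alpha(d_i+\beta)})$, the D-S combination reduces algebraically to $P(c_i|x) = r_i/(1+\sum_j r_j)$ with $r_i = f_i/(1-f_i) = e^{\alpha(d_i+\beta)}$, and the remaining mass $1/(1+\sum_j r_j)$ goes to the outlier class.

```python
import numpy as np


def sigmoid_confidence(d, alpha, beta):
    """Sigmoidal two-class confidence f = 1 / (1 + exp(-alpha * (d + beta)))."""
    return 1.0 / (1.0 + np.exp(-alpha * (np.asarray(d, dtype=float) + beta)))


def ds_confidence(scores, alpha, beta):
    """D-S combination of per-class sigmoidal confidences.

    Returns (class_probs, outlier_prob), where
        class_probs[i] = r_i / (1 + sum_j r_j),  r_i = exp(alpha * (d_i + beta))
        outlier_prob   = 1   / (1 + sum_j r_j)
    so the class probabilities sum to less than one.
    """
    z = alpha * (np.asarray(scores, dtype=float) + beta)
    # log(1 + sum_j exp(z_j)) computed stably as a log-sum-exp over [0, z_1, ..., z_M]
    m = max(0.0, float(z.max()))
    log_denom = m + np.log(np.exp(-m) + np.exp(z - m).sum())
    return np.exp(z - log_denom), float(np.exp(-log_denom))
```

The outlier mass is what lets a path through a non-character candidate receive a low classification score instead of being forced onto some character class, which is why the text calls this transformation suitable for string recognition.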
In path evaluation, we convert both the QDF and SVM outputs into posterior probabilities via sigmoidal confidence transformation.

3.3. Combining parameter estimation

The combining weights are learned by Minimum Classification Error (MCE) training [38,39], which has been widely used in speech recognition and handwriting recognition [40-42]. The objective of learning the combining weights by MCE is to optimize the string recognition accuracy. In string-level MCE training, the weights are estimated on a dataset of $R$ string samples, $D_x = \{(X^n, C_t^n) \mid n = 1,\ldots,R\}$, where $C_t^n$ is the ground-truth transcript of the string sample $X^n$. Following Juang et al. [38], the misclassification measure on a string sample is approximated by

$$ d(X,\Lambda) = -g(X,C_t,\Lambda) + g(X,C_r,\Lambda), \tag{2} $$

where $\Lambda$ is the parameter set, $g(X,C_t,\Lambda)$ is the discriminant function of the truth class, and $g(X,C_r,\Lambda)$ is the discriminant function of the closest rival class, $g(X,C_r,\Lambda) = \max_{C_k \neq C_t} g(X,C_k,\Lambda)$. The misclassification measure is transformed into a loss by the sigmoidal function

$$ l(X,\Lambda) = \frac{1}{1 + e^{-\xi d(X,\Lambda)}}, \tag{3} $$

where $\xi$ is a parameter controlling the hardness of the sigmoidal nonlinearity. The parameters in MCE training are learned by stochastic gradient descent [43] on each input sample:

$$ \Lambda(t+1) = \Lambda(t) - \epsilon(t)\, U \nabla l(X,\Lambda)\big|_{\Lambda=\Lambda(t)}, \tag{4} $$

where $\epsilon(t)$ is the learning step, and $U$ is related to the inverse of the Hessian matrix and is usually approximated as diagonal. In MCE training for handwritten character string recognition, the discriminant function is the path evaluation criterion (1), and the rival segmentation-recognition path, which is the one most confusable with the correct path, is obtained by beam search. Substituting the discriminant functions $f_t$ and $f_r$ of the correct and rival paths into (4), the parameters are updated iteratively as

$$ \Lambda(t+1) = \Lambda(t) - \epsilon(t)\, \xi\, l (1 - l)\, \frac{\partial (f_r - f_t)}{\partial \Lambda}\bigg|_{\Lambda=\Lambda(t)}. \tag{5} $$

4. Real-time recognition system

The proposed real-time recognition system consists of four main modules (Fig. 2(a)): the real-time segmentation-recognition module, the sentence recognition module, the sentence edition module and the language association module. While the real-time segmentation-recognition and sentence recognition modules form the core of the automatic recognition system, the other two modules are provided to facilitate user applications.

The real-time segmentation-recognition module (Fig. 2(b)) acts whenever an ongoing stroke is produced. In line segmentation, the system judges which text line the new stroke belongs to. If the stroke belongs to a previous line, the line is updated and character over-segmentation is performed on the line. If no previous line is found to contain the stroke, the stroke is considered to start a new line and composes the first primitive segment (a stroke block) of the line. In character over-segmentation of a text line, if the stroke belongs to a previous segment of the line, the system updates that segment; otherwise, it creates a new segment using the stroke and finds the position of the new segment in the segment sequence according to the left boundaries. After the new stroke is assigned, the updated or newly created primitive segment is merged with its preceding segments to generate candidate characters, which are recognized by a character classifier to assign candidate classes. The new candidate characters and their assigned classes, as well as the linguistic context and geometric context scores associated with them, are added into the segmentation-recognition candidate lattice. Fig. 3 shows an intermediate candidate lattice and its updated form due to a new stroke.

Fig. 2. (a) Flow chart of the real-time recognition system. (b) Flow chart of the real-time segmentation-recognition module.

Fig. 3. (a) A candidate lattice and (b) the updated one due to a new stroke. The partial lattice with red lines is added into the previous lattice. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)

After real-time segmentation-recognition on a new stroke, if the pen lift time exceeds a threshold (adjustable by the user, e.g., 0.5 s), the sentence recognition module obtains the sentence recognition result by path search in the updated candidate lattice.

The sentence recognition result may contain character segmentation or recognition errors. A sentence edition module is thus designed to correct such errors. A character split error can be corrected by drawing a circle embracing the split parts. A character merge error can be corrected by drawing a vertical line to separate the merged characters. After a manual merge or split, the merged or split parts are re-combined into candidate characters and reassigned candidate classes, and the updated candidate lattice is re-searched for the sentence recognition result.
For a character recognition error, the candidate classes are displayed when the user clicks on the character area, and the user can select the correct class. If the correct class is not among the top ranks, the user can erase the character and rewrite it to activate real-time recognition. In the following, we elaborate the techniques in the real-time segmentation-recognition and sentence recognition modules.

5. Real-time segmentation-recognition module

On a new stroke, the real-time segmentation-recognition module performs dynamic text line segmentation, character over-segmentation, and updating of the segmentation-recognition candidate lattice. Algorithm 1 illustrates the real-time processing of an ongoing stroke, where Part 1 performs dynamic text line segmentation and Part 2 performs dynamic character over-segmentation. Afterwards, candidate characters are generated from the updated primitive segment and assigned candidate classes to update the candidate lattice.
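The lattice update step can be sketched as follows, assuming a hypothetical dictionary `lattice` keyed by segment spans (s, e), a hypothetical maximum of `max_span` segments per candidate character, a hypothetical `recognize(span)` callback that returns the candidate classes and scores, and that candidate characters never cross a text-line boundary (also an assumption). The text describes merging the updated segment with its preceding segments; the sketch additionally regenerates spans that extend past the updated segment, which only matters when a delayed stroke updates a segment in the middle of a line. It handles an in-place update or an appended segment; inserting a new segment in the middle would also require re-indexing the later spans, which is omitted.

```python
def affected_spans(j, line_of, max_span):
    """All spans (s, e) of at most max_span consecutive segments containing segment j.

    line_of[i] is the text-line index of segment i; spans crossing lines are skipped
    (assumption: a candidate character never crosses a text-line boundary).
    """
    n = len(line_of)
    spans = []
    for s in range(max(0, j - max_span + 1), j + 1):
        for e in range(j + 1, min(n, s + max_span) + 1):
            if all(line_of[t] == line_of[j] for t in range(s, e)):
                spans.append((s, e))
    return spans


def update_lattice(lattice, j, line_of, max_span, recognize):
    """Regenerate the candidates that contain segment j; other entries are kept."""
    for span in [sp for sp in lattice if sp[0] <= j < sp[1]]:
        del lattice[span]  # stale candidates built from the old version of segment j
    for span in affected_spans(j, line_of, max_span):
        lattice[span] = recognize(span)
    return lattice
```

Only the O(max_span²) spans touching segment j are re-recognized per stroke, so the cost of classification stays bounded regardless of sentence length.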

Algorithm 1. Real-time process of an ongoing stroke.
Input: existing lines and segment sequence: lines, segments; number of lines: m; a new stroke: strk
Initialization: set lineidx = -1
// Part 1: dynamic line segmentation
For i = m to 1
    feature = LineStrokeFeature(strk, line_i)
    Classifier(feature)
    if strk belongs to line_i
        lineidx = i; break
    else continue
End for
// Part 2: dynamic character over-segmentation
If (lineidx > 0)
    Merge strk into the lineidx-th line
    OverSegmentation(updated lineidx-th line)
Else
    Create a new line using strk
    Create the first segment of the line using strk
    m = m + 1
End if
Update the segment sequence
// Part 3
Generate candidate characters
Update the candidate lattice
End

The estimated line height of a character string is important for extracting the line–stroke and segment–stroke geometric features. To estimate the line height (denoted as linehei in this paper) robustly, all the strokes in the line are first sorted in ascending order of height, and the half of the strokes with larger heights is used to estimate the line height (as the average height of the selected strokes). As writing proceeds, the estimate is updated to incorporate each new stroke, and it becomes more accurate as the number of strokes increases.

Dynamic line segmentation

This step assigns a new stroke (denoted as strk) to one of the m previous lines (denoted as lines) or starts a new line. In Algorithm 1, lineidx is the index of the text line that the new stroke belongs to, and lineidx = -1 indicates that the stroke starts a new line. The function LineStrokeFeature(strk, line_i) extracts geometric features characterizing the relationship between the stroke and the i-th line. Based on these features, if the classifier judges that strk belongs to line_i, line_i is updated and over-segmentation is performed on the updated line_i. Otherwise, the process continues until a line containing the stroke is found or all the previous lines have been considered. If no line contains the stroke, the stroke forms a new line.

We adopt a statistical classifier to model the geometric relationship of a line–stroke pair and to judge whether the stroke belongs to the line. To collect training samples for this two-class classifier, we pair each stroke with its temporally preceding lines: if the stroke belongs to the line, the sample is positive; otherwise it is negative. Samples can be extracted from ground-truthed online documents containing multiple text lines. Geometric features are extracted from each positive or negative sample (a line–stroke pair) to train the classifier. To cope with delayed strokes, we do not rely on temporal features such as the off-stroke distance. Before feature extraction, the line and the stroke strk are tentatively merged and fitted by linear regression; the merged line is denoted as line^t. The line height is estimated by computing the average height of strokes. We extract 22 features from the line–stroke pair, as listed in Table 1. The features fall into four categories: (1) five features related to the line (No. 1–5 in Table 1); (2) two features related to the stroke strk (No. 6 and 7); (3) four scalar features related to the merged line line^t (No. 8–11); and (4) 11 scalar features related to the geometric relationship between the stroke strk and the line, as well as the merged line line^t (No. 12–22 in Table 1).

Fig. 4. Segment sequence of multiple lines.

Table 1. Line–stroke geometric features (the last column indicates whether the feature is normalized w.r.t. the line height).
No.     Feature                                                                                  Norm
1–2     Height and width of line                                                                 Y
3       Number of strokes in line                                                                N
4       Average regression error of line: σ_1^2                                                  Y
5       Horizontal direction of line                                                             N
6       Height of strk                                                                           Y
7       Aspect ratio of strk                                                                     N
8–9     Height and width of line^t                                                               Y
10      Average regression error of line^t: σ_2^2                                                Y
11      Horizontal direction of line^t                                                           N
12      Growth of line height                                                                    Y
13      Change of horizontal direction                                                           N
14      Change of average regression error                                                       Y
15      Distance between line and strk (minimum distance between strk and the strokes in line)   Y
16      Common area of line and strk                                                             Y
17–18   Distances of the upper/lower bounds of strk to the vertical center of line, along the normal direction of line   Y
19–22   Distances between the upper bounds, lower bounds, upper–lower bounds and lower–upper bounds of line and strk     Y
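The line-height estimate and Part 1 of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class names, `assign_stroke`, and especially `toy_belongs` are hypothetical, and `toy_belongs` replaces the trained two-class classifier on the 22 features of Table 1 with a simple vertical-overlap rule. Indices are 0-based, and the function returns the index of the line that receives the stroke instead of using the -1 sentinel of the pseudocode.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Point = Tuple[float, float]


@dataclass
class Stroke:
    points: List[Point]

    @property
    def height(self) -> float:
        ys = [y for _, y in self.points]
        return max(ys) - min(ys)


@dataclass
class TextLine:
    strokes: List[Stroke] = field(default_factory=list)


def estimate_line_height(strokes: List[Stroke]) -> float:
    """Average height of the taller half of the strokes (robust to small strokes)."""
    heights = sorted(s.height for s in strokes)
    if not heights:
        return 0.0
    taller_half = heights[len(heights) // 2:]
    return sum(taller_half) / len(taller_half)


def y_extent(strokes: List[Stroke]) -> Tuple[float, float]:
    ys = [y for s in strokes for _, y in s.points]
    return min(ys), max(ys)


def toy_belongs(strk: Stroke, line: TextLine, min_overlap: float = 0.5) -> bool:
    """HYPOTHETICAL stand-in for the trained line-stroke classifier:
    accept if the stroke's vertical extent overlaps the line's enough."""
    lo_s, hi_s = y_extent([strk])
    lo_l, hi_l = y_extent(line.strokes)
    overlap = min(hi_s, hi_l) - max(lo_s, lo_l)
    return overlap / max(hi_s - lo_s, 1e-6) >= min_overlap


def assign_stroke(strk: Stroke, lines: List[TextLine],
                  belongs: Callable[[Stroke, TextLine], bool] = toy_belongs) -> int:
    """Part 1 of Algorithm 1: test lines from the most recent one backwards,
    merge the stroke into the first accepting line, otherwise start a new line."""
    for i in range(len(lines) - 1, -1, -1):
        if belongs(strk, lines[i]):
            lines[i].strokes.append(strk)
            return i
    lines.append(TextLine([strk]))
    return len(lines) - 1
```

Scanning from the most recent line backwards mirrors the `For i = m to 1` loop, so a delayed stroke written after a later line has been started can still be attached to an earlier line.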

Dynamic character over-segmentation

After text line segmentation, dynamic character over-segmentation is performed to update the sequence of primitive segments. In our system, the segments of multiple lines are ordered in one sequence, as depicted in Fig. 4. If the ongoing stroke forms a new line, the stroke composes the first segment of that line, which becomes the last segment in the sequence. If the stroke is judged to belong to a previous line, dynamic character over-segmentation is performed on the updated line by the function OverSegmentation(updated lineidx-th line) in Algorithm 1, as detailed in Algorithm 2.

Algorithm 2 locates the segment that the stroke belongs to in the text line L, similar to the line segmentation in Part 1 of Algorithm 1. Suppose there are n previous segments in L. In Algorithm 2, segidx denotes the index of the segment that the new stroke belongs to, and segidx = -1 indicates that the stroke starts a new segment. The function SegStrokeFeature(strk, s_i) extracts geometric features characterizing the relationship between the stroke and the i-th segment. Based on these features, the output of a classifier is transformed into a confidence measure that indicates the probability of the stroke belonging to the segment. If the confidence is greater than a threshold γ, strk is considered to belong to s_i and is merged into s_i. Otherwise, the process continues until a segment containing the stroke is found. If no segment contains the stroke, the stroke starts a new segment. The threshold γ should be large enough to avoid merge errors in over-segmentation (γ is empirically set to 0.85 in our system). After the stroke is assigned, the updated segment sequence of the line is sorted by left boundaries, performed by the function SortSegments(s_1 s_2 ... s_n s_{n+1}).

Algorithm 2. Dynamic character over-segmentation.
Input: updated line: L; number of segments: n; previous segments: s_1 s_2 ... s_n; a new stroke: strk
Initialization: set segidx = -1
For i = n to 1
    feature = SegStrokeFeature(strk, s_i)
    confidence = Classifier(feature)
    if (confidence > γ)   // strk belongs to s_i
        segidx = i; break
    else continue
End for
If (segidx > 0)
    Merge strk into the segidx-th segment
Else
    Create a new segment using strk
    SortSegments(s_1 s_2 ... s_n s_{n+1})
    n = n + 1
End if
End

Fig. 5. Examples of segment sequence with a new stroke inserted.

Fig. 5 shows the two main cases of dynamic character over-segmentation on a new stroke, where the segment in a red frame is a newly created or updated segment. Case A shows normal writing, where the stroke is written at the end of the line, while Case B shows a delayed stroke inserted into a previous part of the line. Delayed strokes also occur when a character is deleted during user editing and a new character is rewritten at the same position. In Case B, if the stroke starts a new segment, the position of the newly created segment is located according to the left boundaries.

For robust over-segmentation, we also adopt a statistical classifier to model the geometric relationship of a segment–stroke pair. To collect training samples of positive and negative segment–stroke pairs, we first segment the text lines of the training data into primitive segments according to the off-stroke distance and then re-arrange delayed strokes using spatial information. Each stroke is paired with its temporally preceding segments to form samples, which are positive or negative depending on whether the stroke actually belongs to the segment. As in feature extraction for line segmentation, we do not rely on temporal features, so as to cope with delayed strokes.
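A corresponding sketch of Algorithm 2, again in Python and under stated assumptions: strokes are reduced to bounding boxes, `over_segment` and `toy_confidence` are hypothetical names, and `toy_confidence` (horizontal overlap as a fraction of the stroke width) stands in for the classifier confidence computed from the segment–stroke features. The threshold `gamma` defaults to the 0.85 used in the text, and, following the pseudocode, segments are re-sorted only when a new segment is created.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) of a stroke


@dataclass
class Segment:
    boxes: List[Box] = field(default_factory=list)

    @property
    def left(self) -> float:
        return min(b[0] for b in self.boxes)

    @property
    def right(self) -> float:
        return max(b[2] for b in self.boxes)


def toy_confidence(box: Box, seg: Segment) -> float:
    """HYPOTHETICAL stand-in for the classifier confidence: fraction of the
    stroke's width that horizontally overlaps the segment."""
    overlap = min(box[2], seg.right) - max(box[0], seg.left)
    return max(0.0, overlap) / max(box[2] - box[0], 1e-6)


def over_segment(box: Box, segments: List[Segment],
                 confidence: Callable[[Box, Segment], float] = toy_confidence,
                 gamma: float = 0.85) -> int:
    """Algorithm 2 sketch: returns the index of the segment that holds the stroke."""
    for i in range(len(segments) - 1, -1, -1):   # most recent segment first
        if confidence(box, segments[i]) > gamma:
            segments[i].boxes.append(box)       # merge into an existing segment
            return i
    new_seg = Segment([box])                    # start a new segment
    segments.append(new_seg)
    segments.sort(key=lambda s: s.left)         # SortSegments by left boundary
    return next(i for i, s in enumerate(segments) if s is new_seg)
```

A high `gamma` makes the sketch prefer creating a new segment over merging, which matches the stated goal of avoiding merge errors: an extra split can still be undone when candidate characters combine consecutive segments, whereas a wrong merge cannot.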
The geometric features of a segment–stroke pair comprise 12 features in total: three features related to the stroke strk (No. 1–3 in Table 2), two features related to the segment s_i (No. 4 and 5), two features related to the temporarily merged segment s_i^t (No. 6 and 7), and five scalar features related to the relationship between strk and s_i (No. 8–12). The horizontal overlap between strk and s_i, which is important for character over-segmentation, is characterized by the horizontal-relationship features No. 9–12.

Candidate character generation

After dynamic line segmentation and character over-segmentation, generating candidate characters is straightforward. Fig. 6 shows examples of new candidate characters: the segment in a red box is the one updated or formed by a new stroke, and the blue frames enclose the candidate characters that start from or end at the red segment. In this paper, the maximum number of segments composing a candidate character is denoted as SN.

Table 2. Segment–stroke geometric features (the last column indicates whether the feature value is normalized w.r.t. the line height).
No.     Feature                                                                             Norm
1–2     Height and width of strk                                                            Y
3       Aspect ratio of strk                                                                N
4–5     Height and width of s_i                                                             Y
6–7     Height and width of the temporarily merged segment s_i^t                            Y
8       Common area of the bounding boxes of strk and s_i                                   Y
9       Horizontal gap between the bounds of strk and s_i                                   Y
10–12   Distances between the left bounds, right bounds and horizontal centers of strk and s_i   Y
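To make the candidate-generation step concrete, the following hypothetical helper enumerates the candidate characters that contain a given segment. It assumes that candidates consist of at most SN consecutive segments and never combine segments from different lines; the width-based pruning rules listed below are omitted. For a segment in the middle of a long line, each candidate size k has k possible start positions, so the sketch yields 1 + 2 + ... + SN candidates; for the last segment of the sequence (normal writing), it yields SN.

```python
from typing import List, Tuple


def candidates_containing(pos: int, line_of: List[int], sn: int) -> List[Tuple[int, int]]:
    """Enumerate candidate characters (start, end), inclusive and 0-based, that
    contain segment `pos`, consist of at most `sn` consecutive segments, and do
    not cross a text-line boundary. `line_of[i]` is the line index of segment i."""
    n = len(line_of)
    out = []
    for k in range(1, sn + 1):                      # k = number of segments
        for start in range(pos - k + 1, pos + 1):   # k possible start positions
            end = start + k - 1
            if start < 0 or end >= n:
                continue
            # Segments of one line are contiguous in the sequence, so checking
            # the two ends is enough to keep the candidate inside one line.
            if line_of[start] != line_of[end]:
                continue
            out.append((start, end))
    return out
```

For example, with 20 segments in one line and SN = 6, a stroke that updates segment 10 produces 21 candidates, while a stroke that updates the final segment produces 6.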

Fig. 6. Examples of new candidate characters.

The generation of candidate characters is subject to some heuristic rules that reduce the number of candidate characters while guaranteeing that true characters are included:
(a) the number of segments in a character does not exceed the maximum number SN;
(b) segments in different lines are not combined into a candidate character;
(c) candidate characters with width larger than a threshold (safely set to 3 × linehei in our system) are pruned;
(d) two successive segments with horizontal distance larger than a threshold (safely set to 2 × linehei in our system) are not allowed to be merged.

Candidate lattice updating

After over-segmentation and candidate character generation, the candidate classes and their scores, as well as the linguistic and geometric contexts involving the newly generated candidate characters, are obtained using the character classifier, the language model and the geometric models, respectively, and are added to (or updated in) the candidate lattice.

To roughly estimate the computational cost for a new stroke, we consider the costs of feature extraction and classification in character recognition and geometric context scoring, as well as the cost of retrieving the linguistic scores. Denote the number of candidate classes maintained for each candidate character in the candidate lattice as CN, and the number of newly generated candidate character patterns as PN (pattern number). In normal writing order, as in Case A of Fig. 6, the maximum number of new candidate characters is PN = SN (some candidate characters may violate conditions (b)–(d) and will be pruned). When a delayed stroke is written, as in Case B of Fig. 6, and falls in the pos-th segment, the number of candidate characters composed of k segments that contain the pos-th segment is k (the start segment ranges from the (pos − k + 1)-th to the pos-th). Considering candidate characters composed of 1, 2, ..., SN segments, the maximum number of candidate characters associated with the pos-th segment is

$$PN = 1 + 2 + \cdots + SN = \frac{SN(SN+1)}{2}.$$

The cost of character classification (including character feature extraction and classification) is proportional to the number of candidate character patterns PN. For the linguistic context given by the character bi-gram, there is no feature extraction; only the values for pairs of successive characters are retrieved from the lexicon. For each candidate class of a candidate character, there are at most SN × CN preceding classes and SN × CN succeeding classes in the candidate lattice. Since the maximum number of candidate characters associated with the current segment is PN and the maximum number of candidate classes is PN × CN, the cost of retrieving the language model is proportional to 2 × PN × CN × (SN × CN) (with PN = SN(SN + 1)/2) for Case B, and to PN × CN × (SN × CN) (with PN = SN) for Case A, which has predecessors only. Updating the binary class-dependent geometric context is similar to updating the linguistic context, except that bi-gram retrieval is replaced by geometric feature classification. The binary class-independent geometric context and the unary geometric contexts cost less computation than the binary class-dependent geometric context because they have fewer geometric classes.

Since candidate character classification and geometric context scoring account for the majority of the computation, it is beneficial to compute them in real time during the writing process. After the writing of a sentence is finished, sentence recognition only has to search the candidate lattice.

6. Sentence recognition (path search)

Sentence recognition searches the candidate lattice for the optimal segmentation–recognition path.
Due to the additive nature of the path evaluation criterion of Eq. (1), dynamic programming (DP) can be adopted for optimal path search. We further apply the beam search strategy to accelerate DP search by pruning partial paths at intermediate nodes. The search algorithm suits real-time recognition because the retained partial paths can be extended for further path search as new strokes are continually produced. The adopted beam search algorithm is similar to that used in [21], but we implement it differently for efficient updating of the candidate lattice in real-time recognition.

The DP search algorithm is similar to the forward procedure of the Viterbi decoding algorithm [44]. After character over-segmentation, the sentences of multiple lines are represented as a sequence of primitive segments $\langle x_1 x_2 \ldots x_T \rangle$, where $T$ is the total number of segments in the sequence. A candidate pattern consisting of $s$ segments and ending at the $t$-th segment is denoted as $x_{t-s+1,t}$ ($1 \le s \le SN$). Assigning the $c$-th candidate class ($1 \le c \le CN$) to this candidate pattern yields a single path from the $(t-s+1)$-th segment to the $t$-th segment, denoted as $(t,s,c)$. Denote the set of candidate paths ending at the $t$-th segment as $P_t$, and a single path in it as $p_t$. The forward variable is then defined as

$$f_{t,s,c} = \max_{p_{t-s} \in P_{t-s}} f\big(p_{t-s}, (t,s,c)\big),$$

i.e., $f_{t,s,c}$ is the best score (highest probability) of a path ending at the $(t-s)$-th segment and extended with a candidate character that ends at the $t$-th segment and is assigned class $c$. The beam search strategy accelerates DP by pruning candidate paths: among the candidate paths ending at the $(t-s)$-th segment, we retain the BW (band width) top-ranked paths and prune the others. The optimal path ending at the $t$-th segment can then be found inductively as follows:

Algorithm 3. Beam search in frame-synchronous fashion.
(1) Initialization

$$f_{1,s,c} = \begin{cases} f\big((1,s,c)\big), & s = 1,\ 1 \le c \le CN,\\ 0, & s \ge 2,\ 1 \le c \le CN. \end{cases} \qquad (6)$$
If CN > BW, retain the BW top-ranked paths ending at the first segment; otherwise, retain the CN paths.
(2) Induction

$$f_{t,s,c} = \max_{(s',c') \in \mathrm{top}_{BW}(t-s)} \Big\{ f_{t-s,s',c'} + \kappa \log P(c \mid x_{t-s+1,t}) + \lambda_1 \log P(c \mid c') + \lambda_2 \log P(c \mid g^{uc}) + \lambda_3 \log P(z^p = 1 \mid g^{ui}) + \lambda_4 \log P(c', c \mid g^{bc}) + \lambda_5 \log P(z^g = 1 \mid g^{bi}) \Big\}.$$

Retain the BW top-ranked paths ending at the $t$-th segment.
(3) Termination

$$f_T = \max_{(s,c)} f_{T,s,c}.$$

(4) Backtracking.

Step (1) initializes the candidate paths whose first candidate character pattern consists of the first segment, using the character classification score, the linguistic context score, and the unary class-dependent and unary class-independent geometric context scores. The induction step, which is the heart of the algorithm, searches the optimal path for each triplet $(t,s,c)$ based on the retained partial paths ending at the $(t-s)$-th segment, using the multiple contexts that were computed when updating the candidate lattice during writing. In the termination step, the optimal complete path is chosen from the paths ending at the last segment, and the character segmentation and recognition results are obtained in the backtracking step.

In the induction step, the maximum number of candidate paths ending at the $(t-s)$-th segment is SN × CN, among which the optimal one is chosen as the preceding path of $(t,s,c)$. When BW equals SN × CN, the search process is the same as the DP algorithm; when BW < SN × CN, the search is accelerated. The path search is frame-synchronous (called time-synchronous in speech recognition: partial paths are updated segment by segment), and the DP algorithm guarantees finding the optimal path for context models up to order 2. Because the beam search algorithm updates the optimal partial paths ending at a segment from the retained partial paths ending at the previous segments, it allows paths to be extended when the candidate lattice is updated on new strokes.
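The following Python sketch illustrates the frame-synchronous beam search of Algorithm 3 and its incremental use during writing. It is a simplification under stated assumptions, not the authors' code: the lattice `cand` maps (t, s) to (class label, score) pairs, where the score stands for the κ-weighted classifier term plus any unary context terms, and the hypothetical callback `trans(prev_label, label)` stands for the weighted binary linguistic and geometric terms. A virtual start node (beam 0) replaces the explicit initialization step, and segments are numbered from 1 as in the text.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

# An entry is (score, s, label, backpointer); the backpointer (t', idx) points to
# entry idx of the beam at segment t'. Beam 0 is a virtual start node.
Entry = Tuple[float, int, Optional[str], Optional[Tuple[int, int]]]
Lattice = Dict[Tuple[int, int], List[Tuple[str, float]]]


def update_beams(beams: Dict[int, List[Entry]], T: int, SN: int, cand: Lattice,
                 trans: Callable[[Optional[str], str], float], BW: int,
                 pos: int = 1) -> Dict[int, List[Entry]]:
    """Recompute the beams of segments pos..T, reusing the beams before pos."""
    beams.setdefault(0, [(0.0, 0, None, None)])
    for t in range(pos, T + 1):
        entries: List[Entry] = []
        for s in range(1, min(SN, t) + 1):        # candidate spans segments t-s+1..t
            prev = beams.get(t - s, [])
            for label, char_score in cand.get((t, s), []):
                best: Optional[Entry] = None   # f_{t,s,c}: best extension of a retained path
                for idx, (p_score, _, p_label, _) in enumerate(prev):
                    score = p_score + char_score + trans(p_label, label)
                    if best is None or score > best[0]:
                        best = (score, s, label, (t - s, idx))
                if best is not None:
                    entries.append(best)
        beams[t] = heapq.nlargest(BW, entries, key=lambda e: e[0])  # keep BW paths
    for t in [k for k in beams if k > T]:         # drop beams beyond the last segment
        del beams[t]
    return beams


def best_path(beams: Dict[int, List[Entry]], T: int):
    """Termination and backtracking: (score, [(first_seg, last_seg, label), ...])."""
    if not beams.get(T):
        return None, []
    path, t, idx = [], T, 0                        # beams are sorted, so idx 0 is best
    score = beams[T][0][0]
    while t > 0:
        _, s, label, back = beams[t][idx]
        path.append((t - s + 1, t, label))
        t, idx = back
    path.reverse()
    return score, path
```

When a new stroke changes segment pos, calling `update_beams(..., pos=pos)` recomputes only the beams from pos to T and reuses the retained paths before pos. If a stroke creates a new segment, the keys of `cand` would first have to be renumbered; that bookkeeping is not shown.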
Suppose that, in real-time recognition, the updated or newly created segment for a new stroke is at position pos. The system then performs beam search from the candidate paths ending at the segments preceding the pos-th segment, and extends the search to the succeeding segments if the new stroke is a delayed stroke.

7. Experiments

We evaluated the proposed real-time recognition approach on the online Chinese handwriting database CASIA-OLHWDB [13]. This database is divided into six datasets, three of isolated characters (DB1.0–1.2) and three of handwritten texts (DB2.0–2.2). In total there are 3,912,017 isolated character samples and 52,221 handwritten text lines (consisting of 1,348,904 character samples). Both the isolated data and the handwritten text data have been divided into standard training and test subsets. Although the handwritten text data were produced in advance, we can use the temporal stroke order to simulate the real-time writing process and evaluate real-time recognition performance.

In sentence-based input, due to the limited writing area of mobile computers, users tend to write multiple text lines, each containing only a few (mostly fewer than 10) characters. To simulate this situation, we used the datasets DB2.0–2.2 (called DB2 for short) to generate short text pages, each with three to six text lines of six to eight characters. In DB2, a text line typically contains many more characters, because it was written on A4 paper using a digital pen. We split each line into multiple short lines by making the width of each line no larger than five times the average height of the original lines. Fig. 7 shows an example of data generation: (a) is a handwritten text page, and (b) shows three short pages derived from the first three lines in (a). Table 3 provides the details of dataset DB2 and the generated short-page dataset (called GDB2 for short). The total number of strokes in the test set is 987,027.

Fig. 7. (a) A handwritten text page; (b) three short pages generated from the first three lines.

Table 3. Statistics of DB2 and the generated short-page dataset GDB2: numbers of pages, lines, lines per page, characters, and characters per line for the training and test sets (GDB2 test set: 10,510 pages).

To evaluate real-time recognition performance on short pages with delayed strokes, we produced delayed strokes in GDB2 by changing the writing order of one stroke in each page: a randomly chosen stroke was moved to a random position after its original position. The generated short-page dataset with delayed strokes is called GDB2-D for short.

Experimental setup

For dynamic line segmentation and character over-segmentation, we used linear SVM classifiers to model the geometric relationships of line–stroke pairs and segment–stroke pairs, respectively, and trained them on features extracted from the training text lines of GDB2-D. We evaluated the recognition performance using the three character classifiers introduced in Section 3.1: MQDF, NPC-LOGM, and NPC-OVA. The classifier parameters were learned on 4/5 of the training character samples (both the isolated characters in the training set of DB1 and the segmented characters in the training set of DB2, 4,207,801 samples in total), and the remaining 1/5 of the training samples were used for confidence parameter estimation. The training character samples fall into 7356 classes, including
7185 Chinese characters and 171 alphanumeric characters and symbols. For character feature extraction, we use the local stroke direction histogram feature, which has been widely used in both online and offline handwritten character recognition. Specifically, we adopt the implementation of [45] for direction feature extraction with bi-moment normalization. After direction decomposition, 8 × 8 feature values are extracted from each of the eight direction planes, giving a 512D feature vector. To reduce the complexity of the classifier, this 512D vector is projected onto a 160D subspace learned by Fisher linear discriminant analysis (FLDA). The character bi-gram language model was trained on a text corpus containing about 50 million characters (about 32 million words) [16].

To estimate the parameters of the geometric models and to train the combining weights of the path evaluation criterion, we simulated the real-time character over-segmentation process on the training text lines of GDB2-D. In the simulation, a text line is over-segmented into primitive segments using the dynamic over-segmentation algorithm. From the segment sequence, we extracted samples of geometric features for geometric context modeling. Using the character classifier, the language model and the geometric context models, we then constructed the candidate lattice on the segment sequence and trained the combining weights by minimum classification error (MCE) training.

Table 4 shows some statistics of character samples segmented from the test pages of DB2. The Rec row gives the correct rate of segmented character recognition by the character classifiers, and Rec10 and Rec20 give the cumulative accuracies of the top 10 and top 20 ranks, respectively. For all three classifiers, the correct rate on Chinese characters is the highest among the four character types, and the MQDF classifier achieves the highest rate on Chinese characters among the three classifiers.
Comparing the overall correct rates, however, the NPC-LOGM classifier is the highest because it performs much better on symbols. Non-characters are abnormal samples labeled as non-characters in the database, and outliers are characters outside the defined 7356 classes. Our experiments were run on a PC with an Intel(R) Core(TM) 2 Duo CPU and 2 GB RAM, and the programs were written in Microsoft Visual C++.

Performance metrics

We use separate performance metrics for dynamic line segmentation and for real-time sentence recognition. Many metrics have been defined for evaluating line segmentation performance [10,46,47]; we adopt some of them and define a new metric for real-time line segmentation. These metrics are based on the definitions of matches. A one-to-one match is a match in which a detected line and a ground-truthed line contain identical strokes. A g_one-to-many match occurs when the union of two or more detected lines equals a ground-truthed line. Similarly, a d_many-to-one match means that the union of two or more ground-truthed lines equals a detected line. Among the performance metrics presented in [28], we chose the detection rate (DR), recognition accuracy (RA) and entity detection metric (EDM):

$$DR = w_1 \frac{\mathrm{one2one}}{N} + w_2 \frac{\mathrm{g\_one2many}}{N}, \qquad RA = w_3 \frac{\mathrm{one2one}}{M} + w_4 \frac{\mathrm{d\_many2one}}{M}, \qquad EDM = \frac{2 \cdot DR \cdot RA}{DR + RA},$$

where $N$ is the number of ground-truthed lines, $M$ is the number of detected lines, and $w_1$–$w_4$ are all set to 1. DR, RA and EDM are analogous to recall, precision and F-measure, respectively. The page recognition rate (PRR), defined as the percentage of pages without any segmentation error, measures page-level performance.

To evaluate real-time sentence recognition, we use two character-level metrics [15,21], the correct rate (CR) and the accurate rate (AR):

$$CR = \frac{N_t - D_e - S_e}{N_t}, \qquad AR = \frac{N_t - D_e - S_e - I_e}{N_t},$$

where $N_t$ is the total number of characters in the ground-truth transcript.
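These metrics can be computed as in the sketch below. The function names are hypothetical; $S_e$, $D_e$ and $I_e$ are assumed to come from a standard minimum edit-distance alignment, and when several alignments have the same cost this sketch breaks ties by preferring fewer substitutions, which the text does not specify.

```python
from typing import Sequence, Tuple


def line_seg_metrics(n_gt: int, m_det: int, one2one: int, g_one2many: int,
                     d_many2one: int, w=(1.0, 1.0, 1.0, 1.0)) -> Tuple[float, float, float]:
    """DR, RA and EDM from match counts (weights w1..w4 default to 1)."""
    dr = w[0] * one2one / n_gt + w[1] * g_one2many / n_gt
    ra = w[2] * one2one / m_det + w[3] * d_many2one / m_det
    edm = 2 * dr * ra / (dr + ra) if dr + ra > 0 else 0.0
    return dr, ra, edm


def edit_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int]:
    """(S_e, D_e, I_e) from a minimum edit-distance alignment of hyp against ref."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (cost, S, D, I) for aligning ref[:i] with hyp[:j]
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)                    # only deletions
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)                    # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diff = int(ref[i - 1] != hyp[j - 1])
            c, s, d, ins = dp[i - 1][j - 1]
            match_or_sub = (c + diff, s + diff, d, ins)
            c, s, d, ins = dp[i - 1][j]
            deletion = (c + 1, s, d + 1, ins)
            c, s, d, ins = dp[i][j - 1]
            insertion = (c + 1, s, d, ins + 1)
            dp[i][j] = min(match_or_sub, deletion, insertion)  # lowest cost first
    _, S, D, I = dp[n][m]
    return S, D, I


def cr_ar(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[float, float]:
    """Correct rate and accurate rate of a recognized string against its transcript."""
    S, D, I = edit_counts(ref, hyp)
    nt = len(ref)
    return (nt - D - S) / nt, (nt - D - S - I) / nt
```

For instance, recognizing the transcript "ABCD" as "ABXCD" gives one insertion error, so CR = 1.0 while AR = 0.75; this is how AR penalizes characters inserted by over-segmentation.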
The numbers of substitution errors ($S_e$), deletion errors ($D_e$) and insertion errors ($I_e$) are calculated by aligning the recognized string with the transcript by dynamic programming. CR is the percentage of characters that are correctly recognized; AR additionally accounts for characters that are inserted due to over-segmentation.

For real-time recognition of handwritten sentences, besides CR and AR, recognition speed is of crucial importance for practical applications. We measured speed by the CPU time of each step as well as of the whole process: line segmentation (C_l), character over-segmentation (C_o), candidate lattice updating (computing the multiple contexts in the candidate lattice, C_u), path search (C_s), and the whole recognition process (C_w). Within C_u, we also measure the times of character classification, geometric context scoring and linguistic context scoring separately. Each time cost is averaged over the total number of strokes, since in real-time recognition each step is performed once per stroke.

Table 4. Statistics of character types and recognition rates (columns: All, Chinese, Symbol, Digit, Letter, Non-char, Outlier; rows: number of samples, then Rec (%), Rec10 and Rec20 for each of MQDF, NPC-LOGM and NPC-OVA).

Performance of dynamic line segmentation

We evaluated the performance of real-time line segmentation on the simulated short text pages. Given an online page, whenever a stroke is input, the system performs line segmentation and updates the text lines; after the last stroke is processed, the final line segmentation result is obtained. In our previous work [48], we used a linear SVM classifier to characterize the geometric relationship of a line–stroke pair and evaluated the performance on GDB2.1-D (a subset of GDB2-D). The comparison with other related methods (including those based on the off-stroke distance and on the overlap of the line–stroke pair) showed the robustness and effectiveness of the proposed algorithm. Hence, in this paper, we only report the performance of the proposed algorithm on GDB2 and GDB2-D in Table 5, without comparison with other methods. The results show that the algorithm performs well and can deal with delayed strokes.

Table 5. Performance of dynamic line segmentation on GDB2 and GDB2-D (columns: DR, RA, EDM, PRR).
Table 6. Recognition accuracies with dynamic line segmentation (columns: CR (%), AR (%), ch (%), sb (%), dg (%), lt (%); rows: MQDF, NPC-LOGM and NPC-OVA on GDB2 and GDB2-D).
Table 7. CPU times (ms) of sentence recognition with dynamic line segmentation (columns: C_l, C_o, C_u, C_s, C_w; same rows as Table 6).
Table 8. CPU times (ms) in updating the candidate lattice (columns: C_u broken down into Char, Geo, Lng; same rows as Table 6).
Table 9. Recognition accuracies with ground-truth line segmentation (same columns and rows as Table 6).

Performance of real-time sentence recognition

We evaluated the performance of real-time recognition on the generated short-page datasets without delayed strokes (GDB2) and with delayed strokes (GDB2-D).
With dynamic text line segmentation and candidate lattice updating on each stroke, the sentence recognition result is found by the real-time beam search algorithm with the default parameter setting SN = 6, CN = 10 and BW = 10 (these values were found to give a good tradeoff between recognition accuracy and speed).

Performance with dynamic line segmentation

In this case, the sentence recognition result is obtained on top of dynamic text line segmentation, so line segmentation errors cause sentence recognition errors. The recognition accuracies (CR and AR) on the test data of GDB2 and GDB2-D are listed in Table 6, and the CPU times are given in Table 7. In Table 6, performance is also broken down by character type: Chinese characters (ch), symbols (sb), digits (dg) and letters (lt). In Table 7, the time cost is broken down by step: line segmentation (C_l), character over-segmentation (C_o), candidate lattice updating (C_u), path search (C_s), and the whole process (C_w). The CPU time of candidate lattice updating is further broken down into character classification (Char), geometric context scoring (Geo) and linguistic context scoring (Lng) in Table 8. From the results, we make the following observations:
(a) The character correct rate of sentence-based input is significantly higher than that of isolated character recognition (Table 4), even though sentence-based recognition involves segmentation. This is due to the important effect of contexts.
(b) For all three character classifiers, the correct rates on Chinese characters are fairly high (94.09%, 91.22% and 89.46% for MQDF, NPC-LOGM and NPC-OVA, respectively), but the correct rates on symbols, digits and letters are much lower, because the shapes of symbols, digits and letters are more easily confused.
(c) The correct rate on handwritten text data with delayed strokes is only slightly lower than that on data without delayed strokes.
This demonstrates the robustness of the proposed approach against delayed strokes.
(d) Among the three character classifiers, the MQDF classifier gives the highest overall correct rate and accurate rate. This is owing to its advantage on Chinese characters, for which the linguistic context further improves the recognition accuracy.
(e) Tables 7 and 8 show that the major computational cost lies in updating the candidate lattice, and that character classification takes most of the time within candidate lattice updating. Among the three classifiers, MQDF is the most computationally intensive, and it makes the whole recognition process much slower than the NPC classifiers do.
(f) The proportion of path search time (C_s) in the whole recognition process is very small. This favors real-time applications, because the majority of computation is performed during writing, and the sentence recognition result can be obtained immediately by path search after writing is finished.
(g) Although the MQDF classifier is computationally intensive, it is acceptable for real-time applications because most of the computation is done during writing. Considering the tradeoff between accuracy and speed, the NPC-LOGM classifier is preferable.

7.4.2. Performance with ground-truth line segmentation

In this case, the ground truth of text line segmentation is used so that sentence recognition is not disturbed by line segmentation errors. This evaluates pure sentence recognition performance, as if the sentence were always written in a single line. The recognition accuracies are shown in Table 9. Compared with Table 6, the recognition accuracies do not differ significantly between ground-truth and dynamic line segmentation, because the dynamic line segmentation algorithm makes very few errors.

Effects of parameters in path search

Both recognition accuracy and speed depend on the parameters SN (segment number) and CN (candidate class number) in candidate lattice updating, and BW (band width) in beam search. In the above experiments, the parameters were set to their default values (SN = 6, CN = 10, BW = 10). The choice of SN is related to the specific candidate character generation method; in our system, SN = 6 was chosen to ensure that nearly all true characters can be generated by combining consecutive segments. Under this setting, we evaluate the effects of the other two parameters, CN and BW. Experimental results with CN = 10 and various BW are shown in Fig. 8. When BW equals SN × CN (60 in this case), the performance is the same as that of the DP algorithm. We do not show the results of the NPC-OVA classifier, since its correct rate is lower than that of NPC-LOGM while their recognition speeds are the same. The results show that with BW = 10 the recognition accuracy is as high as with BW = 60, and that varying BW makes little difference in recognition speed. Performance with BW = 10 and various CN is shown in Fig. 9: 10 candidate classes perform sufficiently well, and increasing CN improves the correct rate slightly but brings substantial computation cost.

Fig. 8. Experimental results with various BW. (a) Correct rate of the MQDF and NPC-LOGM classifiers. (b) Recognition speed of the MQDF and NPC-LOGM classifiers.
Fig. 9. Experimental results with various CN. (a) Correct rate of the MQDF and NPC-LOGM classifiers. (b) Recognition speed of the MQDF and NPC-LOGM classifiers.

Examples of recognition errors

The sources of real-time sentence recognition errors include text line segmentation errors, character over-segmentation failures (under-segmentation), character classification errors, and path search failures. The error rate of dynamic text line segmentation is very low, as shown in Table 5. Some examples of line segmentation errors are shown in Fig. 10, where the stroke in red is the next stroke written after the first line. Such errors tend to occur at the beginning of writing, because the line height cannot be estimated precisely from a small number of strokes and the red stroke lies rather far from the first line. This kind of line segmentation error does not affect sentence recognition significantly, however, because the segment sequence is still correctly ordered. As writing continues, a temporary line segmentation error may be corrected once the line becomes longer.

Fig. 10. Examples of line segmentation error.

Over-segmentation failure happens when a segment contains strokes belonging to different characters, or when connected strokes are written. Character classification error means that the true class of a candidate character is not among its top CN candidate classes, so the correct path is not included in the candidate lattice.
Path search failure occurs when the correct path, although included in the candidate lattice, is not found by the path search algorithm because of imperfections in the path evaluation criterion or the search algorithm. Fig. 11 shows three examples of recognition errors: (a) a misrecognized symbol, (b) a Chinese character recognition error, and (c) a segmentation error. In real-time string recognition, the correct rates of symbols, letters and digits are quite low, as shown in Table 6. Improving the accuracy on alphanumerics and symbols is a goal for future work.

Fig. 11. Three examples of recognition errors. (a) A symbol is misrecognized; (b) a Chinese character is misrecognized; (c) a segmentation error. Upper: segments after over-segmentation; middle: segmentation–recognition result; bottom: ground truth.

Example of real-time recognition process

We have developed a prototype system on a Tablet PC to demonstrate the applicability of the proposed real-time recognition approach. User editing functions for correcting segmentation and recognition errors have also been developed, though they are not described in this paper. Fig. 12 shows some sampled steps of real-time recognition on a handwritten sentence, where the left sub-window is the writing area and the right sub-window shows the recognition result.

Fig. 12. Steps of real-time recognition on a short page.

Whenever the pen-lift time exceeds a threshold, the sentence recognition result is shown and the user has the