Pattern Recognition 45 (2012) Contents lists available at SciVerse ScienceDirect. Pattern Recognition


An approach for real-time recognition of online Chinese handwritten sentences

Da-Han Wang a, Cheng-Lin Liu a,*, Xiang-Dong Zhou b

a National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95 Zhongguan East Road, Beijing, PR China
b Intelligence Engineering Lab & Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, P.O. Box 8718, Beijing, PR China

Article history: Received 10 January 2012; received in revised form 24 March 2012; accepted 18 April 2012; available online 30 April 2012.

Keywords: Online Chinese handwritten sentence recognition; Real-time recognition; Dynamic text line segmentation; Dynamic over-segmentation; Dynamic candidate lattice; Path search

Abstract

With the advances of handwriting capturing devices and computing power of mobile computers, pen-based Chinese text input is moving from character-based input to sentence-based input. This paper proposes a real-time recognition approach for sentence-based input of Chinese handwriting. The main feature of the approach is a dynamically maintained segmentation-recognition candidate lattice that integrates multiple contexts including character classification, linguistic context and geometric context. Whenever a new stroke is produced, dynamic text line segmentation and character over-segmentation are performed to locate the position of the stroke in text lines and update the primitive segment sequence of the page. Candidate characters are then generated and recognized to assign candidate classes, and linguistic context and geometric context involving the newly generated candidate characters are computed. The candidate lattice is updated while the writing process continues. When the pen lift time exceeds a threshold, the system searches the candidate lattice for the result of sentence recognition.
Since the computation of multiple contexts consumes the majority of the computing time and is performed during the writing process, the recognition result is obtained immediately after the writing of a sentence is finished. Experiments on a large database of unconstrained online Chinese handwriting, CASIA-OLHWDB, demonstrate the robustness and effectiveness of the proposed approach.

© 2012 Elsevier Ltd. All rights reserved.

* Corresponding author. E-mail addresses: dhwang1983@yahoo.com.cn, dhwang@nlpr.ia.ac.cn (D.-H. Wang), liucl@nlpr.ia.ac.cn (C.-L. Liu), xiangdongzhou@foxmail.com (X.-D. Zhou).

1. Introduction

With the proliferation of pen-based and touch-based mobile computers, online handwriting recognition has many potential applications [1-4], including text input, recording of handwritten notes and diagrams, signature verification, and recognition of mathematical expressions [5]. Chinese text input based on character recognition has been widely applied in the Chinese market. However, as handwriting capturing devices and the computing power of mobile computers advance, sentence-based text input becomes possible. Compared to character-based input, sentence-based input is more natural and enables faster and more accurate input via handwritten sentence recognition incorporating contexts.

Handwritten sentence (character string) recognition is a difficult contextual classification problem involving character segmentation and recognition [2,3]. There have been many efforts towards the improvement of handwritten character string recognition [6-11]. Most methods adopt the integrated segmentation-recognition strategy to overcome the ambiguity of character segmentation. In the segmentation-recognition framework, handwritten text is first over-segmented into primitive segments, each of which can be a character or a part of a character. Then candidate character patterns are generated by concatenating consecutive segments, and are recognized by a character classifier to assign candidate classes.
The candidate character sequences and assigned candidate classes are represented in a segmentation-recognition candidate lattice, which contains many segmentation-recognition paths, each corresponding to one recognition result. The optimal segmentation-recognition path is searched from the candidate lattice via path evaluation combining character classification scores and contexts. Fig. 1 shows a typical handwritten text recognition system (Fig. 1(a)), and an illustrative example of over-segmentation and the segmentation-recognition candidate lattice (Fig. 1(b)).

Fig. 1. (a) A typical handwritten text recognition system. (b) An illustrative example of over-segmentation and the segmentation-recognition candidate lattice. Each box contains the candidate character (upper) and its candidate classes (lower). The optimal path is denoted by a thick line, with red characters (left one in each box) being the correct result. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)

Although the above methods show promise, they perform character segmentation and recognition after the sentence writing is finished. To achieve real-time recognition, character segmentation and recognition should be performed during the writing process, such that the result can be obtained immediately after the completion of writing. In recent years, some real-time handwriting input products (dynamic recognition during writing) have been developed, but we have not seen an academic study addressing this problem theoretically or experimentally.

Besides character string recognition, real-time recognition of handwritten sentences also involves text line segmentation, since sentences are often written in multiple lines due to the limited space of the writing area. Text line segmentation in real-time recognition is difficult because the lines are short, the strokes are dynamically produced, and there are often delayed strokes, which are inserted into previous characters or even previous lines. Unlike previous text line segmentation methods, which mostly group strokes into lines after all strokes are produced, dynamic segmentation during writing can only utilize the information of the strokes produced so far.

In this paper, we propose an approach to real-time recognition of Chinese handwritten sentences using a dynamically maintained segmentation-recognition candidate lattice. Whenever a new stroke is produced, dynamic line segmentation and character over-segmentation are performed on the stroke to update the primitive segment sequence and locate the position of the segment in text lines. Then candidate characters involving the new stroke are generated and recognized to assign candidate classes. Meanwhile, multiple contexts including linguistic context and geometric context involving the newly generated candidate characters are computed using the language model and geometric models. The candidate lattice is updated constantly while the writing process continues. When the pen lift time exceeds a threshold, the system searches the candidate lattice for the result of sentence recognition by a path search algorithm, as in conventional character string recognition.
Since the updating of the candidate lattice consumes the majority of the computing time and is performed during the writing process, the sentence recognition result is obtained immediately after a long pen lift. Based on automatic recognition, editing functions can be provided for manually correcting segmentation and recognition errors to facilitate user applications.

For dynamic text line segmentation in real-time recognition, we propose to adopt a statistical classifier to model the geometric relationship between the ongoing stroke and the existing text lines. By classification based on the features extracted from a line-stroke pair, the classifier decides whether the stroke is assigned to a previous line or starts a new line. The method can deal with delayed strokes by grouping them into previous lines, and therefore makes the real-time recognition system more robust.

For dynamic character over-segmentation, we also use a statistical classifier to model the geometric relationship between the ongoing stroke and the existing primitive segments that belong to the same line as the stroke. We transform the output of the classifier on the features extracted from a segment-stroke pair into a posterior probability by confidence transformation [12], which indicates the probability of the stroke belonging to the segment. The stroke is considered to belong to the segment if the probability is greater than a threshold. By testing each segment-stroke pair, the stroke is assigned to one existing segment or starts a new segment. The position of the segment in the sequence of segments is located according to the left boundaries of the segments. Similar to dynamic text line segmentation, the over-segmentation can also deal with delayed strokes.

For path search after candidate lattice construction, we propose a real-time beam search algorithm. The beam search algorithm is an accelerated version of the dynamic programming (DP) algorithm that prunes the partial paths at intermediate nodes.
By retaining the partial optimal paths ending at each segment, we perform the search from the updated segment rather than from the start segment.

We evaluated the proposed approach in terms of recognition accuracy and recognition speed on a large database, CASIA-OLHWDB [13], of unconstrained online Chinese handwritten characters and texts, and the results demonstrate the robustness and effectiveness of the proposed approach.

The rest of this paper is organized as follows. Section 2 reviews the related works. Section 3 describes the baseline character string recognition method that we customize for real-time recognition. An overview of the real-time recognition system is provided in Section 4. Section 5 presents the methods for dynamic text line segmentation, dynamic character over-segmentation, and candidate lattice updating. The real-time path search algorithm is described in Section 6. Section 7 presents the experimental results, and Section 8 offers concluding remarks.

This paper extends our previous conference paper [14] by elaborating the procedures of dynamic line segmentation, dynamic character over-segmentation, and candidate character generation, incorporating geometric context into the path evaluation criterion, optimizing the combining weights, and evaluating the system quantitatively on a large database of online handwriting.

2. Related works

Chinese handwritten character string recognition is a challenging problem due to the large character set, the diversity of writing styles, the difficulty of character segmentation, and the unconstrained language domain. In particular, due to the variability of character size and position and to character touching and overlapping, the characters cannot be reliably segmented prior to character recognition. To overcome the large number of character classes and the infinite sentence classes of Chinese texts, over-segmentation-based character string recognition approaches are commonly used [1].

Under the integrated segmentation-recognition framework, many efforts have been devoted to the key techniques in Chinese/Japanese handwritten character string recognition. In this framework, the criterion for evaluating candidate segmentation-recognition paths usually integrates multiple contexts including character classification, linguistic context and geometric context. Among previous works, some integrated incomplete contexts [15-17], and some combined the contexts heuristically without optimizing the combining weights [8,9,18,19]. Zhou et al. optimize the combining weights using the conditional random field (CRF) model [10], into which it is hard to incorporate language models of higher order than the bi-gram, while Zhu et al. adopt the genetic algorithm (GA) [11] to optimize the combining weights, which is computationally expensive and sensitive to some artificial parameters. Recently, Wang et al. proposed to integrate the character classification scores and linguistic context by transforming the output of the character classifier into posterior probabilities via confidence transformation [20], which benefits the recognition performance. Furthermore, they investigated parameter optimization for path evaluation and efficient path search, and achieved significant improvements on unconstrained handwritten Chinese texts [21]. They reported a character-level correct rate of 91.39% on an offline Chinese handwriting database, CASIA-HWDB [13]. On another offline Chinese handwriting dataset, HIT-MW, they achieved a character-level correct rate of 92.72%, which is much higher than previously reported results in [15,22].
For online character string recognition, many works that experimented on Japanese handwritten text databases have reported higher accuracies [8-11]. This results from the fact that, in online handwriting recognition, the sequences of strokes are available for better segmenting and discriminating characters. For online Chinese character string recognition, however, few works have been reported, except that in the ICDAR 2011 competition [23], the Vision Object achieved a correct rate of 94.33% on a competition dataset.

Real-time recognition of handwritten sentences is closely connected with online handwritten character string recognition, which uses techniques of path evaluation and search similar to those of offline character string recognition. Our real-time recognition system is customized from a high performance online handwritten character string recognition system by developing robust and efficient techniques for dynamic text line segmentation, character over-segmentation, updating of the candidate lattice, and real-time path search.

Among the previous methods for text line segmentation in online handwritten documents, some segment text lines using heuristics or simple features like horizontal projection [24,25] and off-stroke distances [8]. The methods based on optimizing line-fitting objectives [26-28] yield more reliable line partitioning. They usually take a hypothesis-and-test strategy to generate candidate line partitionings and seek the optimal partitioning by heuristic search. To generate text line hypotheses, however, these methods require that all the strokes have been written. For real-time recognition, on the other hand, line segmentation is performed on each stroke rather than on the whole page.

Character over-segmentation in online handwritten character string recognition is often performed using off-stroke (pen lift) distances, and delayed strokes are re-arranged according to some heuristic rules [9].
For over-segmentation in real-time recognition, the rules should be designed more carefully because only part of the strokes are available at the time of dynamic segmentation.

Recognition speed is another important factor in real-time recognition of handwritten sentences, where character recognition is a crucial part and consumes the majority of the computing time. With over 5000 classes of frequently used characters, Chinese character recognition is a difficult classification problem. The most popularly used classifiers are the modified quadratic discriminant function (MQDF) [29] and the nearest prototype classifier (NPC) [30]. The MQDF provides higher accuracy than the NPC but suffers from high storage and computation costs. In this paper, we evaluate the performance of both the MQDF classifier and the NPC, investigating the tradeoff between recognition accuracy and speed.

3. Online handwritten character string recognition

We customize a high performance online handwritten character string recognition system for real-time recognition. Before describing the real-time recognition approach, we describe the online handwritten character string recognition approach below.

For the character string recognition system, we apply the integrated segmentation-recognition strategy, using the same framework as illustrated in Fig. 1. In the system, the input string sample (a sequence of strokes) is over-segmented, and the segments are composed into sequences of candidate characters, each denoted by $X = x_1 \cdots x_n$. Each candidate character $x_i$ is assigned candidate classes (denoted as $c_i$) by a character classifier, and the result of character string recognition is then a character string $C = c_1 \cdots c_n$. In the candidate segmentation-recognition lattice, each path $(X,C)$ is evaluated by the path evaluation criterion.
In our system, we adopt the path evaluation criterion presented in [21], which is formulated from the Bayesian decision view, integrates multiple contexts including character classification, linguistic context, and geometric context, and shows fairly good performance. To save space, we give the criterion directly without its derivation; more details can be found in [21].

Denote the score of classifying character $x$ into class $c$ given by the character classifier as $P(c|x)$. The linguistic context is given by a bi-gram language model, which gives the 2-gram probability, denoted as $P(c_i|c_{i-1})$, from character class $c_{i-1}$ to $c_i$. The unary class-dependent (uc for short) geometric score, unary class-independent (ui) geometric score, binary class-dependent (bc) geometric score and binary class-independent (bi) geometric score are denoted as $P(c|g^{uc})$, $P(z^p=1|g^{ui})$, $P(c_{i-1},c_i|g^{bc})$, and $P(z^g=1|g^{bi})$, respectively, where $g$ denotes the corresponding geometric feature and the output scores are given by geometric models classifying the extracted features. For the ui geometric model, $P(z^p=1|g^{ui})$ indicates the probability of the candidate being a valid character. For the bi geometric model, $P(z^g=1|g^{bi})$ indicates the probability of the gap between two successive candidate characters being a between-character gap. The path evaluation is the combination of multiple contexts:

$$f(X,C) = \sum_{i=1}^{n} \left\{ k_i \log P(c_i|x_i) + \lambda_1 \log P(c_i|c_{i-1}) + \lambda_2 \log P(c_i|g_i^{uc}) + \lambda_3 \log P(z_i^p=1|g_i^{ui}) + \lambda_4 \log P(c_{i-1},c_i|g_i^{bc}) + \lambda_5 \log P(z_i^g=1|g_i^{bi}) \right\}, \qquad (1)$$

where $\{\lambda_1,\lambda_2,\lambda_3,\lambda_4,\lambda_5\}$ are five combining weights that balance the contributions of the different models, and $k_i$ is the number of primitive segments composing the candidate character $x_i$. The idea of weighting the character classification score with the multiplier $k_i$ follows the variable length HMM of [31]. This makes the sum of classification scores insensitive to the path length (number of candidate characters), and enables optimal path search by DP.
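
The criterion in Eq. (1) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `CandidateScores` and `path_score` are hypothetical, the six probabilities are assumed to be already available as posterior probabilities strictly greater than zero, and context terms that are not supplied default to 1 so that their logarithm contributes nothing.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass
class CandidateScores:
    """Scores of one candidate character on a path (hypothetical container)."""
    k: int                 # k_i: number of primitive segments in the candidate
    p_class: float         # P(c_i | x_i), character classifier
    p_lm: float = 1.0      # P(c_i | c_{i-1}), bi-gram language model
    p_uc: float = 1.0      # P(c_i | g_i^uc), unary class-dependent geometry
    p_ui: float = 1.0      # P(z_i^p = 1 | g_i^ui), unary class-independent
    p_bc: float = 1.0      # P(c_{i-1}, c_i | g_i^bc), binary class-dependent
    p_bi: float = 1.0      # P(z_i^g = 1 | g_i^bi), binary class-independent


def path_score(candidates: Iterable[CandidateScores],
               weights: Sequence[float]) -> float:
    """Evaluate Eq. (1) for one segmentation-recognition path."""
    lam1, lam2, lam3, lam4, lam5 = weights
    total = 0.0
    for s in candidates:
        total += (
            s.k * math.log(s.p_class)      # classification term, weighted by k_i
            + lam1 * math.log(s.p_lm)
            + lam2 * math.log(s.p_uc)
            + lam3 * math.log(s.p_ui)
            + lam4 * math.log(s.p_bc)
            + lam5 * math.log(s.p_bi)
        )
    return total


# Toy check of the k_i weighting: two segmentations of the same three segments.
weights = (1.0, 1.0, 1.0, 1.0, 1.0)
one_char = [CandidateScores(k=3, p_class=0.5)]
three_chars = [CandidateScores(k=1, p_class=0.5) for _ in range(3)]
same_classification_term = math.isclose(
    path_score(one_char, weights), path_score(three_chars, weights)
)
```

In this toy case the two paths cover the same three primitive segments with equal per-candidate classifier probabilities, so their classification terms coincide regardless of the number of candidate characters, which is the length-insensitivity that the multiplier $k_i$ is meant to provide.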

In [21], Wang et al. propose to convert the outputs of the character classifier and the geometric context models into posterior probabilities by confidence transformation [12,20]. In this paper, we apply confidence transformation to the integration of multiple contexts. Specifically, for character classification, we use the Dempster-Shafer (D-S) theory of evidence [32] to combine the sigmoidal two-class probabilities into multi-class probabilities, which considers the outlier class and hence is suitable for character string recognition [20]. For geometric context models, which have a small number of classes, we use the sigmoidal confidence transformation. The confidence parameters are estimated by minimizing the cross entropy (CE) loss function, which is commonly used in logistic regression and neural network training, on a validation dataset (preferably different from the dataset for training classifiers) [12].

In the following, we briefly introduce the character classifier, geometric context modeling, and combining parameter estimation.

3.1. Character classifier

Though a large number of classifiers are available in pattern recognition, only a few of them are effective for the large category set problem of Chinese character recognition [33]. We use the MQDF and the NPC because they are among the most popularly used and effective ones, and the main aim of this paper is to propose and demonstrate a real-time handwritten sentence recognition approach rather than to compare classifiers comprehensively.

The MQDF classifier is a modified version of the quadratic discriminant function (QDF), which is derived from the Bayesian classifier by assuming that the probability distribution of each class is multivariate Gaussian [29]. In the MQDF, the minor eigenvalues of each class are replaced by a constant, such that only the principal eigenvectors are used in the discriminant function.
This helps reduce the computational complexity and meanwhile benefits the generalization performance.

For NPC classifiers, we test two variations depending on the prototype learning algorithm: one is trained by the LOG-likelihood of Margin criterion (NPC-LOGM) [34], and the other is trained by the One-vs-All criterion (NPC-OVA) [35]. The training objective of NPC-LOGM is the negative Conditional Log-likelihood Loss (CLL), where the posterior probability is approximated by the logistic (sigmoidal) function of the hypothesis margin. For the NPC-OVA classifier, the training objective is the multi-class cross-entropy (CE) loss, where the binary posterior probability is approximated by the sigmoidal function as well. More details of the MQDF classifier, NPC-LOGM, and NPC-OVA can be found in [29,34,35], respectively.

3.2. Geometric context modeling

Geometric context has been proven effective in character string recognition [9,10] and transcript mapping of handwritten documents [36]. Similar to the geometric modeling in [36], we design four geometric models: unary and binary class-dependent models, and unary and binary class-independent models. To build the geometric models, we extract features for unary and binary geometry from the bounding box of a candidate character pattern and from two adjacent character patterns, respectively [36]. Due to the large number of Chinese characters and the fact that many different characters have similar geometric features, we cluster the character classes into six super-classes using the EM algorithm. After clustering, we use a 6-class quadratic discriminant function (QDF) for the unary class-dependent model, and a 36-class QDF for the binary class-dependent model. For the class-independent geometric models, which are in essence two-class classification models, we use a linear support vector machine (SVM) [37] trained with character and non-character samples for the unary class-independent model, and similarly, a linear SVM for the binary class-independent model.
In path evaluation, we convert both QDF and SVM outputs to posterior probabilities via sigmoidal confidence transformation.

3.3. Combining parameter estimation

The combining weights are learned by Minimum Classification Error (MCE) training [38,39], which has been popularly used in speech recognition and handwriting recognition [40-42]. The objective of learning the combining weights by MCE is to optimize the string recognition accuracy. In string-level MCE training, the weights are estimated on a dataset containing $R$ string samples $D_X = \{(X^n, C_t^n) \mid n = 1,\ldots,R\}$, where $C_t^n$ is the ground-truth transcript of the string sample $X^n$. Following Juang et al. [38], the misclassification measure on a string sample is approximated by

$$d(X,\Lambda) = -g(X,C_t,\Lambda) + g(X,C_r,\Lambda), \qquad (2)$$

where $\Lambda$ is the parameter set, $g(X,C_t,\Lambda)$ is the discriminant function for the truth class, and $g(X,C_r,\Lambda)$ is the discriminant function of the closest rival class: $g(X,C_r,\Lambda) = \max_{C_k \neq C_t} g(X,C_k,\Lambda)$. The misclassification measure is transformed to a loss by the sigmoidal function:

$$l(X,\Lambda) = \frac{1}{1+e^{-\xi d(X,\Lambda)}}, \qquad (3)$$

where $\xi$ is a parameter to control the hardness of the sigmoidal nonlinearity. The parameters in MCE training are learned by stochastic gradient descent [43] on each input sample by

$$\Lambda(t+1) = \Lambda(t) - \varepsilon(t)\, U \nabla l(X,\Lambda)\big|_{\Lambda=\Lambda(t)}, \qquad (4)$$

where $\varepsilon(t)$ is the learning step, and $U$ is related to the inverse of the Hessian matrix and is usually approximated to be diagonal.

In MCE training for handwritten character string recognition, the discriminant function is the path evaluation criterion (1), and the rival segmentation-recognition path, which is the one most confusable with the correct one, is obtained by beam search. Substituting the discriminant functions $f_c$ and $f_r$ of the correct and rival paths into (4), the parameters are updated iteratively as

$$\Lambda(t+1) = \Lambda(t) - \varepsilon(t)\,\xi\, l(1-l)\, \frac{\partial d(X,\Lambda)}{\partial \Lambda}\bigg|_{\Lambda=\Lambda(t)} = \Lambda(t) - \varepsilon(t)\,\xi\, l(1-l)\, \frac{\partial (f_r - f_c)}{\partial \Lambda}\bigg|_{\Lambda=\Lambda(t)}. \qquad (5)$$

4. Real-time recognition system

The proposed real-time recognition system consists of four main modules (Fig. 2(a)): the real-time segmentation-recognition module, the sentence recognition module, the sentence edition module and the language association module. While the real-time segmentation-recognition and sentence recognition modules are the core of the automatic recognition system, the other two modules are provided to facilitate user applications.

Fig. 2. (a) Flow chart of the real-time recognition system. (b) Flow chart of the real-time segmentation-recognition module.

The real-time segmentation-recognition module (Fig. 2(b)) acts whenever a new stroke is produced. In line segmentation, the system judges which text line the new stroke belongs to. If the stroke belongs to a previous line, then the line is updated and character over-segmentation is performed on the line. If no previous line is found to contain the stroke, the stroke is considered to start a new line and composes the first primitive segment (a stroke block) of the line. In character over-segmentation of a text line, if the stroke belongs to a previous segment of the line, the system updates the segment; otherwise it creates a new segment using the stroke and finds the position of the new segment in the sequence of segments according to the left boundaries. After assigning the new stroke, the updated primitive segment or newly created segment is merged with its preceding segments to generate candidate characters, which are recognized by a character classifier to assign candidate classes. The new candidate characters and their assigned classes, as well as the linguistic context and geometric context scores associated with the new candidate characters, are added into the candidate segmentation-recognition lattice. Fig. 3 shows an intermediate candidate lattice and its updated form due to a new stroke.

Fig. 3. (a) A candidate lattice and (b) the updated one due to a new stroke. The partial lattice with red lines is added into the previous lattice. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)

After real-time segmentation-recognition on a new stroke, if the pen lift time exceeds a threshold (adjustable by the user, e.g., 0.5 s), the result of sentence recognition is obtained by path search in the updated candidate lattice, performed by the sentence recognition module.

The sentence recognition result may have errors of character segmentation or recognition. A sentence edition module is thus designed to correct such errors. A character split error can be corrected by drawing a circle embracing the split parts. A character merge error can be corrected by drawing a vertical line to separate the merged characters. After a manual merge or split, the merged or split parts are re-combined into candidate characters and reassigned candidate classes, and the updated candidate lattice is searched again for the sentence recognition result.
For a character recognition error, candidate classes are displayed when the user clicks on the character area, and the user can select the correct class. If the correct class is not in the top ranks, the user can erase the character and rewrite it to activate real-time recognition.

In the following, we elaborate the techniques in the modules of real-time segmentation-recognition and sentence recognition.

5. Real-time segmentation-recognition module

On a new stroke, the real-time segmentation-recognition module performs dynamic text line segmentation, character over-segmentation, and updating of the segmentation-recognition candidate lattice. Algorithm 1 illustrates the real-time processing of an ongoing stroke, where Part 1 performs dynamic text line segmentation and Part 2 performs dynamic character over-segmentation. Afterwards, candidate characters are generated from the updated primitive segment and assigned candidate classes to update the candidate lattice.

D.-H. Wang et al. / Pattern Recognition 45 (2012)

The estimated line height of a character string is important for extracting the line-stroke and segment-stroke geometric features. To estimate the line height (denoted as linehei in this paper) robustly, all the strokes in the line are first sorted in ascending order of height, and the half of the strokes with larger heights is used to estimate the line height as the average height of the selected strokes. As writing proceeds, the estimate is updated to incorporate each new stroke, and it becomes more accurate as the number of strokes increases.

Dynamic line segmentation

This step assigns a new stroke (denoted as strk) to one of the m previous lines (denoted as lines) or starts a new line. In Algorithm 1, lineidx is the index of the text line that the new stroke belongs to, and lineidx = -1 indicates that the stroke starts a new line. The function LineStrokeFeature(strk, line_i) extracts geometric features characterizing the relationship between the stroke and the i-th line. Based on the features, if the classifier judges that strk belongs to line_i, then line_i is updated and over-segmentation is performed on the updated line_i. Otherwise, the process continues until one line containing the stroke is found or all the previous lines have been considered. If there is no line containing the stroke, the stroke forms a new line.

Algorithm 1. Real-time process of an ongoing stroke.
Input:
  Existing lines and segment sequence: lines, segments
  Line number: m
  A new stroke: strk
Initialization: set lineidx = -1
// Part 1: dynamic line segmentation
For i = m to 1
  feature = LineStrokeFeature(strk, line_i),
  Classifier(feature),
  if strk belongs to line_i
    lineidx = i; break;
  else continue;
End for.
// Part 2: dynamic character over-segmentation
If (lineidx > 0)
  Merge strk into the lineidx-th line,
  OverSegmentation(updated lineidx-th line)
Else
  Create a new line using strk,
  Create the first segment of the line using strk.
  m = m + 1.
End if.
Update the segment sequence,
// Part 3
Generate candidate characters,
Update the candidate lattice.
End.

We adopt a statistical classifier to model the geometric relationship of a line-stroke pair and to judge whether the stroke belongs to the line or not. To collect training samples for the two-class classifier, we pair each stroke with its temporally previous lines. If the stroke belongs to the line, the sample is a positive one, otherwise a negative one. Samples can be extracted from ground-truthed online documents containing multiple text lines. Geometric features are extracted from each positive or negative sample (a line-stroke pair) to train the classifier.

In extracting geometric features from a line-stroke pair, we do not rely on temporal features such as the off-stroke distance, so as to cope with delayed strokes. Before feature extraction, the line line and the stroke strk are tentatively merged and fitted by linear regression. Denote the merged line as line^t. The line height is estimated by computing the average height of strokes. We extract 22 features from the line-stroke pair, as listed in Table 1. The features can be divided into four categories: (1) five features related to the line line (No. 1-5 in Table 1); (2) two features related to the stroke strk (No. 6 and 7); (3) four scalar features related to the line line^t (No. 8-11); (4) 11 scalar features related to the geometric relationship between the stroke strk and the line line as well as the line line^t (No. 12-22 in Table 1).

Fig. 4. Segment sequence of multiple lines.

Table 1
Line-stroke geometric features (the Norm column denotes whether the feature is normalized w.r.t. the line height or not).

No.     Norm   Feature
1-2     Y      Height and width of line
3       N      The number of strokes in line
4       Y      Average regression error of line: $\sigma_1^2$
5       N      Horizontal direction of the line line
6       Y      Height of strk
7       N      Aspect ratio of strk
8-9     Y      Height and width of line^t
10      Y      Average regression error of line^t: $\sigma_2^2$
11      N      Horizontal direction of the line line^t
12      Y      Growth of line height
13      N      Change of horizontal direction
14      Y      Change of average regression error
15      Y      Distance between line and strk, as the minimum distance between strk and the strokes in line
16      Y      Common area of line and strk
17-18   Y      Distances of upper/lower bound of strk to vertical center of line along the norm direction of line
19-22   Y      Distances between the upper bounds, lower bounds, upper-lower bounds, and lower-upper bounds of line and strk
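The line-height estimate and the line-assignment loop (Part 1 of Algorithm 1) can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Stroke` and `Line` containers, the function names, and especially `toy_belongs_to` are hypothetical stand-ins; the paper uses a trained classifier on the 22 features of Table 1, not the vertical-center rule used here. Indices are 0-based, so the returned index plays the role of lineidx.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # (x, y) pen coordinate


@dataclass
class Stroke:
    points: List[Point]

    @property
    def y_range(self) -> Tuple[float, float]:
        ys = [y for _, y in self.points]
        return min(ys), max(ys)

    @property
    def height(self) -> float:
        low, high = self.y_range
        return high - low


@dataclass
class Line:
    strokes: List[Stroke] = field(default_factory=list)


def estimate_line_height(strokes: List[Stroke]) -> float:
    """Average height of the taller half of the strokes (linehei)."""
    heights = sorted(s.height for s in strokes)
    taller_half = heights[len(heights) // 2:]
    return sum(taller_half) / len(taller_half)


def assign_stroke(strk: Stroke, lines: List[Line],
                  belongs_to: Callable[[Stroke, Line], bool]) -> int:
    """Scan lines from the most recent to the first; open a new line if none accepts."""
    for i in range(len(lines) - 1, -1, -1):
        if belongs_to(strk, lines[i]):
            lines[i].strokes.append(strk)
            return i
    lines.append(Line([strk]))  # no existing line accepted the stroke
    return len(lines) - 1


def toy_belongs_to(strk: Stroke, line: Line) -> bool:
    """Hypothetical rule standing in for the classifier: compare vertical centers."""
    ys = [y for s in line.strokes for _, y in s.points]
    line_center = (min(ys) + max(ys)) / 2
    low, high = strk.y_range
    return abs((low + high) / 2 - line_center) <= estimate_line_height(line.strokes)
```

Because the loop uses no temporal feature, a delayed stroke written after a second line has been started can still be routed back to the first line, provided the decision function accepts it.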

Dynamic character over-segmentation

After text line segmentation, dynamic character over-segmentation is performed to update the sequence of primitive segments. In our system, the segments of multiple lines are ordered in one sequence, as depicted in Fig. 4. If the ongoing stroke forms a new line, the stroke composes the first segment of that line, which becomes the last segment in the sequence. If the stroke is judged to belong to a previous line, dynamic character over-segmentation is performed on the updated line by the function OverSegmentation(updated lineidx-th line) in Algorithm 1, as detailed in Algorithm 2.

Algorithm 2 aims to locate the segment that the stroke belongs to in the text line L, similarly to the line segmentation in Part 1 of Algorithm 1. Suppose there are n previous segments in L. In Algorithm 2, segidx denotes the index of the segment that the new stroke belongs to, and segidx = -1 indicates that the stroke starts a new segment. The function SegStrokeFeature(strk, s_i) extracts geometric features characterizing the relationship between the stroke and the i-th segment. Based on the features, the output of a classifier is transformed to a confidence measure that indicates the probability of the stroke belonging to the segment. If the confidence is greater than a threshold g, strk is considered to belong to s_i and is merged into s_i. Otherwise, the process continues until one segment containing the stroke is found. If there is no segment containing the stroke, the stroke is considered to start a new segment. The threshold g should be large enough to avoid merge errors in over-segmentation (g is empirically set to 0.85 in our system). After assignment of the stroke, the updated segment sequence of the line is sorted according to the left boundaries by the function SortSegments(s_1 s_2 ... s_n s_{n+1}).

Algorithm 2.
Dynamic character over-segmentation.
Input:
  updated line: L
  segment number: n
  previous segments: s_1 s_2 ... s_n
  A new stroke: strk
Initialization: set segidx = -1
For i = n to 1
  feature = SegStrokeFeature(strk, s_i),
  confidence = Classifier(feature),
  if (confidence > g) // strk belongs to s_i
    segidx = i; break;
  else continue;
End for.
If (segidx > 0)
  Merge strk into the segidx-th segment,
Else
  Create a new segment using strk,
  SortSegments(s_1 s_2 ... s_n s_{n+1}),
  n = n + 1.
End if.
End.

Fig. 5. Examples of segment sequence with a new stroke inserted. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)

Fig. 5 shows the two main cases of dynamic character over-segmentation on a new stroke, where the segment with a red frame indicates a newly created or an updated segment. Case A shows normally written strokes, where the stroke is written at the end of the line, while Case B shows delayed strokes inserted into previous parts. Delayed strokes also occur when a character is deleted during user editing and a new character is re-written in the same position. In Case B, if the stroke starts a new segment, the position of the newly created segment is located according to the left boundaries.

For robust over-segmentation, we also adopt a statistical classifier to model the geometric relationship of a segment-stroke pair. To collect training samples of positive and negative segment-stroke pairs, we first segment the text lines of the training data into primitive segments according to the off-stroke distance and then re-arrange delayed strokes using spatial information. Each stroke is paired with its temporally preceding segments to form segment-stroke pair samples, which are positive or negative depending on whether the stroke really belongs to the segment or not. As in feature extraction for line segmentation, we do not rely on temporal features, so as to cope with delayed strokes.
The geometric features of a segment-stroke pair include 12 features in total: three features related to the stroke strk (No. 1-3 in Table 2), two features related to the segment s_i (No. 4 and 5), two features related to the temporally merged segment s_i^t (No. 6 and 7), and five scalar features related to the relationship between strk and s_i (No. 8-12). The horizontal overlap between strk and s_i, which is important for character over-segmentation, is characterized by the horizontal relationship between them as the features No. 9-12.

Candidate characters generation

After dynamic line segmentation and character over-segmentation, generating candidate characters is straightforward. Fig. 6 shows examples of new candidate characters. The segment with a red box is the one updated or formed by a new stroke, and the blue frame embraces the candidate characters that start from or end at the red segment. In this paper, the maximum number of segments composing a candidate character is denoted as SN.

Table 2
Segment-stroke geometric features (the Norm column denotes whether the feature value is normalized w.r.t. the line height).

No.     Norm   Feature
1-2     Y      Height and width of strk
3       N      Aspect ratio of strk
4-5     Y      Height and width of s_i
6-7     Y      Height and width of the temporally merged segment s_i^t
8       Y      Common area of bounding boxes of strk and s_i
9       Y      Horizontal gap between bounds of strk and s_i
10-12   Y      Distances between the left bounds, right bounds, horizontal centers of strk and s_i
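The segment-assignment loop of Algorithm 2 can be sketched as below. This is a minimal sketch under assumptions that are not in the paper: segments and strokes are reduced to bounding boxes, and `toy_confidence` (a horizontal-overlap ratio) is a hypothetical replacement for the classifier confidence computed from the features of Table 2. Only the 0.85 threshold and the sort by left boundary come from the text.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom); assumed representation
G_THRESHOLD = 0.85  # confidence threshold g reported in the text


def merge_boxes(a: Box, b: Box) -> Box:
    """Bounding box of two boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def insert_stroke(segments: List[Box], strk: Box,
                  confidence: Callable[[Box, Box], float],
                  g: float = G_THRESHOLD) -> int:
    """Merge strk into an accepting segment, or create a new one; return its index."""
    for i in range(len(segments) - 1, -1, -1):  # most recent segment first
        if confidence(strk, segments[i]) > g:
            segments[i] = merge_boxes(segments[i], strk)
            return i
    segments.append(strk)                   # stroke starts a new segment
    segments.sort(key=lambda box: box[0])   # SortSegments: order by left boundary
    return segments.index(strk)


def toy_confidence(strk: Box, seg: Box) -> float:
    """Hypothetical confidence: fraction of the stroke width covered by the segment."""
    overlap = min(strk[2], seg[2]) - max(strk[0], seg[0])
    width = max(strk[2] - strk[0], 1e-9)
    return max(overlap, 0.0) / width
```

A high threshold makes the loop prefer creating a new segment over merging, which matches the stated goal of avoiding merge errors: a wrongly split character can still be recovered by combining segments later, while a wrongly merged segment cannot be split again.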

Fig. 6. Examples of new candidate characters. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)

The generation of candidate characters is subject to some heuristic rules that reduce the number of candidate characters while still guaranteeing that the true characters are included:
(a) the number of segments in a character does not exceed the maximum number SN;
(b) segments in different lines are not combined into a candidate character;
(c) candidate characters with width larger than a threshold (safely set to 3 × linehei in our system) are pruned;
(d) two successive segments with horizontal distance larger than a threshold (safely set to 2 × linehei in our case) are not allowed to be merged.

Candidate lattice updating

After over-segmentation and candidate character generation, the candidate character classes and their scores, the linguistic context, and the geometric context involving the newly generated candidate characters are obtained using the character classifier, language model and geometric model, respectively, and are added to the candidate lattice.

To roughly estimate the computation cost on a new stroke, we consider the costs of feature extraction and classification in character recognition and geometric context scoring, as well as the cost of getting the linguistic scores. Denote the number of candidate classes maintained for each candidate character in the candidate lattice as CN, and the number of newly generated candidate character patterns as PN (Pattern Number). In normal writing order, as in Case A of Fig. 6, the maximum number of candidate characters is PN = SN (some candidate characters may violate the conditions (b)-(d) and will be pruned). When a delayed stroke is written as in the Case B of Fig.
6, where the delayed stroke is in the pos-th segment, the number of candidate characters composed of k segments and containing the pos-th segment is k (the start segment ranges from the (pos - k + 1)-th to the pos-th). Considering candidate characters composed of 1, 2, ..., SN segments, the maximum number of candidate characters associated with the pos-th segment is

$$\mathit{PN} = 1 + 2 + \cdots + \mathit{SN} = \frac{\mathit{SN}(\mathit{SN}+1)}{2}.$$

The cost of character classification (including character feature extraction and classification) is proportional to the number of candidate character patterns PN. For the linguistic context given by the character bi-gram, there is no feature extraction; the value for each pair of successive characters is simply retrieved from the lexicon. For each candidate class of a candidate character, there are at most SN × CN preceding classes and SN × CN succeeding classes in the candidate lattice. Since the maximum number of candidate characters associated with the current segment is PN and the maximum number of candidate classes is PN × CN, the cost of retrieving the language model is proportional to 2 × PN × CN × (SN × CN) (here PN = SN(SN + 1)/2) for Case B and to PN × CN × (SN × CN) (here PN = SN) for Case A (which has predecessors only). Updating the binary class-dependent geometric context is similar to updating the linguistic context, except that retrieving the bi-gram is replaced by geometric feature classification. The binary class-independent geometric context and the unary geometric contexts cost less computation than the binary class-dependent geometric context because they have fewer geometric classes.

Since candidate character classification and geometric context scoring account for the majority of the computation, it is beneficial to compute them in real time during the writing process. After the writing of a sentence is finished, sentence recognition only has to search the candidate lattice.

6. Sentence recognition (path search)

Sentence recognition is to search the candidate lattice for the optimal segmentation recognition path.
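Since the path search operates on whatever the lattice update has added, it helps to check the candidate counts derived in the lattice-updating analysis. The sketch below enumerates the segment spans that contain a given segment and evaluates the language-model lookup bound. The function names are hypothetical, indices are 0-based, and the pruning rules (b)-(d) are deliberately ignored, so the counts are the upper bounds only.

```python
from typing import List, Tuple


def candidate_spans(pos: int, n_segments: int, sn: int) -> List[Tuple[int, int]]:
    """All inclusive (start, end) spans of at most sn segments that contain segment pos."""
    spans = []
    for k in range(1, sn + 1):                      # k segments per candidate
        for start in range(pos - k + 1, pos + 1):   # k possible start positions
            end = start + k - 1
            if start >= 0 and end < n_segments:     # stay inside the sequence
                spans.append((start, end))
    return spans


def lm_lookups(pn: int, sn: int, cn: int, delayed: bool) -> int:
    """Upper bound on bi-gram retrievals: PN*CN classes times SN*CN neighbours per side."""
    sides = 2 if delayed else 1  # a delayed stroke has predecessors and successors
    return sides * pn * cn * (sn * cn)


# Case A: the stroke updates the last of 10 segments, so PN = SN.
case_a = candidate_spans(pos=9, n_segments=10, sn=6)
# Case B: a delayed stroke updates a segment in the middle, so PN = SN(SN+1)/2.
case_b = candidate_spans(pos=10, n_segments=20, sn=6)
print(len(case_a), len(case_b))
print(lm_lookups(len(case_a), 6, 10, delayed=False),
      lm_lookups(len(case_b), 6, 10, delayed=True))
```

With SN = 6 and CN = 10, a delayed stroke can touch up to 21 candidate patterns instead of 6, which is why its lookup bound is seven times that of a stroke appended at the end of the line.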
Due to the summation nature of the path evaluation criterion of Eq. (1), the dynamic programming (DP) algorithm can be adopted for optimal path search. We further apply the beam search strategy to accelerate DP search by pruning the partial paths at intermediate nodes. The search algorithm is suitable for real-time recognition because the retained multiple partial paths can be extended for further path search as new strokes are continually produced. The adopted beam search algorithm is similar to that used in [21], but we implement it in a different way for efficient updating of the candidate lattice in real-time recognition.

The DP search algorithm is similar to the forward procedure in the Viterbi decoding algorithm [44]. After character over-segmentation, sentences of multiple lines are represented as a sequence of primitive segments $\langle x_1, x_2, \ldots, x_T \rangle$, where T is the total number of segments in the sequence. A candidate pattern consisting of s segments and ending at the t-th segment is denoted as $x_{t-s+1,t}$ ($1 \le s \le \mathit{SN}$). If we assign the c-th candidate class ($1 \le c \le \mathit{CN}$) to the candidate pattern, we get one single path from the (t - s + 1)-th segment to the t-th segment, denoted as (t, s, c). Denote the candidate paths ending at the t-th segment as $P_t$, in which a single path is denoted as $p_t$. Then the forward variable can be defined as

$$f_{t,s,c} = \max_{p_{t-s} \in P_{t-s}} f(p_{t-s}, (t,s,c)),$$

i.e., $f_{t,s,c}$ is the best score (highest probability) along a single path ending at the (t - s)-th segment, extended with a candidate character ending at the t-th segment and associated with class c. The beam search strategy accelerates DP by pruning candidate paths: among the candidate paths ending at the (t - s)-th segment, we retain the BW (Band Width) top-ranked paths and prune the others. Then we can search the optimal path ending at the t-th segment inductively as follows:

Algorithm 3. Beam Search in frame-synchronous fashion.
(1) Initialization

$$f_{1,s,c} = \begin{cases} f((1,s,c)), & s = 1;\ 1 \le c \le \mathit{CN} \\ 0, & s \ge 2;\ 1 \le c \le \mathit{CN} \end{cases} \qquad (6)$$

If CN > BW, retain the BW top-ranked paths ending at the first segment; otherwise, retain the CN paths.
(2) Induction

$$f_{t,s,c} = \max_{\text{top } \mathit{BW}\ (s',c')} \{ f_{t-s,s',c'} + k \log P(c \mid x_{t-s+1,t}) + \lambda_1 \log P(c \mid c') + \lambda_2 \log P(c \mid g^{uc}) + \lambda_3 \log P(z^p = 1 \mid g^{ui}) + \lambda_4 \log P(c', c \mid g^{bc}) + \lambda_5 \log P(z^g = 1 \mid g^{bi}) \}.$$

Retain the BW top-ranked paths ending at the t-th segment.
(3) Termination

$$f_T = \max_{(s,c)} f_{T,s,c}.$$

(4) Backtracking.

Step (1) initializes the candidate paths that take the first segment as the candidate character pattern, using the character classification scores, linguistic context score, and the unary class-dependent and unary class-independent geometric context scores. The induction step, which is the heart of the algorithm, searches the optimal path for each triplet (t, s, c) based on the previous optimal partial paths ending at the (t - s)-th segment, using the multiple contexts that were computed when updating the candidate lattice during writing. In the termination step, the optimal complete path is chosen from the paths ending at the last segment, and the character segmentation and recognition results are obtained in the backtracking step.

In the induction step, the maximum number of candidate paths ending at the (t - s)-th segment is SN × CN, among which the optimal one is chosen as the preceding path of (t, s, c). When BW equals SN × CN, the search process is the same as the DP algorithm. When BW < SN × CN, the search process is accelerated. From the algorithm, we can see that the path search is frame-synchronous (also called time-synchronous in speech recognition: partial paths are updated segment by segment), and the DP algorithm guarantees finding the optimal path for context models up to order 2. Because the beam search algorithm updates the optimal partial paths ending at a segment from the retained partial paths ending at the previous segments, it enables path extension when the candidate lattice is updated on new strokes.
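The frame-synchronous beam search of Algorithm 3 can be sketched as follows. This is a simplified illustration, not the authors' implementation: the lattice is an assumed dictionary mapping (t, s) to candidate classes with a single pre-combined unary score (standing for the classifier and unary geometric terms), `pair_score` is a hypothetical function standing for the bi-gram and binary geometric terms, and a virtual start node at t = 0 replaces the explicit initialization of Eq. (6).

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

# (score, s, class label, index of the predecessor inside beam[t - s])
Entry = Tuple[float, int, Optional[str], int]
Lattice = Dict[Tuple[int, int], List[Tuple[str, float]]]


def beam_search(T: int, candidates: Lattice,
                pair_score: Callable[[str, str], float],
                sn: int, bw: int) -> Tuple[float, List[Tuple[int, int, str]]]:
    """Return the best score and path as (first segment, last segment, class) triples."""
    beam: Dict[int, List[Entry]] = {0: [(0.0, 0, None, -1)]}  # virtual start node
    for t in range(1, T + 1):                    # induction, segment by segment
        extended: List[Entry] = []
        for s in range(1, min(sn, t) + 1):       # candidate spans ending at t
            for c, unary in candidates.get((t, s), []):
                best: Optional[Entry] = None
                for j, (prev_score, _, prev_c, _) in enumerate(beam[t - s]):
                    score = prev_score + unary
                    if prev_c is not None:
                        score += pair_score(prev_c, c)
                    if best is None or score > best[0]:
                        best = (score, s, c, j)
                if best is not None:
                    extended.append(best)
        beam[t] = heapq.nlargest(bw, extended, key=lambda e: e[0])  # keep top BW
    if not beam[T]:
        return float("-inf"), []
    path = []
    t, entry = T, beam[T][0]                     # termination: best complete path
    while t > 0:                                 # backtracking
        _, s, c, j = entry
        path.append((t - s + 1, t, c))
        t -= s
        if t > 0:
            entry = beam[t][j]
    path.reverse()
    return beam[T][0][0], path
```

Because the retained entries for every segment stay in `beam`, a new stroke only requires re-running the loop from the first affected segment onward, which is the property the text relies on for path extension.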
Suppose that, in real-time recognition, the position of the segment updated or newly created by a new stroke is pos. The system then performs beam search from the candidate paths ending at the segments preceding the pos-th segment, and extends the search to the succeeding segments if the new stroke is a delayed stroke.

7. Experiments

We evaluated the performance of the proposed real-time recognition approach on a database of online Chinese handwriting: CASIA-OLHWDB [13]. This database is divided into six datasets, three for isolated characters (DB ) and three for handwritten texts (DB ). There are 3,912,017 isolated character samples and 52,221 handwritten pages (consisting of 1,348,904 character samples) in total. Both the isolated data and the handwritten text data have been divided into standard training and test subsets. Though the handwritten text data was produced ahead of time, we can use the temporal stroke order to simulate the real-time writing process for evaluating real-time recognition performance.

Fig. 7. (a) A handwritten text page; (b) three short pages generated from the first three lines.

Table 3
Statistics of DB2 and generated short page dataset GDB2.
Datasets #Page #Line #Line/page #Characters #Chars/line
DB2 Train , ,082,
DB2 Test , ,
GDB2 Train 41, , ,082,
GDB2 Test 10,510 38, ,

In sentence-based input, due to the limited writing area of mobile computers, users tend to write multiple text lines, and each line contains only a few (mostly < 10) characters. To simulate this situation, we used the datasets DB (called DB2 for short) to generate short text pages, each with three to six text lines and each line consisting of six to eight characters. In DB2, a text line typically contains characters because it was written on A4 paper using a digital pen. We split each line into multiple lines, as in a short page, by making the width of each line not larger than five times the average height of the original lines. Fig.
7 shows an example of data generation: (a) is a handwritten text page, and (b) shows three short pages derived from the first three lines in (a). Table 3 provides the details of dataset DB2 and the generated short page dataset (called GDB2 for short). The total number of strokes in the test set is 987,027. To evaluate the real-time recognition performance on short pages with delayed strokes, we produced delayed strokes in GDB2 by changing the writing order of one stroke in each page. Specifically, we randomly chose a stroke and placed it at a random position after its original position. The generated short page dataset with delayed strokes is called GDB2-D for short.

Experimental setup

For dynamic line segmentation and character over-segmentation, we used a linear SVM classifier to model the geometric relationship of the line-stroke pair and the segment-stroke pair, respectively, and trained the classifier on features extracted from the training set of text lines of GDB2-D.

We evaluated the recognition performance using the three character classifiers introduced in Section 3.1: MQDF, NPC-LOGM, and NPC-OVA. The classifier parameters were learned on 4/5 of the training character samples (both the isolated characters in the training set of DB1 and the segmented characters in the training set of DB2, 4,207,801 samples in total), and the remaining 1/5 of the training samples were used for confidence parameter estimation. The training character samples fall in 7356 classes, including

7185 Chinese characters and 171 alphanumeric characters and symbols. For character feature extraction, we use the local stroke direction histogram feature, which has been widely used in both online and offline handwritten character recognition. In particular, we adopt the implementation of [45] for direction feature extraction using bi-moment normalization. After direction decomposition, 8 × 8 feature values are extracted from each of eight direction planes. To reduce the complexity of the classifier, the 512D feature vector is projected onto a 160D subspace learned by Fisher linear discriminant analysis (FLDA). The character bi-gram language model was trained on a text corpus containing about 50 million characters (about 32 million words) [16].

To estimate the parameters of the geometric models and train the combining weights of the path evaluation criterion, we simulated the real-time character over-segmentation process on the training text lines of GDB2-D. In simulation, a text line is over-segmented into primitive segments using the dynamic over-segmentation algorithm. On the segment sequence, we extracted samples of geometric features for geometric context modeling. Using the character classifier, language model and geometric context models, we then constructed the candidate lattice on the segment sequence and trained the combining weights by MCE.

Table 4 shows some statistics of character samples segmented from the test pages of DB2. The rec row gives the correct rate of segmented character recognition by the character classifiers, and rec10 and rec20 are the cumulative accuracies of the top 10 and top 20 ranks, respectively. We can see that for all three classifiers, the correct rate on Chinese characters is the highest among the four character types, and the MQDF classifier gives the highest correct rate on Chinese characters among the three classifiers.
Comparing the overall correct rate, however, the NPC-LOGM classifier is the highest because it performs much better on symbols. The non-characters are abnormal samples labeled as non-characters in the database, and outliers are characters outside the defined 7356 classes.

Our experiments were run on a PC with an Intel(R) Core(TM) 2 Duo CPU E GHz processor and 2 GB RAM, and were programmed using Microsoft Visual C++.

Performance metrics

We use separate performance metrics for dynamic line segmentation and for real-time sentence recognition.

Many metrics have been defined for evaluating the performance of line segmentation [10,46,47]. We adopt some of them and define a new metric for the performance of real-time line segmentation. These metrics are based on the definitions of matches. A one-to-one match is a match where a detected line and a ground-truthed line contain identical strokes. A g-one-to-many match occurs when the union of two or more detected lines equals a ground-truthed line. Similarly, a d-many-to-one match means that the union of two or more ground-truthed lines equals a detected line. Among the performance metrics presented in [28], we chose the detection rate (DR), recognition accuracy (RA) and entity detection metric (EDM):

$$\mathit{DR} = w_1 \frac{\mathit{one2one}}{N} + w_2 \frac{\mathit{g\_one2many}}{N}, \qquad \mathit{RA} = w_3 \frac{\mathit{one2one}}{M} + w_4 \frac{\mathit{d\_many2one}}{M}, \qquad \mathit{EDM} = \frac{2\,\mathit{DR}\,\mathit{RA}}{\mathit{DR} + \mathit{RA}},$$

where N is the number of ground-truthed lines, M is the number of detected lines, and $w_1$ to $w_4$ are all set to 1. DR, RA and EDM are similar to recall, precision and F-rate, respectively. The page recognition rate (PRR), defined as the percentage of pages with no segmentation error, is used to measure the page-level performance.

To evaluate the performance of real-time sentence recognition, we use two character-level metrics [15,21], Correct Rate (CR) and Accurate Rate (AR):

$$\mathit{CR} = (N_t - D_e - S_e)/N_t, \qquad \mathit{AR} = (N_t - D_e - S_e - I_e)/N_t,$$

where $N_t$ is the total number of characters in the ground-truth transcript.
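The metric definitions above can be made concrete with a short sketch. It rests on assumptions not stated in the paper: lines are represented as frozensets of stroke identifiers, each stroke belongs to exactly one line on each side, all four weights are 1, and the error counts for CR and AR come from a standard edit-distance alignment by dynamic programming. Function names are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple


def _count_unions(wholes: List[FrozenSet[int]], parts: List[FrozenSet[int]]) -> int:
    """Number of lines in wholes that equal the union of two or more lines in parts."""
    count = 0
    for whole in wholes:
        pieces = [p for p in parts if p < whole]  # proper subsets only
        if len(pieces) >= 2 and frozenset().union(*pieces) == whole:
            count += 1
    return count


def line_metrics(detected: List[FrozenSet[int]],
                 ground_truth: List[FrozenSet[int]]) -> Tuple[float, float, float]:
    """Return (DR, RA, EDM) with w1..w4 = 1."""
    n, m = len(ground_truth), len(detected)
    one2one = sum(1 for d in detected if d in ground_truth)
    g_one2many = _count_unions(ground_truth, detected)
    d_many2one = _count_unions(detected, ground_truth)
    dr = (one2one + g_one2many) / n
    ra = (one2one + d_many2one) / m
    edm = 2 * dr * ra / (dr + ra) if dr + ra else 0.0
    return dr, ra, edm


def edit_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int]:
    """Return (S_e, D_e, I_e) of a minimum-cost alignment of hyp against ref."""
    # dp[i][j] = (total errors, substitutions, deletions, insertions)
    dp = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            n, s, d, k = dp[i - 1][j - 1]
            diag = (n, s, d, k) if ref[i - 1] == hyp[j - 1] else (n + 1, s + 1, d, k)
            n, s, d, k = dp[i - 1][j]
            dele = (n + 1, s, d + 1, k)
            n, s, d, k = dp[i][j - 1]
            ins = (n + 1, s, d, k + 1)
            dp[i][j] = min(diag, dele, ins, key=lambda e: e[0])
    return dp[-1][-1][1:]


def char_rates(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[float, float]:
    """Return (CR, AR) for one transcript."""
    s_e, d_e, i_e = edit_counts(ref, hyp)
    n_t = len(ref)
    return (n_t - d_e - s_e) / n_t, (n_t - d_e - s_e - i_e) / n_t
```

For instance, if one of three ground-truthed lines is split into two detected lines, DR stays at 1 while RA drops, and an inserted character lowers AR but not CR.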
The numbers of substitution errors ($S_e$), deletion errors ($D_e$) and insertion errors ($I_e$) are calculated by aligning the recognition result string with the transcript by dynamic programming. The metric CR denotes the percentage of characters that are correctly recognized. The metric AR further accounts for the characters that are inserted due to over-segmentation.

For real-time recognition of handwritten sentences, besides CR and AR, the recognition speed is of crucial importance for practical applications. We evaluated the speed using the CPU times of each separate step as well as of the whole process: line segmentation (denoted as $C_l$), character over-segmentation (denoted as $C_o$), candidate lattice updating (computing multiple contexts in the candidate lattice, denoted as $C_u$), path search (denoted as $C_s$), and the whole recognition process (denoted as $C_w$). Within $C_u$, we also count the times of character classification, geometric context scoring, and linguistic context scoring separately. The time cost is averaged over the total number of strokes, since in real-time recognition each step is performed once on each stroke.

Table 4
Statistics of character types and recognition rates.
Columns: Classifier, All, Chinese, Symbol, Digit, Letter, Non-char, Outlier
Number 269, ,078 26,753 6,
Rows for each of MQDF, NPC-LOGM and NPC-OVA: Rec (%), Rec10, Rec20

Performance of dynamic line segmentation

We evaluated the performance of real-time line segmentation on simulated short text pages. Given an online page, whenever a stroke is input, the system performs line segmentation and

updates text lines. After the last stroke is processed, the result of line segmentation is obtained. In our previous work [48], we used a linear SVM classifier to characterize the geometric relationship of a line-stroke pair, and evaluated the performance on GDB2.1-D (a subset of GDB2-D). The comparison with other related methods (including the ones based on off-stroke distance and overlap of the line-stroke pair) showed the robustness and effectiveness of the proposed algorithm. Hence, in this paper, we only show the performance of the proposed algorithm on GDB2 and GDB2-D in Table 5, without comparing with the other methods. From the results, we can see that the algorithm performs well and can deal with delayed strokes.

Table 5
Performance of dynamic line segmentation on GDB2 and GDB2-D.
Columns: Dataset, DR, RA, EDM, PRR
Rows: GDB2, GDB2-D

Table 6
Recognition accuracies with dynamic line segmentation.
Columns: Dataset, Classifier, CR (%), AR (%), ch (%), sb (%), dg (%), lt (%)
Rows: GDB2 and GDB2-D, each with MQDF, NPC-LOGM, NPC-OVA

Table 7
CPU times (ms) of sentence recognition with dynamic line segmentation.
Columns: Dataset, Classifier, C_l, C_o, C_u, C_s, C_w
Rows: GDB2 and GDB2-D, each with MQDF, NPC-LOGM, NPC-OVA

Table 8
CPU times (ms) in updating the candidate lattice.
Columns: Dataset, Classifier, C_u: Char, Geo, Lng
Rows: GDB2 and GDB2-D, each with MQDF, NPC-LOGM, NPC-OVA

Table 9
Recognition accuracies with ground-truth line segmentation.
Columns: Dataset, Classifier, CR (%), AR (%), ch (%), sb (%), dg (%), lt (%)
Rows: GDB2 and GDB2-D, each with MQDF, NPC-LOGM, NPC-OVA

Performance of real-time sentence recognition

We evaluated the performance of real-time recognition on the generated short page datasets without (GDB2) and with delayed strokes (GDB2-D).
After dynamic text line segmentation and candidate lattice updating on each stroke, the sentence recognition result is found using the real-time beam search algorithm with the default parameter setting SN = 6, CN = 10 and BW = 10 (these parameter values were found to give a good tradeoff between recognition accuracy and speed).

Performance with dynamic line segmentation

In this case, the sentence recognition result is obtained based on dynamic text line segmentation, so a line segmentation error will cause a sentence recognition error. The recognition accuracies (CR and AR) on the test data of GDB2 and GDB2-D are listed in Table 6, and the CPU times are given in Table 7. In Table 6, the performance is also broken down by character type: Chinese characters (ch), symbols (sb), digits (dg) and letters (lt). In Table 7, the time cost is broken down by step: line segmentation ($C_l$), character over-segmentation ($C_o$), candidate lattice updating ($C_u$), path search ($C_s$), and the whole process ($C_w$). Further, the CPU time in candidate lattice updating is broken down into character classification (Char), geometric context scoring (Geo), and linguistic context scoring (Lng), and is given in Table 8. From the results, we make the following observations:
(a) The character correct rate of sentence-based input is significantly higher than that of isolated character recognition (in Table 4), even though sentence-based recognition involves segmentation. This is due to the important effect of contexts.
(b) For all three character classifiers, the correct rates on Chinese characters are fairly high (94.09%, 91.22%, and 89.46% for MQDF, NPC-LOGM, and NPC-OVA, respectively), but the correct rates on symbols, digits and letters are much lower. This is because the shapes of symbols, digits and letters are more likely to be confused.
(c) The correct rate on handwritten text data with delayed strokes is only slightly lower than that on data without delayed strokes.
This demonstrates the robustness of the proposed approach against delayed strokes.
(d) Among the three character classifiers, the MQDF classifier gives the highest overall correct rate and accurate rate. This is due to its advantage on Chinese characters, where the linguistic context helps to improve the recognition accuracy.
(e) Tables 7 and 8 show that the major source of computation cost lies in the updating of the candidate lattice, and further, that character classification takes most of the computation time in candidate lattice updating. Comparing the three classifiers, the MQDF classifier is the most computationally intensive, and it makes the whole recognition process much slower than the NPC classifiers do.
(f) The proportion of the time cost of path search ($C_s$) in the whole recognition process is very small. This favors real-time applications, because the majority of the computation is performed during the writing process, and the sentence recognition results can be obtained immediately by path search after writing is finished.
(g) Although the MQDF classifier is computationally intensive, it is acceptable for real-time applications because the majority of the computation is done during writing. From the tradeoff

between accuracy and speed, the NPC-LOGM classifier is preferable.

Performance with ground-truth line segmentation

In this case, the ground truth of text line segmentation is used, so that sentence recognition is not disturbed by line segmentation errors. This evaluates the performance of pure sentence recognition, as if the sentence were always written in a single line. The recognition accuracies are shown in Table 9. Compared with Table 6, we can see that the recognition accuracies do not differ significantly between ground-truth line segmentation and dynamic line segmentation. This is because the dynamic line segmentation algorithm yields very few errors.

Fig. 8. Experimental results with various BW. (a) Correct rate of MQDF and NPC-LOGM classifier. (b) Recognition speed of MQDF and NPC-LOGM classifier.

Fig. 9. Experimental results with various CN. (a) Correct rate of MQDF and NPC-LOGM classifier. (b) Recognition speed of MQDF and NPC-LOGM classifier.

Fig. 10. Examples of line segmentation error. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)

Effects of parameters in path search

Both the recognition accuracy and the speed depend on the parameters SN (segment number) and CN (candidate class number) in candidate lattice updating, and BW (band width) in beam search. In the above experiments, the parameters were set to their default values (SN = 6, CN = 10, BW = 10). The choice of SN is related to the concrete candidate character generation method. In our system, SN = 6 was chosen to guarantee that nearly all true characters can be generated by combining consecutive segments. Under this setting, we evaluate the effects of the other two parameters, CN and BW.

Experimental results with CN = 10 and various BW are shown in Fig. 8. When BW equals SN × CN (60 in this case), the performance is the same as that of the DP algorithm. We do not show

the results of the NPC-OVA classifier, since its correct rate is lower than that of the NPC-LOGM classifier while their recognition speed is the same. From the results, we can see that when BW equals 10, the recognition accuracy is as high as when BW equals 60, and varying BW makes little difference in recognition speed. Performances with BW = 10 and various CN are shown in Fig. 9. We can see that 10 candidate classes perform sufficiently well, and increasing CN improves the correct rate slightly but brings substantial computation cost.

Examples of recognition errors

The sources of real-time sentence recognition errors include text line segmentation error, character over-segmentation failure (under-segmentation), character classification error, and path search failure.

The error rate of dynamic text line segmentation is very low, as shown in Table 5. Some examples of line segmentation error are shown in Fig. 10, where the stroke in red is the succeeding stroke written after the first line. This indicates that line segmentation errors tend to occur at the beginning of writing, because the line height is not precisely estimated from a small number of strokes and because the red stroke is rather far from the first line. But this line segmentation error does not affect sentence recognition significantly, because the segment sequence is correctly ordered. As the writing continues, the temporary line segmentation error may be corrected after the line becomes longer.

Over-segmentation failure happens when a segment contains strokes that belong to different characters, or when connected strokes are written. Character classification error means that the true class of the candidate character is not included in the top CN candidate classes, so the correct path is not included in the candidate lattice.
Path search failure occurs when the correct path, even though included in the candidate lattice, cannot be found by the path search algorithm due to the imperfection of the path evaluation criterion or the search algorithm. We show three examples of recognition errors in Fig. 11: (a) shows a symbol that is misrecognized, (b) shows a Chinese character recognition error, and (c) shows a segmentation error. In real-time string recognition, the correct rates of symbols, letters and digits are quite low, as shown in Table 6. Improving the accuracy on alphanumerics and symbols is a goal for future work.

Fig. 11. Three examples of recognition errors. (a) A symbol is misrecognized; (b) A Chinese character is misrecognized; (c) A segmentation error. Upper, segments after over-segmentation; middle, segmentation recognition result; bottom, ground-truth.

Example of real-time recognition process

We have developed a prototype system on a Tablet PC to demonstrate the applicability of the proposed real-time recognition approach. User editing functions have also been developed for correcting segmentation and recognition errors, though they are not described in this paper. Fig. 12 shows some sampled steps of real-time recognition on a handwritten sentence, where the left sub-window is the writing area and the right sub-window shows the recognition result. Whenever the pen lift time exceeds a threshold, the sentence recognition result will be shown and the user has the

Fig. 12. Steps of real-time recognition on a short page.

Learning-Based Candidate Segmentation Scoring for Real-Time Recognition of Online Overlaid Chinese Handwriting

Learning-Based Candidate Segmentation Scoring for Real-Time Recognition of Online Overlaid Chinese Handwriting 2013 12th International Conference on Document Analysis and Recognition Learning-Based Candidate Segmentation Scoring for Real-Time Recognition of Online Overlaid Chinese Handwriting Yan-Fei Lv 1, Lin-Lin

More information

A semi-incremental recognition method for on-line handwritten Japanese text

A semi-incremental recognition method for on-line handwritten Japanese text 2013 12th International Conference on Document Analysis and Recognition A semi-incremental recognition method for on-line handwritten Japanese text Cuong Tuan Nguyen, Bilan Zhu and Masaki Nakagawa Department

More information

Grouping Text Lines in Online Handwritten Japanese Documents by Combining Temporal and Spatial Information

Grouping Text Lines in Online Handwritten Japanese Documents by Combining Temporal and Spatial Information Grouping Text Lines in Online Handwritten Japanese Documents by Combining Temporal and Spatial Information Xiang-Dong Zhou, Da-Han Wang, Cheng-Lin Liu ational Laboratory of Pattern Recognition, Institute

More information

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant

More information

WITH the increasing use of digital image capturing

WITH the increasing use of digital image capturing 800 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 20, NO. 3, MARCH 2011 A Hybrid Approach to Detect and Localize Texts in Natural Scene Images Yi-Feng Pan, Xinwen Hou, and Cheng-Lin Liu, Senior Member, IEEE

More information

All lecture slides will be available at CSC2515_Winter15.html

All lecture slides will be available at  CSC2515_Winter15.html CSC2515 Fall 2015 Introduc3on to Machine Learning Lecture 9: Support Vector Machines All lecture slides will be available at http://www.cs.toronto.edu/~urtasun/courses/csc2515/ CSC2515_Winter15.html Many

More information

Explicit fuzzy modeling of shapes and positioning for handwritten Chinese character recognition

Explicit fuzzy modeling of shapes and positioning for handwritten Chinese character recognition 2009 0th International Conference on Document Analysis and Recognition Explicit fuzzy modeling of and positioning for handwritten Chinese character recognition Adrien Delaye - Eric Anquetil - Sébastien

More information

A Touching Character Database from Chinese Handwriting for Assessing Segmentation Algorithms

A Touching Character Database from Chinese Handwriting for Assessing Segmentation Algorithms 2012 International Conference on Frontiers in Handwriting Recognition A Touching Character Database from Chinese Handwriting for Assessing Segmentation Algorithms Liang Xu, Fei Yin, Qiu-Feng Wang, Cheng-Lin

More information

CASIA-OLHWDB1: A Database of Online Handwritten Chinese Characters

CASIA-OLHWDB1: A Database of Online Handwritten Chinese Characters 2009 10th International Conference on Document Analysis and Recognition CASIA-OLHWDB1: A Database of Online Handwritten Chinese Characters Da-Han Wang, Cheng-Lin Liu, Jin-Lun Yu, Xiang-Dong Zhou National

More information

HMM-Based Handwritten Amharic Word Recognition with Feature Concatenation

HMM-Based Handwritten Amharic Word Recognition with Feature Concatenation 009 10th International Conference on Document Analysis and Recognition HMM-Based Handwritten Amharic Word Recognition with Feature Concatenation Yaregal Assabie and Josef Bigun School of Information Science,

More information

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

More information

Effect of Text/Non-text Classification for Ink Search employing String Recognition

Effect of Text/Non-text Classification for Ink Search employing String Recognition 2012 10th IAPR International Workshop on Document Analysis Systems Effect of Text/Non-text Classification for Ink Search employing String Recognition Tomohisa Matsushita, Cheng Cheng, Yujiro Murata, Bilan

More information

Structural and Syntactic Pattern Recognition

Structural and Syntactic Pattern Recognition Structural and Syntactic Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2017 CS 551, Fall 2017 c 2017, Selim Aksoy (Bilkent

More information

Data mining with Support Vector Machine

Data mining with Support Vector Machine Data mining with Support Vector Machine Ms. Arti Patle IES, IPS Academy Indore (M.P.) artipatle@gmail.com Mr. Deepak Singh Chouhan IES, IPS Academy Indore (M.P.) deepak.schouhan@yahoo.com Abstract: Machine

More information

Bridging the Gap Between Local and Global Approaches for 3D Object Recognition. Isma Hadji G. N. DeSouza

Bridging the Gap Between Local and Global Approaches for 3D Object Recognition. Isma Hadji G. N. DeSouza Bridging the Gap Between Local and Global Approaches for 3D Object Recognition Isma Hadji G. N. DeSouza Outline Introduction Motivation Proposed Methods: 1. LEFT keypoint Detector 2. LGS Feature Descriptor

More information

Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction

Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction Stefan Müller, Gerhard Rigoll, Andreas Kosmala and Denis Mazurenok Department of Computer Science, Faculty of

More information

7. Decision or classification trees

7. Decision or classification trees 7. Decision or classification trees Next we are going to consider a rather different approach from those presented so far to machine learning that use one of the most common and important data structure,

More information

Clustering and Visualisation of Data

Clustering and Visualisation of Data Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some

More information

Machine Learning. Topic 5: Linear Discriminants. Bryan Pardo, EECS 349 Machine Learning, 2013

Machine Learning. Topic 5: Linear Discriminants. Bryan Pardo, EECS 349 Machine Learning, 2013 Machine Learning Topic 5: Linear Discriminants Bryan Pardo, EECS 349 Machine Learning, 2013 Thanks to Mark Cartwright for his extensive contributions to these slides Thanks to Alpaydin, Bishop, and Duda/Hart/Stork

More information

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate

More information

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)

More information

A Visualization Tool to Improve the Performance of a Classifier Based on Hidden Markov Models

A Visualization Tool to Improve the Performance of a Classifier Based on Hidden Markov Models A Visualization Tool to Improve the Performance of a Classifier Based on Hidden Markov Models Gleidson Pegoretti da Silva, Masaki Nakagawa Department of Computer and Information Sciences Tokyo University

More information

Indian Multi-Script Full Pin-code String Recognition for Postal Automation

Indian Multi-Script Full Pin-code String Recognition for Postal Automation 2009 10th International Conference on Document Analysis and Recognition Indian Multi-Script Full Pin-code String Recognition for Postal Automation U. Pal 1, R. K. Roy 1, K. Roy 2 and F. Kimura 3 1 Computer

More information

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer

More information

Convolution Neural Networks for Chinese Handwriting Recognition

Convolution Neural Networks for Chinese Handwriting Recognition Convolution Neural Networks for Chinese Handwriting Recognition Xu Chen Stanford University 450 Serra Mall, Stanford, CA 94305 xchen91@stanford.edu Abstract Convolutional neural networks have been proven

More information

Fine Classification of Unconstrained Handwritten Persian/Arabic Numerals by Removing Confusion amongst Similar Classes

Fine Classification of Unconstrained Handwritten Persian/Arabic Numerals by Removing Confusion amongst Similar Classes 2009 10th International Conference on Document Analysis and Recognition Fine Classification of Unconstrained Handwritten Persian/Arabic Numerals by Removing Confusion amongst Similar Classes Alireza Alaei

More information

CS 229 Midterm Review

CS 229 Midterm Review CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask

More information

Data Mining: Concepts and Techniques. Chapter 9 Classification: Support Vector Machines. Support Vector Machines (SVMs)

Data Mining: Concepts and Techniques. Chapter 9 Classification: Support Vector Machines. Support Vector Machines (SVMs) Data Mining: Concepts and Techniques Chapter 9 Classification: Support Vector Machines 1 Support Vector Machines (SVMs) SVMs are a set of related supervised learning methods used for classification Based

More information

A New Algorithm for Detecting Text Line in Handwritten Documents

A New Algorithm for Detecting Text Line in Handwritten Documents A New Algorithm for Detecting Text Line in Handwritten Documents Yi Li 1, Yefeng Zheng 2, David Doermann 1, and Stefan Jaeger 1 1 Laboratory for Language and Media Processing Institute for Advanced Computer

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION CHAPTER 1 INTRODUCTION 1.1 Introduction Pattern recognition is a set of mathematical, statistical and heuristic techniques used in executing `man-like' tasks on computers. Pattern recognition plays an

More information

Character Recognition

Character Recognition Character Recognition 5.1 INTRODUCTION Recognition is one of the important steps in image processing. There are different methods such as Histogram method, Hough transformation, Neural computing approaches

More information

Multi-label classification using rule-based classifier systems

Multi-label classification using rule-based classifier systems Multi-label classification using rule-based classifier systems Shabnam Nazmi (PhD candidate) Department of electrical and computer engineering North Carolina A&T state university Advisor: Dr. A. Homaifar

More information

CS229 Final Project: Predicting Expected Response Times

CS229 Final Project: Predicting Expected  Response Times CS229 Final Project: Predicting Expected Email Response Times Laura Cruz-Albrecht (lcruzalb), Kevin Khieu (kkhieu) December 15, 2017 1 Introduction Each day, countless emails are sent out, yet the time

More information

6. Dicretization methods 6.1 The purpose of discretization

6. Dicretization methods 6.1 The purpose of discretization 6. Dicretization methods 6.1 The purpose of discretization Often data are given in the form of continuous values. If their number is huge, model building for such data can be difficult. Moreover, many

More information

Text Categorization. Foundations of Statistic Natural Language Processing The MIT Press1999

Text Categorization. Foundations of Statistic Natural Language Processing The MIT Press1999 Text Categorization Foundations of Statistic Natural Language Processing The MIT Press1999 Outline Introduction Decision Trees Maximum Entropy Modeling (optional) Perceptrons K Nearest Neighbor Classification

More information

Function approximation using RBF network. 10 basis functions and 25 data points.

Function approximation using RBF network. 10 basis functions and 25 data points. 1 Function approximation using RBF network F (x j ) = m 1 w i ϕ( x j t i ) i=1 j = 1... N, m 1 = 10, N = 25 10 basis functions and 25 data points. Basis function centers are plotted with circles and data

More information

THE problem of Chinese/Japanese handwriting recognition

THE problem of Chinese/Japanese handwriting recognition IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 35, NO. 10, OCTOBER 2013 2413 Handwritten Chinese/Japanese Text Recognition Using Semi-Markov Conditional Random Fields Xiang-Dong Zhou,

More information

The Curse of Dimensionality

The Curse of Dimensionality The Curse of Dimensionality ACAS 2002 p1/66 Curse of Dimensionality The basic idea of the curse of dimensionality is that high dimensional data is difficult to work with for several reasons: Adding more

More information

A Combined Method for On-Line Signature Verification

A Combined Method for On-Line Signature Verification BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 14, No 2 Sofia 2014 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2014-0022 A Combined Method for On-Line

More information

Pattern Recognition ( , RIT) Exercise 1 Solution

Pattern Recognition ( , RIT) Exercise 1 Solution Pattern Recognition (4005-759, 20092 RIT) Exercise 1 Solution Instructor: Prof. Richard Zanibbi The following exercises are to help you review for the upcoming midterm examination on Thursday of Week 5

More information

Cursive Handwriting Recognition System Using Feature Extraction and Artificial Neural Network

Cursive Handwriting Recognition System Using Feature Extraction and Artificial Neural Network Cursive Handwriting Recognition System Using Feature Extraction and Artificial Neural Network Utkarsh Dwivedi 1, Pranjal Rajput 2, Manish Kumar Sharma 3 1UG Scholar, Dept. of CSE, GCET, Greater Noida,

More information

On Adaptive Confidences for Critic-Driven Classifier Combining

On Adaptive Confidences for Critic-Driven Classifier Combining On Adaptive Confidences for Critic-Driven Classifier Combining Matti Aksela and Jorma Laaksonen Neural Networks Research Centre Laboratory of Computer and Information Science P.O.Box 5400, Fin-02015 HUT,

More information

10-701/15-781, Fall 2006, Final

10-701/15-781, Fall 2006, Final -7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly

More information

Radial Basis Function Neural Network Classifier

Radial Basis Function Neural Network Classifier Recognition of Unconstrained Handwritten Numerals by a Radial Basis Function Neural Network Classifier Hwang, Young-Sup and Bang, Sung-Yang Department of Computer Science & Engineering Pohang University

More information

Clustering. CS294 Practical Machine Learning Junming Yin 10/09/06

Clustering. CS294 Practical Machine Learning Junming Yin 10/09/06 Clustering CS294 Practical Machine Learning Junming Yin 10/09/06 Outline Introduction Unsupervised learning What is clustering? Application Dissimilarity (similarity) of objects Clustering algorithm K-means,

More information

Supervised vs unsupervised clustering

Supervised vs unsupervised clustering Classification Supervised vs unsupervised clustering Cluster analysis: Classes are not known a- priori. Classification: Classes are defined a-priori Sometimes called supervised clustering Extract useful

More information

Time Stamp Detection and Recognition in Video Frames

Time Stamp Detection and Recognition in Video Frames Time Stamp Detection and Recognition in Video Frames Nongluk Covavisaruch and Chetsada Saengpanit Department of Computer Engineering, Chulalongkorn University, Bangkok 10330, Thailand E-mail: nongluk.c@chula.ac.th

More information

Recognition of online captured, handwritten Tamil words on Android

Recognition of online captured, handwritten Tamil words on Android Recognition of online captured, handwritten Tamil words on Android A G Ramakrishnan and Bhargava Urala K Medical Intelligence and Language Engineering (MILE) Laboratory, Dept. of Electrical Engineering,

More information

Handwriting Character Recognition as a Service:A New Handwriting Recognition System Based on Cloud Computing

Handwriting Character Recognition as a Service:A New Handwriting Recognition System Based on Cloud Computing 2011 International Conference on Document Analysis and Recognition Handwriting Character Recognition as a Service:A New Handwriting Recognition Based on Cloud Computing Yan Gao, Lanwen Jin +, Cong He,

More information

Available online at ScienceDirect. Procedia Computer Science 96 (2016 )

Available online at   ScienceDirect. Procedia Computer Science 96 (2016 ) Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 96 (2016 ) 1409 1417 20th International Conference on Knowledge Based and Intelligent Information and Engineering Systems,

More information

Assignment 2. Classification and Regression using Linear Networks, Multilayer Perceptron Networks, and Radial Basis Functions

Assignment 2. Classification and Regression using Linear Networks, Multilayer Perceptron Networks, and Radial Basis Functions ENEE 739Q: STATISTICAL AND NEURAL PATTERN RECOGNITION Spring 2002 Assignment 2 Classification and Regression using Linear Networks, Multilayer Perceptron Networks, and Radial Basis Functions Aravind Sundaresan

More information

6.867 Machine Learning

6.867 Machine Learning 6.867 Machine Learning Problem set 3 Due Tuesday, October 22, in class What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.

More information

6.034 Quiz 2, Spring 2005

6.034 Quiz 2, Spring 2005 6.034 Quiz 2, Spring 2005 Open Book, Open Notes Name: Problem 1 (13 pts) 2 (8 pts) 3 (7 pts) 4 (9 pts) 5 (8 pts) 6 (16 pts) 7 (15 pts) 8 (12 pts) 9 (12 pts) Total (100 pts) Score 1 1 Decision Trees (13

More information

A Feature based on Encoding the Relative Position of a Point in the Character for Online Handwritten Character Recognition

A Feature based on Encoding the Relative Position of a Point in the Character for Online Handwritten Character Recognition A Feature based on Encoding the Relative Position of a Point in the Character for Online Handwritten Character Recognition Dinesh Mandalapu, Sridhar Murali Krishna HP Laboratories India HPL-2007-109 July

More information

INF 4300 Classification III Anne Solberg The agenda today:

INF 4300 Classification III Anne Solberg The agenda today: INF 4300 Classification III Anne Solberg 28.10.15 The agenda today: More on estimating classifier accuracy Curse of dimensionality and simple feature selection knn-classification K-means clustering 28.10.15

More information

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods

More information

Clustering. Informal goal. General types of clustering. Applications: Clustering in information search and analysis. Example applications in search

Clustering. Informal goal. General types of clustering. Applications: Clustering in information search and analysis. Example applications in search Informal goal Clustering Given set of objects and measure of similarity between them, group similar objects together What mean by similar? What is good grouping? Computation time / quality tradeoff 1 2

More information

Evaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München

Evaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München Evaluation Measures Sebastian Pölsterl Computer Aided Medical Procedures Technische Universität München April 28, 2015 Outline 1 Classification 1. Confusion Matrix 2. Receiver operating characteristics

More information

Robust line segmentation for handwritten documents

Robust line segmentation for handwritten documents Robust line segmentation for handwritten documents Kamal Kuzhinjedathu, Harish Srinivasan and Sargur Srihari Center of Excellence for Document Analysis and Recognition (CEDAR) University at Buffalo, State

More information

Generic Face Alignment Using an Improved Active Shape Model

Generic Face Alignment Using an Improved Active Shape Model Generic Face Alignment Using an Improved Active Shape Model Liting Wang, Xiaoqing Ding, Chi Fang Electronic Engineering Department, Tsinghua University, Beijing, China {wanglt, dxq, fangchi} @ocrserv.ee.tsinghua.edu.cn

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:

More information

Last week. Multi-Frame Structure from Motion: Multi-View Stereo. Unknown camera viewpoints

Last week. Multi-Frame Structure from Motion: Multi-View Stereo. Unknown camera viewpoints Last week Multi-Frame Structure from Motion: Multi-View Stereo Unknown camera viewpoints Last week PCA Today Recognition Today Recognition Recognition problems What is it? Object detection Who is it? Recognizing

More information

CS145: INTRODUCTION TO DATA MINING

CS145: INTRODUCTION TO DATA MINING CS145: INTRODUCTION TO DATA MINING 08: Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu October 24, 2017 Learnt Prediction and Classification Methods Vector Data

More information

A Generalized Method to Solve Text-Based CAPTCHAs

A Generalized Method to Solve Text-Based CAPTCHAs A Generalized Method to Solve Text-Based CAPTCHAs Jason Ma, Bilal Badaoui, Emile Chamoun December 11, 2009 1 Abstract We present work in progress on the automated solving of text-based CAPTCHAs. Our method

More information

CS229 Lecture notes. Raphael John Lamarre Townshend

CS229 Lecture notes. Raphael John Lamarre Townshend CS229 Lecture notes Raphael John Lamarre Townshend Decision Trees We now turn our attention to decision trees, a simple yet flexible class of algorithms. We will first consider the non-linear, region-based

More information

Random projection for non-gaussian mixture models

Random projection for non-gaussian mixture models Random projection for non-gaussian mixture models Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 92037 gyozo@cs.ucsd.edu Abstract Recently,

More information

Lesson 3. Prof. Enza Messina

Lesson 3. Prof. Enza Messina Lesson 3 Prof. Enza Messina Clustering techniques are generally classified into these classes: PARTITIONING ALGORITHMS Directly divides data points into some prespecified number of clusters without a hierarchical

More information

Statistical Methods and Optimization in Data Mining

Statistical Methods and Optimization in Data Mining Statistical Methods and Optimization in Data Mining Eloísa Macedo 1, Adelaide Freitas 2 1 University of Aveiro, Aveiro, Portugal; macedo@ua.pt 2 University of Aveiro, Aveiro, Portugal; adelaide@ua.pt The

More information

Decision Trees Dr. G. Bharadwaja Kumar VIT Chennai

Decision Trees Dr. G. Bharadwaja Kumar VIT Chennai Decision Trees Decision Tree Decision Trees (DTs) are a nonparametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target

More information

CS 231A Computer Vision (Fall 2012) Problem Set 3

CS 231A Computer Vision (Fall 2012) Problem Set 3 CS 231A Computer Vision (Fall 2012) Problem Set 3 Due: Nov. 13 th, 2012 (2:15pm) 1 Probabilistic Recursion for Tracking (20 points) In this problem you will derive a method for tracking a point of interest

More information

Module 1 Lecture Notes 2. Optimization Problem and Model Formulation

Module 1 Lecture Notes 2. Optimization Problem and Model Formulation Optimization Methods: Introduction and Basic concepts 1 Module 1 Lecture Notes 2 Optimization Problem and Model Formulation Introduction In the previous lecture we studied the evolution of optimization

More information

3 Feature Selection & Feature Extraction

3 Feature Selection & Feature Extraction 3 Feature Selection & Feature Extraction Overview: 3.1 Introduction 3.2 Feature Extraction 3.3 Feature Selection 3.3.1 Max-Dependency, Max-Relevance, Min-Redundancy 3.3.2 Relevance Filter 3.3.3 Redundancy

More information

Sparse Feature Learning

Sparse Feature Learning Sparse Feature Learning Philipp Koehn 1 March 2016 Multiple Component Models 1 Translation Model Language Model Reordering Model Component Weights 2 Language Model.05 Translation Model.26.04.19.1 Reordering

More information

Fabric Defect Detection Based on Computer Vision

Fabric Defect Detection Based on Computer Vision Fabric Defect Detection Based on Computer Vision Jing Sun and Zhiyu Zhou College of Information and Electronics, Zhejiang Sci-Tech University, Hangzhou, China {jings531,zhouzhiyu1993}@163.com Abstract.

More information

A Comparison of Sequence-Trained Deep Neural Networks and Recurrent Neural Networks Optical Modeling For Handwriting Recognition

A Comparison of Sequence-Trained Deep Neural Networks and Recurrent Neural Networks Optical Modeling For Handwriting Recognition A Comparison of Sequence-Trained Deep Neural Networks and Recurrent Neural Networks Optical Modeling For Handwriting Recognition Théodore Bluche, Hermann Ney, Christopher Kermorvant SLSP 14, Grenoble October

More information

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009 Learning and Inferring Depth from Monocular Images Jiyan Pan April 1, 2009 Traditional ways of inferring depth Binocular disparity Structure from motion Defocus Given a single monocular image, how to infer

More information

Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Descriptive model A descriptive model presents the main features of the data

More information

Recognition-based Segmentation of Nom Characters from Body Text Regions of Stele Images Using Area Voronoi Diagram

Recognition-based Segmentation of Nom Characters from Body Text Regions of Stele Images Using Area Voronoi Diagram Author manuscript, published in "International Conference on Computer Analysis of Images and Patterns - CAIP'2009 5702 (2009) 205-212" DOI : 10.1007/978-3-642-03767-2 Recognition-based Segmentation of

More information

Prototype Selection for Handwritten Connected Digits Classification

Prototype Selection for Handwritten Connected Digits Classification 2009 0th International Conference on Document Analysis and Recognition Prototype Selection for Handwritten Connected Digits Classification Cristiano de Santana Pereira and George D. C. Cavalcanti 2 Federal

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 20: 10/12/2015 Data Mining: Concepts and Techniques (3 rd ed.) Chapter

More information

An Efficient Character Segmentation Based on VNP Algorithm

An Efficient Character Segmentation Based on VNP Algorithm Research Journal of Applied Sciences, Engineering and Technology 4(24): 5438-5442, 2012 ISSN: 2040-7467 Maxwell Scientific organization, 2012 Submitted: March 18, 2012 Accepted: April 14, 2012 Published:

More information

Toward Part-based Document Image Decoding

Toward Part-based Document Image Decoding 2012 10th IAPR International Workshop on Document Analysis Systems Toward Part-based Document Image Decoding Wang Song, Seiichi Uchida Kyushu University, Fukuoka, Japan wangsong@human.ait.kyushu-u.ac.jp,

More information

10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors

10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors Dejan Sarka Anomaly Detection Sponsors About me SQL Server MVP (17 years) and MCT (20 years) 25 years working with SQL Server Authoring 16 th book Authoring many courses, articles Agenda Introduction Simple

More information
