The Impact of Ruling Lines on Writer Identification
Jin Chen, Lehigh University, Bethlehem, PA 18015, USA
Daniel Lopresti, Lehigh University, Bethlehem, PA 18015, USA
Ergina Kavallieratou, University of Aegean, Samos, Greece

Abstract

Paper often includes pre-printed ruling lines to help people write more neatly. This particular example of real-world noise can have a serious impact on applications such as handwriting recognition and writer identification, however. In this work, we investigate the effects of ruling lines on writer ID. We study a method for detecting and removing ruling lines and test its utility for Arabic writer identification through a series of experiments. Our preliminary results show that under realistic assumptions where ruling lines are expected to have different properties across the collection, e.g., thickness, spacing, etc., removing them significantly improves identification performance. We conclude with a discussion of work-in-progress to examine follow-up questions raised by our initial investigations.

Keywords: Writer Identification; Ruling-line Artifacts

I. INTRODUCTION

Writer identification is the task where, given a query and a set of known writers, the system attempts to output the identity of the handwriting. In general, the output is a list of potential authors with associated confidence scores in descending order [1], [2]. Sometimes, a rejection option is available as well [2]. If any text content is used for identity establishment, this task of identification is usually called text-dependent; otherwise it is text-independent.

Since the survey by Plamondon and Lorette that summarized the state of the art in 1989 [3], there have been significant improvements in the field. Researchers have investigated the problem in English [1], Chinese [4], Arabic [5], and many other languages [6], [7].
In terms of recognition techniques, K-nearest Neighbors [8], Neural Networks [9], Hidden Markov Models [2], and Support Vector Machines [10] have been found to be useful in discriminating writers.

Unlike the case in handwriting recognition, writer identification strives to preserve as much as possible the characteristics of inter-writer variation. Schlapbach and Bunke investigated three normalization steps in off-line English writer identification [11]: character width normalization, vertical scaling, and slant correction. From experimentation, the authors observed that slant correction is likely to hamper performance.

Although extensive research has been done in writer identification, previous work has assumed that the input is provided on clean, unlined paper. This may be a reasonable simplifying assumption to start with, but it is not likely to hold in practice. Provided as helpful guides on writing pads and invoice forms, ruling lines often overlap with the handwriting of interest. It is, therefore, understood that ruling lines must be removed before attempting handwriting recognition, and past work has attempted to address this need in that field. However, the impact of ruling lines on writer identification has not yet received similar attention.

In this paper, we take the first steps in studying the impact of ruling lines on the task of writer identification. Using a database of Arabic offline handwritten documents collected by the Linguistic Data Consortium (LDC) [12], we first split it into groups that contain ruling-line-only, ruling-line-free, and mixed handwritten text lines. Then we run SVM classifiers on different settings of training and testing datasets. Our experimental results show that under realistic assumptions where ruling lines are expected to have different properties across the collection, e.g., thickness, spacing, etc., removing them significantly improves identification performance.
On the other hand, in an artificial setting where ruling lines are always present and have the same properties across all of the samples, removing them hampers identification performance, as might be expected.

The remainder of the paper is organized as follows. We first discuss related work on ruling line removal for handwriting recognition in Section II. In Section III we explain the method we have chosen to use in our studies. Next, we describe our experimental setup in Section IV. We present some preliminary experimental results in Section V, and conclude with a discussion of ongoing work in Section VI.

II. RELATED WORK

There has been much work in the field of forms processing and handwriting recognition that deals with ruling lines. For example, in Cao and Govindaraju's work [13], patch-based MRF shape modeling was trained on pre-defined shape patterns to recover the deformations of image shapes. The authors reported significant improvements over traditional approaches using a database of handwritten carbon forms. However, the method is computationally expensive to scale up.

Abd-Almageed, et al., introduced a ruling line removal algorithm based on modeling in linear subspaces [14]. In the training phase, they used ruling-line-only pages, while in the testing stage, they first projected feature vectors into the subspaces and then computed the reconstruction error. They implemented a synthetic evaluation scheme and obtained approximately 88% for both recall and precision.

Arvind, et al., proposed a rule-based method that first detected the ruling lines within segmented handwritten blocks by computing the horizontal projection profile [15]. Based on minimizing the profile entropy, the authors computed the skew angle and detected the positions of ruling lines by investigating the peak position in the horizontal projection profile. They performed run-length analysis to determine which pixels belonged to the ruling lines. After removing the lines, they designed an algorithm to correct broken strokes. They observed an accuracy of 86.33%.

As an improvement, Cao, et al., modified Arvind's algorithm to recover deformations of the shape of the handwriting [16]. They also introduced a simple technique to detect falsely connected sections after removing ruling lines. They achieved a word error rate (WER) reduction of 15.5% using 57 pages. In addition, their algorithms achieved a mismatch ratio of 1.37% on synthesized clean handwriting and ruling-line-only images.

Although an ultimate goal of our work is to develop ruling line removal algorithms appropriate for writer identification applications, this is not the subject of our current paper. Thus, we adapt the algorithms from [15] and [16] with a few necessary changes. In the following section, we briefly describe the particular approach we have implemented.

III. RULING LINE REMOVAL

In our context, the ruling lines on a page can be represented by a model-based approach which attempts to capture all of the lines with a compact set of parameters, or a line-by-line approach which considers each ruling line in isolation. Model-based ruling line removal has several attractive properties, such as the ability to handle light and largely incomplete lines. However, the line-by-line approach is a more general one in that it can operate on various kinds of handwritten documents: not only writing pads, but forms, tables, etc. In our experiments, we adopt the line-by-line approach in which we first decompose the page into pre-identified text lines. It should be clear, however, how the method generalizes to full-page images. Figure 1 shows an example of overlapping ruling lines and handwriting from our test collection.

Figure 1: Arabic handwriting on a page with ruling lines.

A. Ruling Line Detection

After applying a generic median filter to remove scanning noise, we detect the positions of ruling lines and the skew simultaneously. The idea is from Arvind, et al.'s work [15], where the underlying assumption is that ruling lines usually dominate the horizontal projection profiles (HPPs). We first estimate a global skew interval of handwritten lines (±12°), then we rotate the line images by each skew angle (1° at a time) and compute the HPPs. Within each profile, the entropy is computed by the following equation:

$$E = -\sum_{i} \mathrm{HPP}(i)\,\log \mathrm{HPP}(i) \qquad (1)$$

where row index i ranges from 0 to the height of the rotated image and HPP(i) means the pixel count in each row. Now the problem of finding ruling lines becomes an entropy minimization problem: the rotation angle whose profile has the lowest entropy is taken as the skew, and the peak of that profile marks the ruling line. The result of this step is illustrated in Figure 2b.

Next, we estimate the line thickness by investigating the histogram of vertical run-lengths [17]. We select the peak value as the estimate. Then we analyze all horizontal run-lengths around the detected position and run a least-squares line fit to acquire an optimal ruling line (one pixel wide). Finally, all horizontal run-lengths around this central ruling line are considered to be part of the line. Figure 2c shows the detected ruling line in a sample.

B. Stroke Deformation Recovery

Once the ruling line pixels have been identified, we remove them by assigning white pixels accordingly. The next problem is to recover broken handwritten strokes. Following the strategy from [16], there are three sub-steps: broken stroke reconnection, thinned stroke recovery, and U-shape pattern detection and stroke regeneration. We briefly describe each of these steps for completeness. Here the term "sections" means the stroke segments that are caused by removing ruling lines.

Broken strokes are recognized by computing the distances between sections above the ruling line and those below it. If the distance between two sections is within a pre-defined threshold and their lengths are comparable to the stroke thickness, we consider them broken stroke segments and reconnect them by drawing a trapezoid. Otherwise, we draw a parallelogram connecting the shorter section to the nearest end of the longer one.

Thinned strokes are caused by cases where horizontal strokes have partially overlapped with a ruling line. One solution is to examine each section around the line to determine whether its vertical run-lengths are significantly shorter than the estimated handwriting thickness. If so, we draw extra ink pixels column by column in the direction of the ruling line.

U-shape recovery is more difficult. The idea is to examine two sections that are on the same side of the ruling line and also close to each other. If two sections form a U-shape pattern and the imaginary bottom line of the U-shape is around the position of a line, we consider them caused by removal of the ruling line and thus in need of recovery. The particular stroke recovery method is slightly different from either [15] or [16]: we draw a straight line at the middle part of two sections and partial ellipses at the ends to make the artificial strokes more natural. As an example, the result of stroke deformation recovery is shown in Figure 2d.

Figure 2: Illustration of ruling line detection, removal, and deformation recovery. (a) Original line image. (b) The horizontal projection after median filtering and skew correction. (c) Detected ruling line. In the zoom-in segment, red pixels mark the central position of the ruling line and blue ones are run-lengths associated with the ruling line. (d) The result of ruling line removal.

IV. EXPERIMENTAL SETUP

Turning to our experiment, we first introduce the database we are working on in Section IV-A, then we discuss our usage of contour-hinge features and the SVM classifier in Section IV-B.

A. Data Preparation

The Arabic database we are working on is from the DARPA MADCAT project as provided by the Linguistic Data Consortium (LDC). In the current release, there are 7,447 Arabic handwritten document files scanned at 600 dpi and then binarized.
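Because the line images are already binarized, the entropy criterion of Equation (1) in Section III-A can be evaluated directly on the coordinates of the ink pixels. The following is a minimal sketch of that search, not the implementation used in the experiments. It assumes that the profile is normalized to sum to one before the entropy is taken, and it approximates the rotation by a vertical shear of the pixel rows instead of resampling the image. The function names `profile`, `projection_entropy`, `estimate_skew`, and `ruling_row` are hypothetical.

```python
import math
from collections import Counter

def profile(ink, angle_deg):
    """Horizontal projection profile after undoing a skew of angle_deg.

    ink is an iterable of (row, col) coordinates of ink pixels. Assumption:
    the rotation is approximated by shifting each pixel vertically by
    col * tan(angle), which is adequate for small angles.
    """
    t = math.tan(math.radians(angle_deg))
    return Counter(round(r - c * t) for r, c in ink)

def projection_entropy(ink, angle_deg):
    """Entropy of the profile, normalized to a distribution (assumption)."""
    counts = profile(ink, angle_deg)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())

def estimate_skew(ink, max_angle=12, step=1):
    """Try every angle in [-max_angle, max_angle] and keep the lowest entropy."""
    angles = range(-max_angle, max_angle + 1, step)
    return min(angles, key=lambda a: projection_entropy(ink, a))

def ruling_row(ink, angle_deg):
    """Row of the dominant profile peak, taken as the ruling line position."""
    counts = profile(ink, angle_deg)
    return max(counts, key=counts.get)

# Synthetic example: a 200-pixel ruling line at row 20, skewed by 3 degrees.
slope = math.tan(math.radians(3))
line = [(round(20 + c * slope), c) for c in range(200)]
skew = estimate_skew(line)
print(skew, ruling_row(line, skew))
```

The search range of ±12 degrees and the 1-degree step mirror the values given in Section III-A. Thickness estimation and the least-squares refinement of the line position are not covered by this sketch.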
The 70 writers are native Arabic speakers. We first partition the database based on the presence of ruling lines on a given page. Next, we extract line images according to the ground-truth file associated with each document. To ensure sufficient data for training and testing, we filter out 10 writers who have a very small number of handwritten pages that contain ruling lines. Next, we cluster each writer's text line images by their document page IDs. Because of the uneven distribution of numbers of handwritten documents among the remaining 60 writers, we would only utilize a small portion of our data if we selected document pages first and then divided these pages into lines. Instead, we divide document pages into text lines and then select from each writer 100 text lines which include ruling lines and 500 text lines which do not. During the text line selection, we ensure that there is at most one single page that straddles both training and testing datasets. In the end, we combine 40 text lines per writer from the ruling-line-only dataset and 40 text lines per writer from the ruling-line-free dataset to be the mixed dataset for experimentation. All datasets have the same 60 writers and there are no overlapping samples between them.

To avoid biased sampling, we split each writer's handwritten lines into four disjoint subsets to conduct four-fold cross-validation. Thus the data is equally divided into four subsets, and each subset in turn serves as a testing set while the remaining three serve as a training set. The results are then computed as the average performance over all folds. In this way, we ensure that each sample is used for testing exactly once and for training in the other three folds. As a control experiment, we generate a line image dataset that is only pre-processed by a generic median filter to exclude the scanning noise. A breakdown of all three datasets is shown in Table I.

Table I: Datasets used in our experiments. Sample size is given in text lines.

Dataset             Training   Testing    Total
Ruling-line-only       2,700       900    3,600
Ruling-line-free      20,700     6,900   27,600
Mixed                  3,600     1,200    4,800

B. Feature Extraction and Classification

There has been extensive work in the literature studying feature extraction for writer identification [18], [1]. Here we implement one particular set of features from Bulacu and Schomaker's work [1]. In this set of so-called contour-hinge features, for each two adjacent segments (5 pixels long) along the contours, their angles against the horizontal axis are computed and serve as two random variables. By quantizing the angle plane ([0, 2π)) into 24 bins, we accumulate the count in each bin as we traverse all contours. As those authors did in their work, we only consider cases where the second angle is no smaller than the first, which leaves 24 × 25 / 2 = 300 bin pairs. In the end, we normalize the distribution table to compute the joint probability distribution function as a feature vector (300-dimensional).

For classification, we use Support Vector Machines for writer identification.
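Before turning to the classifier, the contour-hinge feature described above can be illustrated with a short sketch. It is an illustration under stated assumptions, not the feature extractor used in the experiments: the contour is assumed to be available as an ordered, closed list of (x, y) points, each hinge leg spans 5 contour points instead of an exact 5-pixel length, and the names `angle_bin`, `cell_index`, and `contour_hinge` are hypothetical. The sketch mainly shows how 24 angular bins with the constraint that the second angle is no smaller than the first yield a 300-dimensional vector.

```python
import math

N_BINS = 24   # angular bins over [0, 2*pi), as in the text
LEG = 5       # length of each hinge leg, in contour points (assumption)

def angle_bin(dy, dx):
    """Return the direction angle in [0, 2*pi) and its quantized bin."""
    phi = math.atan2(dy, dx) % (2 * math.pi)
    return phi, min(int(phi / (2 * math.pi / N_BINS)), N_BINS - 1)

def cell_index(b1, b2):
    """Index of the cell (b1, b2), with b1 <= b2, in the upper triangle."""
    return b1 * N_BINS - b1 * (b1 - 1) // 2 + (b2 - b1)

def contour_hinge(contour):
    """Normalized hinge histogram of one closed contour of (x, y) points."""
    n_cells = N_BINS * (N_BINS + 1) // 2          # 24 * 25 / 2 = 300
    hist = [0.0] * n_cells
    n = len(contour)
    for i in range(n):
        x, y = contour[i]
        xa, ya = contour[(i - LEG) % n]           # end of the first leg
        xb, yb = contour[(i + LEG) % n]           # end of the second leg
        phi1, b1 = angle_bin(ya - y, xa - x)
        phi2, b2 = angle_bin(yb - y, xb - x)
        if phi2 < phi1:                           # keep only phi2 >= phi1
            continue
        hist[cell_index(b1, b2)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Usage on a synthetic circular contour with 200 points.
circle = [(50 + 30 * math.cos(2 * math.pi * k / 200),
           50 + 30 * math.sin(2 * math.pi * k / 200)) for k in range(200)]
feature = contour_hinge(circle)
print(len(feature))  # 300
```

For a text line with several contours, the counts of all contours would be accumulated into one histogram before the final normalization.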
An SVM constructs a hyperplane with maximum margin in a higher dimensional vector space, where a non-linearly separable classification problem in the original vector space may become linearly separable after projecting the feature vectors into the higher dimensional space by different mapping functions. The mapping functions are called kernels in the literature. The choice of kernel is critical for determining how to perform the projection into higher dimensional spaces. Commonly used kernels are the linear, polynomial, radial basis function (RBF), and Gaussian radial basis kernels:

$$K(x, y) = x \cdot y \qquad (2)$$

$$K(x, y) = (x \cdot y + 1)^{d} \qquad (3)$$

$$K(x, y) = \exp\left(-\gamma \|x - y\|^{2}\right), \quad \gamma > 0 \qquad (4)$$

$$K(x, y) = \exp\left(-\frac{\|x - y\|^{2}}{2\sigma^{2}}\right) \qquad (5)$$

Note that the generic form of SVM is only applicable to 2-class classification. The common way of using it for multi-class classification is to run k(k-1)/2 2-class classifiers, where k is the number of classes, and then vote for a multi-class decision using the outputs of all the 2-class classifiers.

In our experiments, we employ the libsvm tool [19]. We use the RBF kernel because it offers better discriminability than the linear kernel, while using fewer parameters than the polynomial kernel. From our experimental results, we found that setting the cost c = 10000 performed best. To facilitate SVM training and testing, we normalize feature vectors into the unit hyper-cube. In addition, there is also a probability output option available for us to compute the Top-N list.

V. EXPERIMENTAL RESULTS

Since we want to investigate the impact of ruling lines on writer identification, we treat the feature extraction and classification steps as black boxes. We run the classifier under different conditions: ruling-line-only, ruling-line-free, and mixed datasets. The results are summarized in a 3 × 3 matrix as shown in Table II. In this table, bold figures indicate significant improvements when ruling lines are removed.
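The kernels of Equations (2) to (5) and the pairwise voting scheme of Section IV-B, which underlie every cell of the results table, can be made concrete with a short sketch. It does not train an SVM and is not a substitute for libsvm; it only evaluates the formulas. The min-max rule in `scale_to_unit_cube` is an assumption about how the features are mapped into the unit hyper-cube, and all function names are hypothetical.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def linear(x, y):                       # Equation (2)
    return dot(x, y)

def polynomial(x, y, d):                # Equation (3)
    return (dot(x, y) + 1) ** d

def rbf(x, y, gamma):                   # Equation (4), gamma > 0
    return math.exp(-gamma * sq_dist(x, y))

def gaussian(x, y, sigma):              # Equation (5)
    return math.exp(-sq_dist(x, y) / (2 * sigma ** 2))

def n_pairwise(k):
    """Number of 2-class classifiers needed for k classes."""
    return k * (k - 1) // 2

def scale_to_unit_cube(vectors):
    """Min-max scale every feature dimension into [0, 1] (assumed rule)."""
    lo = [min(col) for col in zip(*vectors)]
    hi = [max(col) for col in zip(*vectors)]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(vec, lo, hi)] for vec in vectors]

def top_n(probabilities, n):
    """Writers ranked by descending probability (dict: writer -> score)."""
    return sorted(probabilities, key=probabilities.get, reverse=True)[:n]

print(n_pairwise(60))  # 1770 pairwise classifiers for the 60 writers
```

Comparing Equations (4) and (5) term by term shows that the Gaussian kernel is the RBF kernel with γ = 1/(2σ²), so `rbf(x, y, 1 / (2 * sigma ** 2))` and `gaussian(x, y, sigma)` return the same value up to rounding.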
For convenience in the following discussion, we use the notation E(train/test) to represent an experiment that trains on some dataset and tests on another (or the same) dataset. For example, E(RLO/RLF) means the experiment that trains on the ruling-line-only dataset and tests on the ruling-line-free dataset.

Table II: Writer identification accuracy under different training and testing conditions. The first figure in each cell represents the control group (no attempt to remove ruling lines) and the second figure represents the experimental group (ruling lines, if any, removed). Rows give the training dataset and columns the testing dataset.

Training \ Testing        Ruling-line-only (RLO)   Ruling-line-free (RLF)   Mixed (M)
Ruling-line-only (RLO)    (62.5%, 58.0%)           (31.6%, 50.2%)           (65.7%, 74.3%)
Ruling-line-free (RLF)    (49.0%, 54.9%)           (74.7%, 74.7%)           (62.7%, 65.4%)
Mixed (M)                 (87.5%, 86.3%)           (62.2%, 64.6%)           (62.0%, 61.0%)

It is clear that removing ruling lines in E(RLO/RLF), E(RLF/RLO), and E(RLO/M) gives us significant improvements over the control groups. Plotting the latter two in Figure 3, we find that the performance increases quickly within the Top-10 choices. In addition, both experiments have an identification rate greater than 90% for the Top-10 choices. The fact that E(RLF/RLO) outperforms E(RLO/RLF) might be due to the fact that the number of samples in the ruling-line-free dataset significantly exceeds that in the ruling-line-only dataset. It is generally accepted that a more extensively trained classifier will outperform one with less training. However, more experiments are needed to validate this hypothesis in this case.

Figure 3: Performance gains for ruling line removal (plot title: Performance Gains using Different Datasets; x-axis: Top-N Choice; y-axis: Identification Rate; curves: Exp(RLO/RLF), Exp(RLO/RLF Control), Exp(RLO/M), Exp(RLO/M Control)). Dashed curves are performance of the control groups.

It is no surprise that removing ruling lines in the mixed dataset does not make a difference in E(M/RLO). This is because ruling lines in the control group and cleaned line images in our experiment are both modeled in the training set. However, it is interesting to observe that this experiment gives the best performance across the table. Moreover, the classification performance in E(RLF/RLF) is not surprising, since trying to remove ruling lines in a dataset that is free of ruling lines does not affect any data. Therefore the writer identification rates for E(RLF/RLF) and the control group are identical.

As previously stated, we employed four-fold cross-validation, and the figures shown in the table are the mean values over all of the folds. Since in data preparation we ensure that each fold is randomly generated, the classification results across folds are quite close to one another. As a quantitative measure, the standard deviations for E(RLO/RLO) are only 0.01 (control group) and 0.01 (experiment group), respectively.

It is also interesting to see that there is a slight performance loss in E(RLO/RLO), where ruling lines always appear in both training and testing. As we suggested earlier, removing ruling lines will improve performance when lines are expected to have different properties, e.g., they are not always present, have different thicknesses, spacings, etc. Thus in an artificial experiment such as this, a classifier might effectively treat the ruling lines as another feature to be used in identifying the writer in question. Of course, this is unlikely to be useful in practice, since it is unrealistic to assume two pages drawn from different sources will have the same properties.

VI. CONCLUSIONS AND FUTURE WORK

In this paper, we have investigated the impact of ruling lines on writer identification, demonstrating our work using a collection of Arabic handwritten documents. The algorithm we used detects ruling lines by computing horizontal projection profiles while correcting the skew simultaneously. We then applied a series of post-processing steps that try to correct for deformations caused by the line removal. By testing on ruling-line-only, ruling-line-free, and mixed datasets, we found that removing ruling lines is useful for improving writer identification performance.

To date, we have investigated one algorithm for ruling line detection and removal. It would be interesting to know how other ruling line removal algorithms perform in the context of writer identification. As ongoing work, we are examining a model-based line removal method [20] that operates at the page level. This approach takes advantage of the properties of pre-printed ruling lines and does not require extensive training.

It would also be interesting to investigate the impact of ruling lines with different properties (layouts, thicknesses, spacings, etc.) on writer identification. We are currently collecting various blank writing pads with pre-printed ruling lines and using them to synthesize page images. When this ground-truth is ready, we will be able to generate a large collection of data for training and testing purposes. This will allow us to compute better statistics regarding the performance of ruling line removal algorithms.

VII. ACKNOWLEDGMENT

This work is supported by a DARPA IPTO grant administered by Raytheon BBN Technologies.

REFERENCES

[1] M. Bulacu and L. Schomaker, "Text-independent writer identification and verification using textural and allographic features," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, 2007.
[2] A. Schlapbach and H. Bunke, "A writer identification and verification system using HMM based recognizers," Pattern Analysis and Application, vol. 10, pp. 33-43, 2007.
[3] R. Plamondon and G. Lorette, "Automatic signature verification and writer identification: the state of the art," Pattern Recognition, vol. 22, 1989.
[4] X. Li and X. Ding, "Writer identification of Chinese handwriting using grid microstructure feature," in ICB, 2009.
[5] A. Ayman and Z. R. Abu, "Arabic writer identification based on hybrid spectral-statistical measures," Journal of Experimental and Theoretical Artificial Intelligence, vol. 19, no. 4.
[6] B. Helli and M. Moghadam, Persian writer identification using extended Gabor filter. Heidelberg: Springer.
[7] U. Garain and T. Paquet, "Off-line multi-script writer identification using AR coefficients," in Proc. of the 10th International Conference on Document Analysis and Recognition, 2009.
[8] B. Li, Z. Sun, and T. Tan, "Hierarchical shape primitive features for online text-independent writer identification," in Proc. 10th International Conference on Document Analysis and Recognition, Barcelona, Spain, August 2009.
[9] R. Sabourin and J. Drouhard, "Off-line signature verification using directional PDF and neural networks," in Proc. the International Conference on Pattern Recognition, Vancouver, BC, Canada, 1992.
[10] E. Justino, F. Bortolozzi, and R. Sabourin, "A comparison of SVM and HMM classifiers in the off-line signature verification," Pattern Recognition Letters, vol. 26, 2004.
[11] A. Schlapbach and H. Bunke, "Writer identification using an HMM-based handwriting recognition system: to normalize the input or not?" in Proc. of the 12th International Graphonomics Society, 2005.
[12] The Linguistic Data Consortium.
[13] H. Cao and V. Govindaraju, "Handwritten carbon form preprocessing based on Markov random field," in Proc. of IEEE Conference on Computer Vision and Pattern Recognition.
[14] W. Abd-Almageed, J. Kumar, and D. Doermann, "Page rule-line removal using linear subspaces in monochromatic handwritten Arabic documents," in Proc. of the 12th International Conference on Document Analysis and Recognition, 2009.
[15] K. Arvind, J. Juman, and A. Ramakrishnan, "Line removal and restoration of handwritten strokes," in Proc. of the 7th International Conference on Computational Intelligence and Multimedia Application, 2007.
[16] H. Cao, R. Prasad, and P. Natarajan, "A stroke regeneration method for cleaning rule-lines in handwritten document images," in Proc. of the MOCR workshop at the 10th International Conference on Document Analysis and Recognition, 2007.
[17] G. Kim and V. Govindaraju, "A lexicon driven approach to handwritten word recognition for real-time applications," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 4, 1997.
[18] S. Srihari, S. Cha, H. Arora, and S. Lee, "Individuality of handwriting," Journal of Forensic Science, vol. 47, pp. 1-17.
[19] C.-C. Chang and C.-J. Lin, LIBSVM: a library for support vector machines, 2001, software available at cjlin/libsvm.
[20] D. Lopresti and E. Kavallieratou, "Ruling line removal in handwritten page images," in Proc. of the 20th International Conference on Pattern Recognition, 2010, accepted for publication.
More informationLinear Discriminant Analysis in Ottoman Alphabet Character Recognition
Linear Discriminant Analysis in Ottoman Alphabet Character Recognition ZEYNEB KURT, H. IREM TURKMEN, M. ELIF KARSLIGIL Department of Computer Engineering, Yildiz Technical University, 34349 Besiktas /
More informationNOVATEUR PUBLICATIONS INTERNATIONAL JOURNAL OF INNOVATIONS IN ENGINEERING RESEARCH AND TECHNOLOGY [IJIERT] ISSN: VOLUME 2, ISSUE 1 JAN-2015
Offline Handwritten Signature Verification using Neural Network Pallavi V. Hatkar Department of Electronics Engineering, TKIET Warana, India Prof.B.T.Salokhe Department of Electronics Engineering, TKIET
More informationOff-line Character Recognition using On-line Character Writing Information
Off-line Character Recognition using On-line Character Writing Information Hiromitsu NISHIMURA and Takehiko TIMIKAWA Dept. of Information and Computer Sciences, Kanagawa Institute of Technology 1030 Shimo-ogino,
More informationA Combined Method for On-Line Signature Verification
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 14, No 2 Sofia 2014 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2014-0022 A Combined Method for On-Line
More informationA New Algorithm for Detecting Text Line in Handwritten Documents
A New Algorithm for Detecting Text Line in Handwritten Documents Yi Li 1, Yefeng Zheng 2, David Doermann 1, and Stefan Jaeger 1 1 Laboratory for Language and Media Processing Institute for Advanced Computer
More informationLECTURE 6 TEXT PROCESSING
SCIENTIFIC DATA COMPUTING 1 MTAT.08.042 LECTURE 6 TEXT PROCESSING Prepared by: Amnir Hadachi Institute of Computer Science, University of Tartu amnir.hadachi@ut.ee OUTLINE Aims Character Typology OCR systems
More informationCursive Handwriting Recognition System Using Feature Extraction and Artificial Neural Network
Cursive Handwriting Recognition System Using Feature Extraction and Artificial Neural Network Utkarsh Dwivedi 1, Pranjal Rajput 2, Manish Kumar Sharma 3 1UG Scholar, Dept. of CSE, GCET, Greater Noida,
More informationConvolution Neural Networks for Chinese Handwriting Recognition
Convolution Neural Networks for Chinese Handwriting Recognition Xu Chen Stanford University 450 Serra Mall, Stanford, CA 94305 xchen91@stanford.edu Abstract Convolutional neural networks have been proven
More informationPixels. Orientation π. θ π/2 φ. x (i) A (i, j) height. (x, y) y(j)
4th International Conf. on Document Analysis and Recognition, pp.142-146, Ulm, Germany, August 18-20, 1997 Skew and Slant Correction for Document Images Using Gradient Direction Changming Sun Λ CSIRO Math.
More informationEE368 Project Report CD Cover Recognition Using Modified SIFT Algorithm
EE368 Project Report CD Cover Recognition Using Modified SIFT Algorithm Group 1: Mina A. Makar Stanford University mamakar@stanford.edu Abstract In this report, we investigate the application of the Scale-Invariant
More informationCHAPTER 8 COMPOUND CHARACTER RECOGNITION USING VARIOUS MODELS
CHAPTER 8 COMPOUND CHARACTER RECOGNITION USING VARIOUS MODELS 8.1 Introduction The recognition systems developed so far were for simple characters comprising of consonants and vowels. But there is one
More informationSignature Based Document Retrieval using GHT of Background Information
2012 International Conference on Frontiers in Handwriting Recognition Signature Based Document Retrieval using GHT of Background Information Partha Pratim Roy Souvik Bhowmick Umapada Pal Jean Yves Ramel
More informationKeywords Connected Components, Text-Line Extraction, Trained Dataset.
Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Language Independent
More informationInvariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction
Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction Stefan Müller, Gerhard Rigoll, Andreas Kosmala and Denis Mazurenok Department of Computer Science, Faculty of
More informationExtracting Layers and Recognizing Features for Automatic Map Understanding. Yao-Yi Chiang
Extracting Layers and Recognizing Features for Automatic Map Understanding Yao-Yi Chiang 0 Outline Introduction/ Problem Motivation Map Processing Overview Map Decomposition Feature Recognition Discussion
More informationHandwritten Gurumukhi Character Recognition by using Recurrent Neural Network
139 Handwritten Gurumukhi Character Recognition by using Recurrent Neural Network Harmit Kaur 1, Simpel Rani 2 1 M. Tech. Research Scholar (Department of Computer Science & Engineering), Yadavindra College
More informationSeparation of Overlapping Text from Graphics
Separation of Overlapping Text from Graphics Ruini Cao, Chew Lim Tan School of Computing, National University of Singapore 3 Science Drive 2, Singapore 117543 Email: {caorn, tancl}@comp.nus.edu.sg Abstract
More informationGraph Matching Iris Image Blocks with Local Binary Pattern
Graph Matching Iris Image Blocs with Local Binary Pattern Zhenan Sun, Tieniu Tan, and Xianchao Qiu Center for Biometrics and Security Research, National Laboratory of Pattern Recognition, Institute of
More informationMachine Learning Final Project
Machine Learning Final Project Team: hahaha R01942054 林家蓉 R01942068 賴威昇 January 15, 2014 1 Introduction In this project, we are asked to solve a classification problem of Chinese characters. The training
More informationOnline Mathematical Symbol Recognition using SVMs with Features from Functional Approximation
Online Mathematical Symbol Recognition using SVMs with Features from Functional Approximation Birendra Keshari and Stephen M. Watt Ontario Research Centre for Computer Algebra Department of Computer Science
More informationIsolated Handwritten Words Segmentation Techniques in Gurmukhi Script
Isolated Handwritten Words Segmentation Techniques in Gurmukhi Script Galaxy Bansal Dharamveer Sharma ABSTRACT Segmentation of handwritten words is a challenging task primarily because of structural features
More informationBagging for One-Class Learning
Bagging for One-Class Learning David Kamm December 13, 2008 1 Introduction Consider the following outlier detection problem: suppose you are given an unlabeled data set and make the assumptions that one
More informationUnconstrained Language Identification Using A Shape Codebook
Unconstrained Language Identification Using A Shape Codebook Guangyu Zhu, Xiaodong Yu, Yi Li, and David Doermann Language and Media Processing Laboratory University of Maryland {zhugy,xdyu,liyi,doermann}@umiacs.umd.edu
More informationHidden Loop Recovery for Handwriting Recognition
Hidden Loop Recovery for Handwriting Recognition David Doermann Institute of Advanced Computer Studies, University of Maryland, College Park, USA E-mail: doermann@cfar.umd.edu Nathan Intrator School of
More informationThe Interpersonal and Intrapersonal Variability Influences on Off- Line Signature Verification Using HMM
The Interpersonal and Intrapersonal Variability Influences on Off- Line Signature Verification Using HMM EDSON J. R. JUSTINO 1 FLÁVIO BORTOLOZZI 1 ROBERT SABOURIN 2 1 PUCPR - Pontifícia Universidade Católica
More informationRobust line segmentation for handwritten documents
Robust line segmentation for handwritten documents Kamal Kuzhinjedathu, Harish Srinivasan and Sargur Srihari Center of Excellence for Document Analysis and Recognition (CEDAR) University at Buffalo, State
More informationA Survey of Problems of Overlapped Handwritten Characters in Recognition process for Gurmukhi Script
A Survey of Problems of Overlapped Handwritten Characters in Recognition process for Gurmukhi Script Arwinder Kaur 1, Ashok Kumar Bathla 2 1 M. Tech. Student, CE Dept., 2 Assistant Professor, CE Dept.,
More informationStaff Line Detection by Skewed Projection
Staff Line Detection by Skewed Projection Diego Nehab May 11, 2003 Abstract Most optical music recognition systems start image analysis by the detection of staff lines. This work explores simple techniques
More informationAdaptive Learning of an Accurate Skin-Color Model
Adaptive Learning of an Accurate Skin-Color Model Q. Zhu K.T. Cheng C. T. Wu Y. L. Wu Electrical & Computer Engineering University of California, Santa Barbara Presented by: H.T Wang Outline Generic Skin
More informationMULTI ORIENTATION PERFORMANCE OF FEATURE EXTRACTION FOR HUMAN HEAD RECOGNITION
MULTI ORIENTATION PERFORMANCE OF FEATURE EXTRACTION FOR HUMAN HEAD RECOGNITION Panca Mudjirahardjo, Rahmadwati, Nanang Sulistiyanto and R. Arief Setyawan Department of Electrical Engineering, Faculty of
More informationObject Tracking using HOG and SVM
Object Tracking using HOG and SVM Siji Joseph #1, Arun Pradeep #2 Electronics and Communication Engineering Axis College of Engineering and Technology, Ambanoly, Thrissur, India Abstract Object detection
More informationRobust PDF Table Locator
Robust PDF Table Locator December 17, 2016 1 Introduction Data scientists rely on an abundance of tabular data stored in easy-to-machine-read formats like.csv files. Unfortunately, most government records
More informationBagging and Boosting Algorithms for Support Vector Machine Classifiers
Bagging and Boosting Algorithms for Support Vector Machine Classifiers Noritaka SHIGEI and Hiromi MIYAJIMA Dept. of Electrical and Electronics Engineering, Kagoshima University 1-21-40, Korimoto, Kagoshima
More informationFace Recognition using Eigenfaces SMAI Course Project
Face Recognition using Eigenfaces SMAI Course Project Satarupa Guha IIIT Hyderabad 201307566 satarupa.guha@research.iiit.ac.in Ayushi Dalmia IIIT Hyderabad 201307565 ayushi.dalmia@research.iiit.ac.in Abstract
More informationPattern Recognition ( , RIT) Exercise 1 Solution
Pattern Recognition (4005-759, 20092 RIT) Exercise 1 Solution Instructor: Prof. Richard Zanibbi The following exercises are to help you review for the upcoming midterm examination on Thursday of Week 5
More informationPractical Image and Video Processing Using MATLAB
Practical Image and Video Processing Using MATLAB Chapter 18 Feature extraction and representation What will we learn? What is feature extraction and why is it a critical step in most computer vision and
More informationOffline Signature verification and recognition using ART 1
Offline Signature verification and recognition using ART 1 R. Sukanya K.Malathy M.E Infant Jesus College of Engineering And Technology Abstract: The main objective of this project is signature verification
More informationScanner Parameter Estimation Using Bilevel Scans of Star Charts
ICDAR, Seattle WA September Scanner Parameter Estimation Using Bilevel Scans of Star Charts Elisa H. Barney Smith Electrical and Computer Engineering Department Boise State University, Boise, Idaho 8375
More informationRecognition of Unconstrained Malayalam Handwritten Numeral
Recognition of Unconstrained Malayalam Handwritten Numeral U. Pal, S. Kundu, Y. Ali, H. Islam and N. Tripathy C VPR Unit, Indian Statistical Institute, Kolkata-108, India Email: umapada@isical.ac.in Abstract
More informationInvarianceness for Character Recognition Using Geo-Discretization Features
Computer and Information Science; Vol. 9, No. 2; 2016 ISSN 1913-8989 E-ISSN 1913-8997 Published by Canadian Center of Science and Education Invarianceness for Character Recognition Using Geo-Discretization
More informationImproving Latent Fingerprint Matching Performance by Orientation Field Estimation using Localized Dictionaries
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 11, November 2014,
More informationOnline Signature Verification Technique
Volume 3, Issue 1 ISSN: 2320-5288 International Journal of Engineering Technology & Management Research Journal homepage: www.ijetmr.org Online Signature Verification Technique Ankit Soni M Tech Student,
More informationHandwritten Devanagari Character Recognition Model Using Neural Network
Handwritten Devanagari Character Recognition Model Using Neural Network Gaurav Jaiswal M.Sc. (Computer Science) Department of Computer Science Banaras Hindu University, Varanasi. India gauravjais88@gmail.com
More informationA System for Joining and Recognition of Broken Bangla Numerals for Indian Postal Automation
A System for Joining and Recognition of Broken Bangla Numerals for Indian Postal Automation K. Roy, U. Pal and B. B. Chaudhuri CVPR Unit; Indian Statistical Institute, Kolkata-108; India umapada@isical.ac.in
More informationUsing Support Vector Machines to Eliminate False Minutiae Matches during Fingerprint Verification
Using Support Vector Machines to Eliminate False Minutiae Matches during Fingerprint Verification Abstract Praveer Mansukhani, Sergey Tulyakov, Venu Govindaraju Center for Unified Biometrics and Sensors
More informationHMM-Based Handwritten Amharic Word Recognition with Feature Concatenation
009 10th International Conference on Document Analysis and Recognition HMM-Based Handwritten Amharic Word Recognition with Feature Concatenation Yaregal Assabie and Josef Bigun School of Information Science,
More informationChain Code Histogram based approach
An attempt at visualizing the Fourth Dimension Take a point, stretch it into a line, curl it into a circle, twist it into a sphere, and punch through the sphere Albert Einstein Chain Code Histogram based
More informationA Simplistic Way of Feature Extraction Directed towards a Better Recognition Accuracy
International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 3, Issue 7 (September 2012), PP. 43-49 A Simplistic Way of Feature Extraction Directed
More informationTexture Segmentation by Windowed Projection
Texture Segmentation by Windowed Projection 1, 2 Fan-Chen Tseng, 2 Ching-Chi Hsu, 2 Chiou-Shann Fuh 1 Department of Electronic Engineering National I-Lan Institute of Technology e-mail : fctseng@ccmail.ilantech.edu.tw
More informationTracing and Straightening the Baseline in Handwritten Persian/Arabic Text-line: A New Approach Based on Painting-technique
Tracing and Straightening the Baseline in Handwritten Persian/Arabic Text-line: A New Approach Based on Painting-technique P. Nagabhushan and Alireza Alaei 1,2 Department of Studies in Computer Science,
More informationHandwritten Word Recognition using Conditional Random Fields
Handwritten Word Recognition using Conditional Random Fields Shravya Shetty Harish Srinivasan Sargur Srihari Center of Excellence for Document Analysis and Recognition (CEDAR) Department of Computer Science
More informationCS 231A Computer Vision (Winter 2014) Problem Set 3
CS 231A Computer Vision (Winter 2014) Problem Set 3 Due: Feb. 18 th, 2015 (11:59pm) 1 Single Object Recognition Via SIFT (45 points) In his 2004 SIFT paper, David Lowe demonstrates impressive object recognition
More informationRecognition of Off-Line Handwritten Devnagari Characters Using Quadratic Classifier
Recognition of Off-Line Handwritten Devnagari Characters Using Quadratic Classifier N. Sharma, U. Pal*, F. Kimura**, and S. Pal Computer Vision and Pattern Recognition Unit, Indian Statistical Institute
More informationSegmentation of Images
Segmentation of Images SEGMENTATION If an image has been preprocessed appropriately to remove noise and artifacts, segmentation is often the key step in interpreting the image. Image segmentation is a
More informationLearning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009
Learning and Inferring Depth from Monocular Images Jiyan Pan April 1, 2009 Traditional ways of inferring depth Binocular disparity Structure from motion Defocus Given a single monocular image, how to infer
More informationA Framework for Efficient Fingerprint Identification using a Minutiae Tree
A Framework for Efficient Fingerprint Identification using a Minutiae Tree Praveer Mansukhani February 22, 2008 Problem Statement Developing a real-time scalable minutiae-based indexing system using a
More informationA Non-Rigid Feature Extraction Method for Shape Recognition
A Non-Rigid Feature Extraction Method for Shape Recognition Jon Almazán, Alicia Fornés, Ernest Valveny Computer Vision Center Dept. Ciències de la Computació Universitat Autònoma de Barcelona Bellaterra,
More informationAutomated Digital Conversion of Hand-Drawn Plots
Automated Digital Conversion of Hand-Drawn Plots Ruo Yu Gu Department of Electrical Engineering Stanford University Palo Alto, U.S.A. ruoyugu@stanford.edu Abstract An algorithm has been developed using
More informationUser Signature Identification and Image Pixel Pattern Verification
Global Journal of Pure and Applied Mathematics. ISSN 0973-1768 Volume 13, Number 7 (2017), pp. 3193-3202 Research India Publications http://www.ripublication.com User Signature Identification and Image
More informationENSEMBLE RANDOM-SUBSET SVM
ENSEMBLE RANDOM-SUBSET SVM Anonymous for Review Keywords: Abstract: Ensemble Learning, Bagging, Boosting, Generalization Performance, Support Vector Machine In this paper, the Ensemble Random-Subset SVM
More informationTraffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers
Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers A. Salhi, B. Minaoui, M. Fakir, H. Chakib, H. Grimech Faculty of science and Technology Sultan Moulay Slimane
More informationIndividuality of Handwritten Characters
Accepted by the 7th International Conference on Document Analysis and Recognition, Edinburgh, Scotland, August 3-6, 2003. (Paper ID: 527) Individuality of Handwritten s Bin Zhang Sargur N. Srihari Sangjik
More informationLearning-Based Candidate Segmentation Scoring for Real-Time Recognition of Online Overlaid Chinese Handwriting
2013 12th International Conference on Document Analysis and Recognition Learning-Based Candidate Segmentation Scoring for Real-Time Recognition of Online Overlaid Chinese Handwriting Yan-Fei Lv 1, Lin-Lin
More informationSlant Correction using Histograms
Slant Correction using Histograms Frank de Zeeuw Bachelor s Thesis in Artificial Intelligence Supervised by Axel Brink & Tijn van der Zant July 12, 2006 Abstract Slant is one of the characteristics that
More informationExploring Similarity Measures for Biometric Databases
Exploring Similarity Measures for Biometric Databases Praveer Mansukhani, Venu Govindaraju Center for Unified Biometrics and Sensors (CUBS) University at Buffalo {pdm5, govind}@buffalo.edu Abstract. Currently
More informationIsolated Curved Gurmukhi Character Recognition Using Projection of Gradient
International Journal of Computational Intelligence Research ISSN 0973-1873 Volume 13, Number 6 (2017), pp. 1387-1396 Research India Publications http://www.ripublication.com Isolated Curved Gurmukhi Character
More informationSemi-Automatic Transcription Tool for Ancient Manuscripts
The Venice Atlas A Digital Humanities atlas project by DH101 EPFL Students Semi-Automatic Transcription Tool for Ancient Manuscripts In this article, we investigate various techniques from the fields of
More informationThe Comparative Study of Machine Learning Algorithms in Text Data Classification*
The Comparative Study of Machine Learning Algorithms in Text Data Classification* Wang Xin School of Science, Beijing Information Science and Technology University Beijing, China Abstract Classification
More informationLecture 8 Object Descriptors
Lecture 8 Object Descriptors Azadeh Fakhrzadeh Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University 2 Reading instructions Chapter 11.1 11.4 in G-W Azadeh Fakhrzadeh
More information