HTR Part II: Handwritten Text Recognition


Preprocessing and Feature Extraction for Off-Line Continuous HTR
Alejandro H. Toselli & PRHLT-Group
Departamento de Sistemas Informáticos y Computación (DSIC), Universidad Politécnica de Valencia (UPV)
May 2nd

Outline
- Introduction: HTR Generalities
  - Motivation
  - Use of conventional OCR systems for Cursive HTR
  - HTR Related Definitions
  - Criteria to Classify HTR Systems
  - HTR Approach based on ASR Technology
- HTR System Overview
- Preprocessing at Page Image Level
  - Noise Reduction and Background Removal
  - Skew Correction of Page Image
  - Text Line Images Detection and Extraction
- Preprocessing at Text Line Image Level
  - Correction/Normalization of Style Attributes
  - Slope and Slant Corrections
  - Size Normalization
- Feature Extraction
  - Examples of Features Extraction used in HTR
  - Features Extraction used by FKI Group
  - Features Extraction used by RWTH-i6 Group
  - Dimensionality Reduction

Motivation
1. A large part of the information handled today is still found in handwritten documents:
   - personal letters
   - handwritten notes
   - survey forms
   - historical manuscripts
   - ...
2. HTR has become an important issue for many industrial applications:
   - Recognition of legal amounts handwritten on bank checks
   - Signature verification
   - Reading of postal addresses
   - ...

Use of conventional OCRs for Cursive HTR
- OCR: characters are converted from machine-print or handwriting images to ASCII text.
- Types:
  - For recognizing machine-printed words/characters.
  - For recognizing handwritten words/characters.
- Word recognition relies on first detecting and segmenting the characters of each word (successfully applied to machine-printed words).
- Segmenting characters/strokes can be quite difficult in cursive handwritten text.
- Conventional OCR systems fail to recognize cursive handwritten text because of the variety of handwriting styles, the diversity of stroke shapes and positions, characters touching (or overlapping) each other, etc.

Use of conventional OCRs for Cursive HTR: Examples
[Figure: sample handwriting images labelled "Easy" and "Difficult".]

HTR Related Definitions
- Handwritten Text Recognition: the task of transforming language represented in its spatial form of handwritten graphical marks into its symbolic representation (ASCII).
- Text Interpretation: the task of determining the meaning of a recognized text.
Additionally:
- Writer Identification: the task of determining the author of a handwritten text from a set of writers.
- Signature Verification: examination of a signature to determine whether the handwriting is genuine, i.e. whether it belongs to the person who signed it.

Criteria to Classify HTR Systems
- ON-LINE: point sequence representation (digital pen, tablet, etc.)
- OFF-LINE: bitmap (image) representation (camera, scanner, video, etc.)
Other criteria:
- With/without explicit segmentation
- Writer dependent/independent
- Open/closed vocabulary

HTR Approach based on ASR Technology
Off-line handwritten text recognition:
- A very difficult task, which remains an open research topic.
- Shares many common points with Automatic Speech Recognition (ASR).
ASR-based technology: off-line handwritten text recognition technology is based on current, already consolidated ASR technology.

Main Features of the ASR Technology
Current ASR technology:
- Based on continuous Hidden Markov Models (HMMs).
- No explicit segmentation of the input speech signal is required.
- Models different perception levels: morphological, lexical and syntactical.
- Recognition/decoding is carried out by searching within a space structured by the above-mentioned perception levels.
- Uses Stochastic Finite-State Automata (SFSA) to build this structured search space: HMMs, regular language models.

A Generic Pattern Recognition System
Usually, PR systems share a general structure of four building blocks:
1. Data acquisition and preprocessing
2. Feature extraction and selection (data representation)
3. Training
4. Recognition/classification

HTR System Overview
[Diagram. Processing blocks: PREPROCESSING, FEATURE EXTRACTION, LANGUAGE MODELING, TRAINING, RECOGNITION. Data: Handwritten Text Images, Transcriptions, Normalized Text Lines, Feature Vector Sequence, Lexical and Language Models, HMM Models, Test Set, Word Sequence.]

Preprocessing at Page Image Level
Preprocessing steps commonly applied to images of handwritten text pages:
- Noise Reduction: filtering out noise, which can be an intrinsic feature of the original image and/or a product of the digitization process.
- Background Removal: removing everything that is not handwritten text.
- Binarization: mapping grey-level images into black-and-white images.
- Skew Correction: detection and correction of the page angle with respect to the horizontal direction.
- Text Line Extraction: detection and extraction of text lines from the page image.

Noise Reduction and Background Removal
Issues:
- Stains, smears, ink bleed-through.
- Uneven illumination.
- Contrast variation.
- Noisy background.
- etc.
Solution: methods based mainly on (combinations of) different thresholding approaches, connected components and image preprocessing filters.
Popular thresholding methods:
- N. Otsu, A threshold selection method from grey level histogram, IEEE Trans. Syst. Man Cybern., vol. 9, no. 1, 1979. Global thresholding method.
- W. Niblack, An Introduction to Digital Image Processing, Prentice Hall. Local thresholding method.
- J. Sauvola, M. Pietikäinen, Adaptive document image binarization, Pattern Recognition, vol. 33, no. 2, February 2000. Local thresholding method.
- ...

Noise Reduction and Background Removal: Example
[Figure: example page image.]
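As an illustration of the global thresholding idea cited above, the following is a minimal sketch of Otsu-style threshold selection (maximizing the between-class variance of the grey-level histogram). It is not the code used in the slides: the function names `otsu_threshold` and `binarize`, the 256-level assumption, and the convention that dark pixels are ink are all choices made for this example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximizes the between-class variance.

    `pixels` is a flat iterable of integer grey levels in [0, levels).
    Pixels with value <= t form the dark class, the rest the bright class.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(g * h for g, h in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels in the dark class
    sum0 = 0  # sum of grey levels in the dark class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty: variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance up to a constant factor 1 / total**2.
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def binarize(image, threshold):
    """Map a grey-level image (list of rows) to 1 = ink (dark), 0 = background."""
    return [[1 if p <= threshold else 0 for p in row] for row in image]


image = [[10, 200, 12],
         [210, 11, 205]]
t = otsu_threshold([p for row in image for p in row])
binary = binarize(image, t)
```

A local method such as Niblack's or Sauvola's would instead compute a threshold per pixel from the statistics of a surrounding window, which is what makes it more robust to uneven illumination.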

Skew Correction of Page Image
Issues:
- Skew is introduced during the image scanning process.
- It is defined as the angle between the X-axis and the scanned text lines of the page.
Solution: apply a rotation transform once the skew angle has been detected.
Typical methods are based on:
- Multiresolution, eigenvalues, connected components
- Hough transform
- Horizontal projection profiles
- Statistical mixture model (EM algorithm)
- etc.
Reference: B. Gatos, N. Papamarkos, and C. Chamzas. Skew detection and text line position determination in digitized documents. Pattern Recognition, 30(9).

Skew Correction of Page Image: Example
[Figure: example page image.]

Text Line Images Detection and Extraction
There are two different issues with respect to the text lines in a given page image: detection and extraction.
Typical methods for text line detection are based on:
- Multiresolution, eigenvalues
- Hough transform
- Horizontal projection profiles (+ RLSA)
- Statistical approaches: HMMs
- etc.
Typical methods for text line extraction are based on:
- Connected components
- Dynamic programming, distance maps (cost functions)
- etc.
Reference: Laurence Likforman-Sulem, Abderrazak Zahour, and Bruno Taconet. Text Line Segmentation of Historical Documents: A Survey. Int. J. Doc. Anal. Recognit. 9, 2 (April 2007).

Text Line Images Detection and Extraction: Example
[Figure: example page image.]
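Of the detection methods listed above, the horizontal projection profile is the simplest to sketch: rows that contain ink are grouped into runs, and each run is taken as one text line. The code below is only an illustration under strong assumptions (a binarized, already deskewed page with 1 = ink, and lines that do not touch vertically); the names `horizontal_projection`, `detect_text_lines` and the `min_ink` parameter are hypothetical.

```python
def horizontal_projection(image):
    """Number of ink pixels (value 1) in every row of a binary image."""
    return [sum(row) for row in image]


def detect_text_lines(image, min_ink=1):
    """Return (first_row, last_row) pairs, one per run of consecutive rows
    whose projection value is at least `min_ink`."""
    lines, start = [], None
    for i, count in enumerate(horizontal_projection(image)):
        if count >= min_ink and start is None:
            start = i                      # a text line begins
        elif count < min_ink and start is not None:
            lines.append((start, i - 1))   # the text line ended on the previous row
            start = None
    if start is not None:                  # a line touching the bottom border
        lines.append((start, len(image) - 1))
    return lines


page = [[0, 0, 0],
        [1, 1, 0],
        [0, 1, 0],
        [0, 0, 0],
        [1, 0, 1],
        [0, 0, 0]]
line_spans = detect_text_lines(page)
```

On real handwriting, ascenders and descenders of neighbouring lines often overlap, which is why the slides also list connected components, dynamic programming and distance maps for the extraction step.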

Preprocessing at Text Line Image Level
Style attributes of handwritten text to be corrected/normalized:
- Slope Angle: angle of the handwritten text line with respect to the horizontal direction.
- Slant Angle: angle of the handwritten text strokes with respect to the vertical direction.
- Height of the handwritten text strokes: can vary according to the task and writer. Of particular interest is the relationship among the sizes of ascender letters (e.g. b, l, t), descender letters (e.g. p, q, j) and normal letters (e.g. a, c, u).
- Character Width: like the text height, character width can vary according to the task and writer.
- Stroke Thickness: the use of different writing instruments can lead to variable stroke type and thickness.

Correction/Normalization of Style Attributes: Overview
[Figure: (a) original image; (b.1)-(b.5) slope correction; (c) slant correction; (d) size normalization.]

Slope and Slant Correction
A rotation/shear transform is applied once the slope/slant angle has been detected. Methods are based on:
- Dynamic programming: Bertolami, R.; Uchida, S.; Zimmermann, M.; Bunke, H., Non-Uniform Slant Correction for Handwritten Text Line Recognition, Ninth International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 18-22, Sept.
- Gabor filters: Das Gupta, J.; Chanda, B., Novel methods for slope and slant correction of off-line handwritten text word, Third International Conference on Emerging Applications of Information Technology (EAIT), 2012, pp. 295-298.
- Machine learning techniques (MLP): España-Boquera, S.; Castro-Bleda, M.J.; Gorbe-Moya, J.; Zamora-Martinez, F., Improving Offline Handwritten Text Recognition with Hybrid HMM/ANN Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 4, pp. 767-779, April.
- Edge detectors (Sobel filters)
- RLSA, projection profiles, linear regression (more or less heuristic)

Slope Correction: RLSA and Linear Regression
Approach based on the Run-Length Smearing Algorithm (RLSA), with word slope angles computed by linear regression.
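The RLSA plus linear regression approach can be sketched as follows. This is a simplified reading, not the authors' implementation: it smears each row horizontally, takes the lowest ink pixel of every column as the lower contour, and fits a straight line through those points by ordinary least squares. The helper names and the `max_gap` parameter are assumptions made for the example.

```python
import math


def rlsa_row(row, max_gap):
    """Run-length smearing on one row: fill runs of 0s of length <= max_gap
    that lie between two ink pixels (value 1)."""
    out = list(row)
    last_ink = None
    for j, v in enumerate(row):
        if v == 1:
            if last_ink is not None and 0 < j - last_ink - 1 <= max_gap:
                for k in range(last_ink + 1, j):
                    out[k] = 1
            last_ink = j
    return out


def rlsa(image, max_gap):
    return [rlsa_row(row, max_gap) for row in image]


def lower_contour(image):
    """(column, row) of the lowest ink pixel in every column that has ink."""
    points = []
    for j, col in enumerate(zip(*image)):
        rows = [i for i, v in enumerate(col) if v]
        if rows:
            points.append((j, rows[-1]))
    return points


def fit_line(points):
    """Ordinary least squares fit y = a*x + b (needs at least two distinct x)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b


def slope_angle_deg(image, max_gap=2):
    """Slope of the smeared lower contour, in degrees. Rows grow downwards,
    so a positive angle means the writing descends from left to right."""
    a, _ = fit_line(lower_contour(rlsa(image, max_gap)))
    return math.degrees(math.atan(a))


smeared = rlsa_row([0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0], max_gap=2)
angle = slope_angle_deg([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
```

The slope would then be corrected by rotating the word or line image by the opposite angle.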

Slope Correction: Horizontal Projection Profile (HPP)
$$\mathrm{HPP}_\alpha(i) = \sum_{j=1}^{\mathrm{cols}} I_\alpha(i, j)$$
$$F(\mathrm{HPP}_\alpha) = \mathrm{std}(\mathrm{HPP}_\alpha) = \sqrt{\frac{1}{\mathrm{rows}} \sum_{i=1}^{\mathrm{rows}} \left(\overline{\mathrm{HPP}_\alpha} - \mathrm{HPP}_\alpha(i)\right)^2}$$
$$\hat{\alpha} = \arg\max_{\alpha \in [45:135]} \mathrm{std}(\mathrm{HPP}_\alpha)$$

Slant Correction: Vertical Projection Profile (VPP) (1)
- A shear transformation is applied to the image $I$ for a wide range of angles $\alpha \in [45 : 135]$. $I_\alpha$ is the resulting image sheared by the angle $\alpha$.
- For each sheared image the corresponding vertical projection profile is obtained:
$$\mathrm{VPP}_\alpha(j) = \sum_{i=1}^{\mathrm{rows}} I_\alpha(i, j)$$
- The slant angle $\hat{\alpha}$ is the one that maximizes $F$:
$$\hat{\alpha} = \arg\max_{\alpha \in [45:135]} F(\mathrm{VPP}_\alpha)$$
where $F$ can be one of the following objective functions:
- Standard deviation:
$$F(\mathrm{VPP}_\alpha) = \sqrt{\frac{1}{\mathrm{cols}} \sum_{j=1}^{\mathrm{cols}} \left(\overline{\mathrm{VPP}_\alpha} - \mathrm{VPP}_\alpha(j)\right)^2}$$
- Profile length:
$$F(\mathrm{VPP}_\alpha) = \sum_{j=1}^{\mathrm{cols}-1} \sqrt{1 + \left(\mathrm{VPP}_\alpha(j) - \mathrm{VPP}_\alpha(j+1)\right)^2}$$
- IDIAP:
$$F(\mathrm{VPP}_\alpha) = \sum_{j:\, C_\alpha(j) = 1} \mathrm{VPP}_\alpha(j)^2, \qquad C_\alpha(j) = \frac{\mathrm{VPP}_\alpha(j)}{\max(i \mid I_\alpha(i,j) = 1) - \min(i \mid I_\alpha(i,j) = 1)}$$
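The search over shear angles can be sketched as below, using the standard deviation as the objective $F$. This is a toy version under explicit assumptions: the image is binary (1 = ink), the shear keeps the bottom row fixed and shifts row $i$ by $(\mathrm{rows} - 1 - i)\cot\alpha$ columns (the sign convention is a choice made here, with $\alpha = 90$ meaning no shear), and the angle grid has a 5-degree step. The function names are hypothetical.

```python
import math
from statistics import pstdev


def shear(image, alpha_deg):
    """Shear a binary image horizontally; alpha = 90 leaves it unchanged."""
    rows, cols = len(image), len(image[0])
    alpha = math.radians(alpha_deg)
    cot = math.cos(alpha) / math.sin(alpha)
    pad = rows - 1  # |cot| <= 1 on [45, 135], so no shift exceeds rows - 1
    out = [[0] * (cols + 2 * pad) for _ in range(rows)]
    for i, row in enumerate(image):
        shift = round((rows - 1 - i) * cot)
        for j, v in enumerate(row):
            if v:
                out[i][j + pad + shift] = 1
    return out


def vertical_projection(image):
    """VPP(j): number of ink pixels in column j."""
    return [sum(col) for col in zip(*image)]


def estimate_slant(image, angles=range(45, 136, 5)):
    """Angle whose sheared image has the VPP with the largest standard deviation."""
    return max(angles, key=lambda a: pstdev(vertical_projection(shear(image, a))))


# A single stroke leaning to the right: one pixel per row, moving one
# column to the right for every row going up.
rows, cols = 5, 6
stroke = [[1 if j == rows - 1 - i else 0 for j in range(cols)] for i in range(rows)]
best = estimate_slant(stroke)
deslanted = shear(stroke, best)
```

With this convention the stroke becomes vertical at 135 degrees, where all its pixels fall into one column and the profile is as peaked as possible.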

Slant Correction: Vertical Projection Profile (VPP) (2)
[Figure: panels for Angle = -40, Angle = -20, Angle = 0 and Angle = 20.]

Slant Correction: Vertical Projection Profile (VPP) (3)
In some cases the standard deviation of the vertical projection profile, $\mathrm{std}(\mathrm{VPP}_\alpha)$, presents a multimodal distribution as a function of $\alpha$.
[Figure: plot of $\mathrm{std}(\mathrm{VPP}_\alpha)$ against $\alpha$.]
In these cases, a slant angle $\bar{\alpha}$ is computed as the average of the angles $\alpha$ whose $\mathrm{std}(\mathrm{VPP}_\alpha)$ is at least $(1 - \rho)$ times the maximum value, weighted by $\mathrm{std}(\mathrm{VPP}_\alpha)$:
$$\bar{\alpha} = \frac{\sum_{\alpha:\, \mathrm{std}(\mathrm{VPP}_\alpha) \ge (1-\rho)\, \mathrm{std}(\mathrm{VPP}_{\hat{\alpha}})} \alpha \cdot \mathrm{std}(\mathrm{VPP}_\alpha)}{\sum_{\alpha:\, \mathrm{std}(\mathrm{VPP}_\alpha) \ge (1-\rho)\, \mathrm{std}(\mathrm{VPP}_{\hat{\alpha}})} \mathrm{std}(\mathrm{VPP}_\alpha)}, \qquad \rho \in [0, 1]$$
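The averaging step for the multimodal case is small enough to show directly. The sketch below assumes one particular reading of the selection rule, namely that an angle is kept when its score is at least $(1 - \rho)$ times the best score; the function name `smoothed_slant` and the dictionary input are hypothetical, and the scores are assumed to be positive.

```python
def smoothed_slant(scores, rho):
    """Weighted average of the angles whose score is close to the maximum.

    `scores` maps each candidate angle to std(VPP_angle); `rho` in [0, 1]
    controls how far below the maximum a score may fall and still count.
    """
    best = max(scores.values())
    kept = {a: s for a, s in scores.items() if s >= (1 - rho) * best}
    return sum(a * s for a, s in kept.items()) / sum(kept.values())


# Two equally good peaks at 90 and 100 degrees: the plain argmax would pick
# one of them arbitrarily, the weighted average settles in between.
scores = {80: 1.0, 90: 2.0, 100: 2.0, 110: 0.5}
angle = smoothed_slant(scores, rho=0.1)
```

With `rho = 0` only the angles that reach the maximum are averaged; with `rho = 1` every angle contributes in proportion to its score.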

Size Normalization
- Encompasses two different steps: text region detection and normalization.
- Within a text line image, three different areas are distinguished:
  - Ascender area
  - Main body area
  - Descender area
- Text line normalization approaches are based on:
  - Conventional techniques: U.-V. Marti and H. Bunke. Using a statistical language model to improve the performance of an HMM-based cursive handwriting recognition system. International Journal of Pattern Recognition and Artificial Intelligence, 15:65-90.
  - Machine learning techniques (MLP): España-Boquera, S.; Castro-Bleda, M.J.; Gorbe-Moya, J.; Zamora-Martinez, F., Improving Offline Handwritten Text Recognition with Hybrid HMM/ANN Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 4, pp. 767-779, April.

Size Normalization: Based on RLSA and Linear Regression
[Figure: (1) original image; (2) RLSA; (3) upper border; (4) lower border; (5) linear regression; (6) resizing.]

Feature Extraction (FE): Generalities
- Establishes a new representation space for the input signal (image).
- FE should:
  - be compact.
  - minimize the intra-class variability.
  - maximize the inter-class variability.
  - be adequate to the kind of recognizer/classifier employed.
  - be easy to obtain at low computational cost.
- FE quality is evaluated indirectly, by assessing the recognizer/classifier performance.
- Several formal methods for FE selection are described in the literature, although they are not widely used.
- One desirable FE property is that the original signal can be rebuilt (exactly or approximately) from the features.

Feature Extraction: Taxonomy (1)
Global or local features.
- Extracted from the statistical distribution of points:
  - Advant.: simple, low computational cost and dimensionality.
  - Disadv.: do not allow the original signal to be rebuilt.
  - E.g.: vertical/horizontal projection profile, mean, standard deviation.
- Extracted from global transformations or series expansions:
  - Advant.: allow the original signal to be rebuilt (totally or approximately).
  - Disadv.: generally involve a high computational cost.
  - E.g.: Fourier transforms, discrete wavelet transforms, 2D Gabor filters, geometric moments.

Feature Extraction: Taxonomy (2)
- Topological and geometrical feature extraction:
  - Advant.: fairly robust across different handwriting styles and distortions.
  - Disadv.: complex implementations and long computation times.
  - E.g.: contour chains, geometric property measures and representations, primitives to build topological structures and graphs to define relations among them.

Feature Extraction: HMMs
Feature extraction suitable for use with HMMs:
- Local feature extraction.
- Sequence of feature vectors of fixed dimensionality.
- Feature vectors extracted following the natural writing direction (from left to right).
- Based on geometric features.
- Allows the original (preprocessed) image to be rebuilt approximately.

Comparison of the PRHLT, FKI and RWTH FEs

                    PRHLT            FKI               RWTH-i6 (old)
Based on            Bazzi&Schwartz   FKI               RWTH
Type                Geometric        Geometric         Geometric
Prev. preproc.      yes              yes               no
Extracted from      Local window     Column of pixels  Column of pixels
Vector dimension    ...              ...               PCA(7 height_img)
Size-norm. image    no               yes (height_img)  yes (height_img)
#Vectors/image      width_img        width_img         width_img

References:
- Issam Bazzi, Richard Schwartz and John Makhoul. An Omnifont Open-Vocabulary OCR System for English and Arabic. IEEE Trans. Pattern Analysis and Machine Intelligence, 21(6), June 1999.
- Marti, U. and Bunke, H. Using a statistical language model to improve the performance of an HMM-based cursive handwriting recognition system. In Hidden Markov Models: Applications in Computer Vision, World Scientific Series in Machine Perception and Artificial Intelligence, vol. 45. World Scientific Publishing Co., River Edge, NJ.
- P. Dreuw, D. Rybach, C. Gollan and H. Ney. Writer Adaptive Training and Writing Variant Model Refinement for Offline Arabic Handwriting Recognition. In 10th Int. Conf. on Document Analysis and Recognition, July 2009.

PRHLT FE Method: Sample Window
[Figure: original image divided into an N x M grid, with a sample window.]

PRHLT FE Method: Grey Level Feature
[Figure: an n x m sampling window enclosing a stroke segment, with the sub-sampling points defined inside the window and the stroke segment smoothed by the Gaussian filter.]
The image intensity $\hat{I}(i, j)$, smoothed by a Gaussian filter, is:
$$\hat{I}(i, j) = I(i, j) \exp\left[-\frac{1}{2}\left(\frac{(j - n/2)^2}{(n/4)^2} + \frac{(i - m/2)^2}{(m/4)^2}\right)\right]$$

PRHLT FE Method: Derivative Features
[Figure: an n x m window with the average number of pixels per column (ANPC) and per row (ANPR) and their linear approximations.]
- Average number of pixels per column (ANPC):
$$g_j = \frac{\sum_{i=1}^{m} I(i, j)}{m}$$
- Linear approximation:
$$\mathrm{mse}(a, b) = \sum_{j=1}^{n} w_j \left(g_j - (a\,j + b)\right)^2$$
- Gaussian filter:
$$w_j = \exp\left(-\frac{1}{2}\,\frac{(j - n/2)^2}{(n/4)^2}\right)$$
- Restriction:
$$\frac{\partial\, \mathrm{mse}(a, b)}{\partial a} = 0 \quad \text{and} \quad \frac{\partial\, \mathrm{mse}(a, b)}{\partial b} = 0$$
- Derivative:
$$a = \frac{\left(\sum_{j=1}^{n} w_j g_j\right)\left(\sum_{j=1}^{n} w_j\, j\right) - \left(\sum_{j=1}^{n} w_j\right)\left(\sum_{j=1}^{n} w_j g_j\, j\right)}{\left(\sum_{j=1}^{n} w_j\, j\right)^2 - \left(\sum_{j=1}^{n} w_j\right)\left(\sum_{j=1}^{n} w_j\, j^2\right)}$$
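The derivative feature is the slope of a Gaussian-weighted least squares line through the per-column averages. The following sketch evaluates the closed-form expression for $a$ directly; the function names are hypothetical, the window is a plain list of rows, and at least two columns are assumed (with a single column the denominator is zero).

```python
import math


def anpc(window):
    """g_j: average number of ink pixels per column of an m x n window."""
    m = len(window)
    return [sum(col) / m for col in zip(*window)]


def gaussian_weights(n):
    """w_j for j = 1..n, centred on n/2 with standard deviation n/4."""
    return [math.exp(-0.5 * (j - n / 2) ** 2 / (n / 4) ** 2) for j in range(1, n + 1)]


def weighted_slope(g):
    """Slope a of the weighted least squares fit g_j ~ a*j + b, j = 1..n."""
    n = len(g)
    w = gaussian_weights(n)
    js = range(1, n + 1)
    sw = sum(w)
    swj = sum(wj * j for wj, j in zip(w, js))
    swg = sum(wj * gj for wj, gj in zip(w, g))
    swjj = sum(wj * j * j for wj, j in zip(w, js))
    swjg = sum(wj * j * gj for wj, j, gj in zip(w, js, g))
    return (swg * swj - sw * swjg) / (swj ** 2 - sw * swjj)


# Sanity check on exactly linear data: the fitted slope is the true slope.
slope = weighted_slope([2 * j + 1 for j in range(1, 9)])

window = [[0, 1, 1],
          [0, 0, 1],
          [0, 0, 1],
          [0, 1, 1]]
horizontal_derivative = weighted_slope(anpc(window))
```

The vertical derivative would be obtained the same way from the per-row averages (ANPR), i.e. by applying `weighted_slope` to the transposed window.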

PRHLT FE Method: Visual Representation (1)
[Figure: the feature vector sequence, shown as three bands: grey level, horizontal derivative and vertical derivative.]

PRHLT FE Method: Visual Representation (2)
[Figure: original image, normalized image, and the grey level, horizontal derivative and vertical derivative features, for K = 16 x 3 and K = 20 x 3.]

FKI FE Method
FE uses a one-column-wide sliding window, moving along the image from left to right one column at a time. Nine geometrical quantities are computed at each position:
- $c_1(j)$: number of black pixels in the window.
- $c_2(j)$: center of gravity of the window.
- $c_3(j)$: second-order moment of the window.
- $c_4(j)$: position of the upper contour in the window.
- $c_5(j)$: position of the lower contour in the window.
- $c_6(j)$: orientation of the upper contour in the window.
- $c_7(j)$: orientation of the lower contour in the window.
- $c_8(j)$: number of black-white transitions in the vertical direction.
- $c_9(j)$: number of black pixels between the upper and lower contours.

FKI FE Method: Mathematical Formulation
Expressed in mathematical terms (blk: black pixel, wht: white pixel):
$$c_1(j) = \sum_{i=1}^{m} I(i, j)$$
$$c_2(j) = \frac{1}{m} \sum_{i=1}^{m} i \cdot I(i, j)$$
$$c_3(j) = \frac{1}{m^2} \sum_{i=1}^{m} i^2 \cdot I(i, j)$$
$$c_4(j) = \min(i \mid I(i, j) = \mathrm{blk})$$
$$c_5(j) = \max(i \mid I(i, j) = \mathrm{blk})$$
$$c_6(j) = \frac{c_4(j+1) - c_4(j-1)}{2}$$
$$c_7(j) = \frac{c_5(j+1) - c_5(j-1)}{2}$$
$$c_8(j) = \mathrm{NT}_{\mathrm{blk\text{-}wht}}(I(i, j))$$
$$c_9(j) = \sum_{c_4(j) < i < c_5(j)} I(i, j)$$
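The nine quantities can be computed column by column as in the sketch below, which follows the formulas above literally. Several details are assumptions made for the example rather than facts from the slides: the image is binary with 1 = black, rows are numbered from 1 at the top, $c_8$ counts black-to-white changes when scanning a column from top to bottom, and the column and both of its neighbours contain at least one black pixel. The function names are hypothetical.

```python
def get_column(image, j):
    return [row[j] for row in image]


def contours(col):
    """1-based rows of the first and last black pixel; assumes the column has ink."""
    rows = [i for i, v in enumerate(col, start=1) if v]
    return rows[0], rows[-1]


def fki_features(image, j):
    """Features c1..c9 for the interior column j (0-based) of a binary image."""
    m = len(image)
    col = get_column(image, j)
    c1 = sum(col)
    c2 = sum(i * v for i, v in enumerate(col, start=1)) / m
    c3 = sum(i * i * v for i, v in enumerate(col, start=1)) / m ** 2
    c4, c5 = contours(col)
    upper_prev, lower_prev = contours(get_column(image, j - 1))
    upper_next, lower_next = contours(get_column(image, j + 1))
    c6 = (upper_next - upper_prev) / 2
    c7 = (lower_next - lower_prev) / 2
    # Assumed definition: count black -> white changes from top to bottom.
    c8 = sum(1 for a, b in zip(col, col[1:]) if a == 1 and b == 0)
    # Rows strictly between the contours: 1-based c4+1 .. c5-1.
    c9 = sum(col[c4:c5 - 1])
    return [c1, c2, c3, c4, c5, c6, c7, c8, c9]


image = [[0, 1, 0],
         [1, 0, 0],
         [1, 1, 1],
         [0, 1, 0]]
features = fki_features(image, 1)
```

A full extractor would apply `fki_features` to every column and would need a rule for empty columns and for the first and last column, where the neighbours used by $c_6$ and $c_7$ do not exist.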

FKI FE Method: Visual Example
[Figure: columns j-1, j and j+1 of a binary image with rows i = 1, ..., m = 12.]
Values recoverable for column j (some intermediate terms are lost):
- $c_1(j) = 5$
- $c_2(j) = 7.4$
- $c_3(j) = 7.44$
- $c_4(j) = 3$
- $c_5(j) = 11$
- $c_6(j) = \ldots$
- $c_7(j) = 0.5$
- $c_8(j) = 3$
- $c_9(j) = 3$

RWTH-i6 FE Method (Old)
Using a one-column-wide sliding window, moving along the image from left to right one column at a time:
1. A column of image pixels $X_j$ is taken at every column step: $j = 1, \ldots, W_{img}$.
2. Each column is augmented with its spatial derivative: $\Delta_j = X_j - X_{j-1}$. The features at each point $j$ are therefore $\mathrm{Features}(j) = (X_j\ \Delta_j)$.
3. To incorporate spatial context, the features of 7 consecutive columns in a sliding window are concatenated: $(X_{j-3}\ \Delta_{j-3})(X_{j-2}\ \Delta_{j-2}) \ldots (X_{j+3}\ \Delta_{j+3})$.
4. Finally, a PCA transformation matrix is applied to reduce the high feature dimensionality.
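Steps 1 to 3 of the old RWTH-i6 scheme can be sketched in a few lines (the PCA step is left out). The border handling is not specified in the slides, so two assumptions are made here and flagged in the code: the derivative of the first column is set to zero, and the context window repeats the first or last column when it runs past the image border. The function names are hypothetical.

```python
def column_features(image):
    """For every column j return X_j followed by delta_j = X_j - X_{j-1}."""
    cols = [list(c) for c in zip(*image)]
    feats = []
    for j, x in enumerate(cols):
        prev = cols[j - 1] if j > 0 else x  # assumption: zero derivative at j = 0
        delta = [a - b for a, b in zip(x, prev)]
        feats.append(x + delta)
    return feats


def stack_context(feats, half_width=3):
    """Concatenate the features of columns j-3 .. j+3 (7 columns in total)."""
    n = len(feats)
    stacked = []
    for j in range(n):
        window = []
        for k in range(j - half_width, j + half_width + 1):
            # assumption: repeat the border column outside the image
            window.extend(feats[min(max(k, 0), n - 1)])
        stacked.append(window)
    return stacked


image = [[0, 1, 1, 0],
         [1, 1, 0, 0]]
feats = column_features(image)
stacked = stack_context(feats)
```

For an image of height `h` every stacked vector has `7 * 2 * h` components, which is the high dimensionality that the final PCA step is meant to reduce.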

RWTH-i6 FE Method (New)
Employs a normalization scheme for handwritten text line image attributes based on image moments.
Reference: Kozielski, M.; Forster, J.; Ney, H., Moment-Based Image Normalization for Handwritten Text Recognition, International Conference on Frontiers in Handwriting Recognition (ICFHR), 2012, pp. 256-261, Sept.
- The algorithm operates directly on grey-scale images.
- The image gradient and zeroth-order moments are used to globally normalize the stroke thickness of a pattern.
- The image is segmented into slices using a sliding window, where the size and shift of the sliding window are estimated using moments.
- Local variability in size and position is modelled independently in separate slices using second-order moments.
- The final feature vector is subject to a PCA transformation and its number of components is reduced to 30.

Dimensionality Reduction for Real-Valued FE Vectors
Objective: reduce the recognition time without significant loss of accuracy.
- Applied to real-valued (feature) vectors.
- Filters out noise.
- Discards redundant features.
Among the most widely used dimensionality reduction methods are:
- PCA: Principal Component Analysis
- LDA: Linear Discriminant Analysis

PCA: Principal Component Analysis
- Aims at:
  - reducing the data dimensionality (of the sequence of feature vectors).
  - characterizing the main trends exhibited by the data set.
- Method: based on covariance analysis between components (or factors).
- Characteristic: it does not require any previous knowledge about the class label of each vector (unsupervised method).
- Formalization: an orthogonal linear transformation which projects the data onto a new coordinate system, such that the greatest variance by any projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on.

PCA: Example (1)
Two-class labeled points in 2D real space:
$$\mu_A = (25\ \ 25), \quad \sigma_A = \begin{pmatrix} 32 & 6 \\ 6 & 2 \end{pmatrix}, \qquad \mu_B = (\ldots), \quad \sigma_B = \begin{pmatrix} 32 & 6 \\ 6 & 2 \end{pmatrix}$$
[Figure: scatter plot of the two classes.]
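As a small worked instance of the formalization above, the first principal component of the covariance matrix used in the example can be found with power iteration. This is only a sketch: power iteration is one of several ways to obtain the leading eigenvector, the function names are hypothetical, and the all-ones start vector is assumed not to be orthogonal to the leading eigenvector.

```python
import math


def covariance(data):
    """Covariance matrix (dividing by n) of a list of equally sized vectors."""
    n, d = len(data), len(data[0])
    mean = [sum(x[k] for x in data) / n for k in range(d)]
    return [[sum((x[a] - mean[a]) * (x[b] - mean[b]) for x in data) / n
             for b in range(d)] for a in range(d)]


def first_principal_component(cov, iters=100):
    """Leading unit eigenvector of `cov` and the variance along it."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    return v, variance


def project(x, mean, v):
    """Coordinate of x along the direction v after centring on `mean`."""
    return sum((xi - mi) * vi for xi, mi, vi in zip(x, mean, v))


cov = [[32.0, 6.0], [6.0, 2.0]]
direction, variance = first_principal_component(cov)
```

For this matrix the two eigenvalues are $17 \pm \sqrt{261}$, so almost all of the variance lies along the first component, which points mostly along the first axis.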

PCA: Example (2)
[Figure: the same two-class 2D point set, with $\mu_A = (25\ \ 25)$ and $\sigma_A = \sigma_B = \begin{pmatrix} 32 & 6 \\ 6 & 2 \end{pmatrix}$.]

LDA: Linear Discriminant Analysis
- Objective: to project the data set onto a space of small dimension such that its class information is maximally preserved.
- Method: based on between-class and within-class covariance analysis of components (or factors).
- Characteristic: requires previous knowledge about the class label of each vector (supervised method).
- Formalization: like PCA, LDA applies a linear transformation (in general not an orthogonal one) which projects the input data onto a new coordinate system, such that the projected data have maximal between-class variance and minimal within-class variance.
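For two classes, the LDA projection reduces to a single direction, Fisher's discriminant $w \propto S_w^{-1}(\mu_A - \mu_B)$, where $S_w$ is the within-class scatter. The sketch below computes it for the 2D case with a hand-written 2 x 2 inverse. The value of $\mu_B$ is not recoverable from the slides, so the one used here is purely hypothetical, as are the function name and the choice of summing the two class covariances to form $S_w$.

```python
def fisher_direction(mean_a, mean_b, cov_a, cov_b):
    """Two-class Fisher discriminant direction in 2D: solve S_w w = mean_a - mean_b."""
    sw = [[cov_a[r][c] + cov_b[r][c] for c in range(2)] for r in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    diff = [mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]]
    # Closed-form inverse of a 2 x 2 matrix applied to diff.
    return [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
            (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det]


cov = [[32.0, 6.0], [6.0, 2.0]]
mu_a = (25.0, 25.0)
mu_b = (30.0, 30.0)  # hypothetical value, chosen only for this example
w = fisher_direction(mu_a, mu_b, cov, cov)
```

Unlike the first principal component of the same covariance, which lies close to the first axis, this direction is dominated by the second coordinate, where the within-class variance is small: the two methods can pick very different projections for the same data.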

LDA: Example of the PCA Case
[Figure: PCA projection of the two-class 2D point set, with $\mu_A = (25\ \ 25)$ and $\sigma_A = \sigma_B = \begin{pmatrix} 32 & 6 \\ 6 & 2 \end{pmatrix}$.]

LDA: Example of the LDA Case
[Figure: LDA projection of the same two-class 2D point set.]


More information

Mono-font Cursive Arabic Text Recognition Using Speech Recognition System

Mono-font Cursive Arabic Text Recognition Using Speech Recognition System Mono-font Cursive Arabic Text Recognition Using Speech Recognition System M.S. Khorsheed Computer & Electronics Research Institute, King AbdulAziz City for Science and Technology (KACST) PO Box 6086, Riyadh

More information

Handwritten Text Recognition

Handwritten Text Recognition Handwritten Text Recognition M.J. Castro-Bleda, Joan Pasto Universidad Politécnica de Valencia Spain Zaragoza, March 2012 Text recognition () TRABHCI Zaragoza, March 2012 1 / 1 The problem: Handwriting

More information

IJSER. Abstract : Image binarization is the process of separation of image pixel values as background and as a foreground. We

IJSER. Abstract : Image binarization is the process of separation of image pixel values as background and as a foreground. We International Journal of Scientific & Engineering Research, Volume 7, Issue 3, March-2016 1238 Adaptive Local Image Contrast in Image Binarization Prof.Sushilkumar N Holambe. PG Coordinator ME(CSE),College

More information

Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction

Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction Stefan Müller, Gerhard Rigoll, Andreas Kosmala and Denis Mazurenok Department of Computer Science, Faculty of

More information

INTELLIGENT transportation systems have a significant

INTELLIGENT transportation systems have a significant INTL JOURNAL OF ELECTRONICS AND TELECOMMUNICATIONS, 205, VOL. 6, NO. 4, PP. 35 356 Manuscript received October 4, 205; revised November, 205. DOI: 0.55/eletel-205-0046 Efficient Two-Step Approach for Automatic

More information
