Word Matching of handwritten scripts

Seminar about ancient document analysis

Outline: Introduction, Contour extraction, Contour matching, Other methods, Conclusion, Questions

Problem
- Text recognition in handwritten historical documents: traditional OCR-based handwriting recognizers do not perform well.
- Problems: noise, faded ink, weak strokes.

Idea
- Instead of matching single characters, the whole word is matched.
- Word patterns are stored in a database; a training set (or interactive teaching) is needed.
- Recognized words are indexed by a matching process.
- The matching is done on contour features: the amount of concavity and convexity at different scale levels.

Contour extraction
Preprocessing steps of the matching process:
- Binarization of the image
- Position estimation
- Labeling of components
- Creating the word contour

Preprocessing: Binarization
- Local threshold using Sauvola's method: $T = \mu \cdot \left(1 - k \cdot \left(1 - \frac{\sigma}{R}\right)\right)$
- Here μ is the mean and σ is the standard deviation of a local area centered on the pixel ($k = 0.02$ and $R = 128$).
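The thresholding rule above can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' implementation: the function names and the square window with a `radius` parameter are assumptions (the slide does not give the window size), and the image is assumed to be a list of rows of gray values in 0..255 with dark ink on a light background.

```python
import math

def sauvola_threshold(window, k=0.02, R=128.0):
    """Sauvola threshold T = mean * (1 - k * (1 - std / R)) for one local window."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / n)
    return mean * (1.0 - k * (1.0 - std / R))

def binarize(image, radius=1, k=0.02, R=128.0):
    """Return a 0/1 image where 1 marks ink (pixel darker than its local threshold).

    The (2 * radius + 1)-wide square window is an assumption made for this sketch;
    it is clipped at the image border.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [image[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = 1 if image[y][x] < sauvola_threshold(window, k, R) else 0
    return out
```

Because the threshold follows the local mean and contrast, a faint stroke on a dark, stained background can still fall below its own local threshold, which a single global threshold would miss.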

Preprocessing: Binarization
- Another possibility: morphological operators.
- Dynamic thresholding is done on an opened image, whereas the parameters are estimated on the closed image.
- Weak strokes and connections between letters are detected and dilated.

Preprocessing: Position estimation
- Position estimation is important for connecting disconnected parts later.
- The part of interest is the main body of the lowercase letters.

Preprocessing: Connecting
Connecting the components is done in 2 steps. Labeling:
- Components are labeled using 8-neighborhood connectivity.
- Components with area $A_c < \beta \cdot x_{height}^2$ ($\beta = 0.1$) are ignored.
- This removes punctuation and diacritic signs, so the method is not usable for languages with many diacritic signs.
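A minimal sketch of the labeling step, using a breadth-first flood fill over the 8-neighborhood. The function names are hypothetical, and the filter rule is an assumption: the sketch discards components whose pixel area is below β · x_height², which is one reading of the slide's size criterion.

```python
from collections import deque

def label_components(binary):
    """Group 8-connected foreground (1) pixels; returns a list of components,
    each a list of (y, x) pixel coordinates."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                # Visit all 8 neighbours (the centre pixel is already marked as seen).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components

def drop_small(components, x_height, beta=0.1):
    """ASSUMED rule: keep components whose area is at least beta * x_height**2."""
    min_area = beta * x_height ** 2
    return [c for c in components if len(c) >= min_area]
```

With 8-connectivity, diagonally touching pixels belong to the same component, which keeps thin slanted strokes in one piece.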

Preprocessing: Connecting
Connecting components (done by adding synthetic lines, since most disconnections occur between letters):
- Sort the components by the horizontal position of their center of gravity.
- Generate possible links between the closest pixels of two components.
- Choose the best valid link.
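The sorting and link-generation steps can be sketched as follows. The function names are hypothetical, components are assumed to be lists of (y, x) pixels, and the sketch assumes that candidate links are only generated between horizontally adjacent components; the validity test and the choice of the best link are not covered here.

```python
def centroid_x(component):
    """Horizontal position of the component's center of gravity."""
    return sum(x for _, x in component) / len(component)

def closest_link(a, b):
    """Pair of pixels (p in a, q in b) with the smallest squared Euclidean distance."""
    return min(((p, q) for p in a for q in b),
               key=lambda pq: (pq[0][0] - pq[1][0]) ** 2 + (pq[0][1] - pq[1][1]) ** 2)

def candidate_links(components):
    """Sort components left to right and link each one to its right neighbour.

    ASSUMPTION: only neighbouring components (in sorted order) are linked.
    """
    ordered = sorted(components, key=centroid_x)
    return [closest_link(ordered[i], ordered[i + 1]) for i in range(len(ordered) - 1)]
```

The brute-force search over all pixel pairs is quadratic in component size; restricting it to boundary pixels would give the same result faster, since the closest pair always lies on the component boundaries.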

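Preprocessing: Connecting
Valid links are ($y$ is the vertical position of a link end; $y_u$ and $y_l$ are the upper and lower bounds of the main body, with $y$ increasing downward):
- Both ends of the link are strictly inside the main body of the lowercase letters. They must meet the following condition: $y_l \geq y \geq y_u + \alpha_c \cdot x_{height}$, where $\alpha_c$ is set empirically to 0.2.
- Both ends of the link are considerably above the main body of the lowercase letters. They must meet the following condition: $y < 2 y_u - y_l$.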

Contour Matching
The following techniques are proposed:
- MCC (multiscale convexity concavity)
- MCC-DCT (MCC using the discrete cosine transform)
- Chain code
- Feature-based methods

Contour Matching: MCC
- MCC stores the amount of convexity/concavity at different scale levels for each contour point.
- It is represented as a matrix $M(\sigma, u)$, where σ is the scale and u is the contour point.
- Concave and convex parts are distinguished by the sign.

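Contour Matching: MCC
- The evaluation process generates the matrix M.
- It runs on the curve C with N contour points, parameterized by arc length u: $C(u) = (x(u), y(u))$, where $u \in [0, N]$.
- The contour C is convolved with a Gaussian kernel $\Phi_\sigma$ of width $\sigma \in \{1, 2, \ldots, \sigma_{max}\}$:

$$x_\sigma(u) = \int x(t)\, \Phi_\sigma(u - t)\, dt$$
$$y_\sigma(u) = \int y(t)\, \Phi_\sigma(u - t)\, dt$$
$$\Phi_\sigma(u) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{u^2}{2\sigma^2}}$$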
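A discrete version of this smoothing step can be sketched as a circular convolution over the contour points. The function names are hypothetical. The sketch also assumes a kernel truncated at three standard deviations and renormalised so that its samples sum to 1, which replaces the continuous factor 1/(σ√(2π)). It only produces the smoothed contours at one scale and does not compute the convexity/concavity values themselves.

```python
import math

def gaussian_kernel(sigma):
    """Sampled Gaussian, truncated at 3 * sigma and normalised to sum to 1 (assumption)."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(t * t) / (2.0 * sigma * sigma))
               for t in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_closed_contour(points, sigma):
    """Convolve x(u) and y(u) with a Gaussian of width sigma.

    The contour is closed, so indices wrap around (circular convolution).
    The kernel is symmetric, so the sign of the offset (u - t) does not matter.
    """
    n = len(points)
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    smoothed = []
    for u in range(n):
        sx = sy = 0.0
        for k, w in enumerate(kernel):
            x, y = points[(u + k - radius) % n]
            sx += w * x
            sy += w * y
        smoothed.append((sx, sy))
    return smoothed

def multiscale_contours(points, sigma_max):
    """Smoothed versions of the contour for sigma = 1, 2, ..., sigma_max."""
    return [smooth_closed_contour(points, s) for s in range(1, sigma_max + 1)]
```

As σ grows, the contour loses fine detail and shrinks toward its center of gravity, so how far a point moves between scales reflects how convex or concave the contour is around it.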

Contour Matching: MCC

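Matching Process
- A distance table of size N x N is generated.
- Each cell (j, i) stores the distance between point i of the first contour and point j of the second contour.
- $f_\sigma(u)$ is the matrix for one contour (size $\sigma_{max} \times N$).
- $P_\sigma$ is a weight that controls the relative proportions between distances computed at different scales.
- The distance $d(u_A, u_B)$ sums, over $\sigma = 1, \ldots, \sigma_{max}$, the differences between $f^A_\sigma(u_A)$ and $f^B_\sigma(u_B)$, weighted by $P_\sigma$ and normalized by the ranges $r^A_\sigma$ and $r^B_\sigma$, where $r_\sigma = \max_u \{f_\sigma(u)\} - \min_u \{f_\sigma(u)\}$.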
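A hedged sketch of how such a distance table could be filled. The function names are hypothetical, and two details are assumptions made for illustration only: the per-scale difference is taken as an absolute difference, and it is divided by the sum of the two contours' ranges at that scale. The slide's exact normalization may differ.

```python
def dynamic_range(row):
    """r = max_u f(u) - min_u f(u) for one scale level."""
    return max(row) - min(row)

def distance_table(f_a, f_b, weights):
    """Build table[j][i] = distance between point i of contour A and point j of contour B.

    f_a, f_b: one row per scale level, each row holding one value per contour point.
    weights:  one weight P_sigma per scale level.
    ASSUMPTION: differences are absolute and normalised by r_A + r_B per scale.
    """
    n_a, n_b = len(f_a[0]), len(f_b[0])
    # Fall back to 1.0 when both ranges are zero, to avoid dividing by zero.
    norms = [(dynamic_range(ra) + dynamic_range(rb)) or 1.0
             for ra, rb in zip(f_a, f_b)]
    table = [[0.0] * n_a for _ in range(n_b)]
    for j in range(n_b):
        for i in range(n_a):
            table[j][i] = sum(p * abs(ra[i] - rb[j]) / r
                              for p, ra, rb, r in zip(weights, f_a, f_b, norms))
    return table
```

Normalizing by the ranges keeps scale levels with large raw values from dominating the sum, so the weights alone control how much each scale contributes.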

Matching Process

Other approaches
Contour matching with MCC-DCT:
- MCC-DCT uses the 1D cosine transform.
- It is closely related to the MCC method.
- The adaptation of the weights $P_\sigma$ is much simpler.
Other holistic approaches:
- Feature-based (scalar and profile features)

Conclusions
Advantages:
- The proposed method can be used for other problems with multishaped objects.
- It is easier to match handwritten words than single characters.
Disadvantages:
- A large database of words is needed.
- It is not an optimal approach for languages that use many diacritic signs.

Questions? Thank you for your interest.