Usage of continuous skeletal image representation for document images dewarping.

Anton Masalovitch, Leonid Mestetskiy
Moscow State University, Moscow, Russia
anton_m@abbyy.com, l.mest@ru.net

Abstract

In this paper the application of continuous skeletal image representation to document image de-warping is described. A novel technique is presented that approximates the deformation of interlinear spaces of an image based on elements of the image's skeleton that lie between the text lines. A method for approximating the deformation of the whole image as a combination of single interlinear space deformations is proposed, and its representation in the form of a 2-dimensional cubic Bezier patch is suggested. Experimental results for a batch of deformed document images are given that compare the recognition quality of images before and after the de-warping process. These results demonstrate the efficiency of the proposed algorithm.

1. Introduction

All modern OCR systems assume that text lines in a document are straight and horizontal, while in real images they are not. An image can be deformed before recognition in various ways. For example, if a thick book is scanned, text lines on the scan may be warped near the spine of the book. If a digital camera is used to retrieve the image instead of a scanner, the text lines may still be warped because of the low-quality optics of digital cameras. One important example of such deformation is the rounding of an image at its borders as a result of barrel distortion. Moreover, several types of deformation could be applied to the same image, making it impossible to build a precise model of image deformation. This is how the task of image de-warping appears.

The approach proposed in this paper is based on the construction of outer skeletons of text images. The main idea of the proposed algorithm is based on the fact that in outer skeletons it is easy to mark up long continuous branches that define the interlinear spaces of the document. We approximate such branches by cubic Bezier curves to find a specific deformation model of each interlinear space of the document.
On the basis of a set of such interlinear space approximations, the approximation of the whole document is built in the form of a 2-dimensional cubic Bezier patch. After all this work is completed, we can de-warp an image using the obtained approximation of image deformation.

This work is an extension of the article [1]. In this paper a new method of automatic search for interlinear branches of the skeleton is described. An iterative method for adjusting the image deformation approximation is also given.

To test our algorithm we compare recognition results for a batch of images before and after the de-warping process.

2. Existing solutions

An algorithm for automatic image de-warping is needed nowadays for automatic OCR systems. Plenty of algorithms for image deformation approximation have appeared in the last several years (see for example [7-11]). Unfortunately, most of these algorithms have disadvantages that make them unusable for commercial OCR systems.

Existing solutions can be divided into three approaches:

The first approach is to single out text lines by combining close black objects and then to approximate each line shape using some characteristic points of the line's black objects. For example, one can approximate the shape of text lines by using the middle points of the bounding rectangles of black objects. The main disadvantage of this approach is that it is hard to define characteristic points of black objects that give a stable approximation of line shape.

The second approach is to build a model of possible deformation of an image and then try to apply this model to a specific image. The main disadvantage of this method is that it is almost impossible to build a complete model of image deformation. And if such a model describes only one type of deformation, one should make sure that the model can be applied to the particular image being processed.

Finally, the third approach is to define some estimate of text line straightness and iteratively deform the image to achieve the maximum possible straightness of text lines. The main disadvantage of this method is that it uses numerical computing and is therefore time-consuming, while its results are often unpredictable.

In our work we try to avoid the disadvantages described above. Our goal is to create an image de-warping algorithm that does not depend on the quality of text symbols, is applicable to most possible optical image deformations, gives predictable results and is not time-consuming.

3. Characteristics of images under consideration

It is necessary to describe some characteristics of the images that our algorithm works with:

The initial image should be black and white, with black text and white background. It should not contain inverted areas. It also should not contain noise or textures. In all modern OCR systems efficient add-ons exist that bring almost every image to this model. The applied binarization and noise removal techniques may be very rough, because our algorithm does not depend on the quality of text symbols.

The initial image should contain one big text block. This is an important assumption, because the proposed algorithm works with interlinear spaces rather than with text lines, and therefore the initial image must contain a sufficient number of long text lines located one under another. All modern OCR systems can divide an initial image into a set of text blocks with high precision, even when the images are deformed.

Let us also assume that the deformation of text lines in an image can be approximated by continuous cubic curves and patches. This assumption is also not very restrictive, since most common deformations of images are created by cameras and scanners. Such deformations can be approximated even by quadratic patches and curves. As for more complicated cases, experiments have shown that cubic approximation is precise enough for them.
If additional experiments show that cubic approximation is not sufficient after all, the degree of the Bezier curves and patches can easily be increased without considerable modifications to the proposed algorithm.

One example of an image with which our algorithm works is shown in figure 1.

Figure 1. Processing image example

4. Problem definition

Let us assume that we have an image $I(x, y)$, where $I$ is the color of the image pixel with coordinates $(x, y)$. Let us also assume that this image contains a text block with deformed lines. We further assume that we can rearrange the pixels in this image without changing their colors to retrieve a document image where the initial lines become straight and horizontal. So, we want to develop a continuous vector function $D(x, y)$ to obtain a de-warped image in the form:

$$I'(x, y) = I(D_x(x, y), D_y(x, y)).$$

This function $D(x, y)$ will be an approximation of the whole image deformation.

To estimate the quality of our de-warping algorithm, we attempt to recognize the image before and after de-warping using one of the modern OCR systems. Recognition quality in modern OCR systems depends heavily on the straightness of text lines in the images under consideration. Therefore, an improvement in recognition quality after image de-warping is a good evaluation of the quality of our de-warping algorithm.

5. Continuous border representation of binary image

In this work the skeleton of polygonal figures is exploited. Before using such a skeleton with binary images we must define the representation of a discrete binary image as a set of continuous polygonal figures.

Let us assume that a scanned document is stored in the form of a binary image represented as a Boolean matrix (one bit per pixel). A discrete model of the binary image is the integer lattice $I$ in the Euclidean plane $R^2$ with 0 and 1 representing black and white elements. For elements of the lattice the relation of the 4-adjacent neighborhood is given. We designate $B \subseteq I$ as the set of black and $W \subseteq I$ as the set of white nodes of the lattice. The sets $(B, W)$ serve as a model of the discrete binary image.

In the same Euclidean plane $R^2$, we define the polygonal figure $P$ as the set of points formed by the union of a finite number of non-overlapping bounded closed domains. This figure is then a model of the continuous binary image. The problem consists in constructing a figure $P$ that adequately describes the properties of the discrete image $B$. In mathematical terms this problem is posed as an approximation of a discrete object with a continuous object. A good approximation should satisfy the following natural criteria:

1) $B \subseteq P$, $W \subseteq \overline{R^2 \setminus P}$, where the overline means closure of a set;
2) Let $x, y \in I$ be a pair of adjacent nodes of the lattice and $s_{xy}$ be the segment connecting these nodes. Then if $x, y \in B$, then $s_{xy} \subset P$, and if $x, y \in W$, then $s_{xy} \subset \overline{R^2 \setminus P}$.

The first condition means that the figure covers all black points of the discrete image and all white points lie either outside the figure or on its boundary. The second condition can be reduced to the condition that the boundary of $P$ lies in the interface between white and black boundary points of the discrete image.

Let $M$ be the set of all figures satisfying conditions 1 and 2. Any of them can be considered a continuous model of a binary image with acceptable accuracy. As we are going to build a skeleton of this figure, the most convenient representation for us is a figure with a piecewise linear boundary, since for such figures there are effective algorithms for construction of a skeleton. In this situation it is natural to choose from $M$ a polygonal figure (PF) with minimal perimeter (see fig. 2). First, such a PF exists and it is unique. Second, the number of its vertices is close to minimal among all PF satisfying conditions 1 and 2.

Figure 2. Representation of raster object with polygonal figure with minimal perimeter

The algorithm for solving this problem, which requires a single pass over a raster image, has been described in [4].

6. Continuous skeletal representation of an image

The choice of the polygonal figure as a continuous model of the binary image reduces the problem of constructing a skeleton of the image to the well-known medial axis transform [5]. Contrary to discrete images, for which the skeleton is determined ambiguously, the concept of a skeleton of a continuous figure has a strict mathematical formulation.

The skeleton of a figure is the locus of centers of maximal empty circles. An empty circle does not contain any boundary points of the figure. A maximal empty circle is a circle which is not contained in any other empty circle, and which is not congruent to another. Note that empty circles can thus be either internal or external for the domains comprising the figure. Accordingly, their centers form the internal and external skeletons of the figure (see fig. 3).

Figure 3. Empty circles for polygonal figure and skeleton of polygonal figure.

This definition applies to any type of shape, not just a polygon. However, there exist effective algorithms for constructing skeletons of polygonal figures [4,6]. The algorithm used [2,3] is based on a generalization of Delaunay triangulation for a system of sites of two types (points and segments) that comprise a PF boundary. It builds a skeleton in time $O(n \log n)$, where $n$ is the number of PF vertices.

The skeleton of a polygonal figure can be represented as a planar graph, where nodes are points on a plane and bones are straight lines that connect the nodes. In such a representation of a skeleton all nodes have no less than three nearest points on the border of the area, and all bones lie between two linear fragments of the area border. Later in this article we will use only the graph representation of a skeleton.

Let us also define a knot in a skeleton as a node with more than two connected bones, and a final node as a node with only one connected bone. And let us define a branch of a skeleton as a connected sequence of bones that has a final node or a knot node on each end and does not have knots in the middle of the branch. Later in this article we will operate only with branches of the skeleton and not with single bones.
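The definitions of knots, final nodes and branches can be illustrated with a small sketch. It assumes that the skeleton is already available as a list of bones, each a pair of node identifiers; this input format and the name `extract_branches` are hypothetical and are not taken from the skeleton construction algorithm of [2,3].

```python
from collections import defaultdict


def extract_branches(bones):
    """Group skeleton bones into branches.

    bones: iterable of (node_a, node_b) pairs (hypothetical input format).
    A node with one bone is a final node, a node with more than two bones
    is a knot; a branch is a chain of bones between two such nodes.
    Closed loops without any knot or final node are ignored in this sketch.
    """
    adjacency = defaultdict(list)
    for a, b in bones:
        adjacency[a].append(b)
        adjacency[b].append(a)

    def is_branch_end(node):
        # Final nodes (degree 1) and knots (degree > 2) terminate a branch.
        return len(adjacency[node]) != 2

    branches = []
    used_bones = set()
    for start in adjacency:
        if not is_branch_end(start):
            continue
        for neighbour in adjacency[start]:
            if frozenset((start, neighbour)) in used_bones:
                continue
            path = [start, neighbour]
            used_bones.add(frozenset((start, neighbour)))
            # Walk through degree-2 nodes until the next knot or final node.
            while not is_branch_end(path[-1]):
                previous, current = path[-2], path[-1]
                following = [n for n in adjacency[current] if n != previous][0]
                used_bones.add(frozenset((current, following)))
                path.append(following)
            branches.append(path)
    return branches
```

For a skeleton with bones `(1, 2), (2, 3), (3, 4), (3, 5)`, node 3 is a knot and nodes 1, 4 and 5 are final nodes, so the sketch yields three branches: `[1, 2, 3]`, `[3, 4]` and `[3, 5]`.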

7. Main idea of the algorithm

The main idea of the proposed algorithm is that in the outer skeleton of a text document image one can easily find branches that lie between adjacent text lines. One can then use these separating branches to approximate the deformation of interlinear spaces in an image.

The proposed algorithm consists of the following steps:

1) A continuous skeletal representation of the image is built.
2) The skeleton is filtered (useless bones are deleted).
3) Long near-horizontal branches of the skeleton are singled out.
4) The list of singled out branches is filtered to leave only branches that lie between different text lines.
5) A cubic Bezier approximation is built for each branch.
6) A Bezier patch is built based on the obtained curves.

8. Image and skeleton preprocessing

As was mentioned before, one of the steps of our algorithm is the preprocessing step, in which we try to delete from the skeleton all small garbage branches and branches that can obviously be identified as non-interlinear. Let us describe this step in more detail.

First of all, before building a skeleton, we flood all white horizontal strokes with length smaller than some predefined threshold. By doing so, we glue the symbols of the words in one line together, and thus erase from the image skeleton many intersymbol branches that are useless for our algorithm. We set the value of the flooding parameter to 0.1 inches, or 30 pixels for 300 dpi images (this value was determined empirically). That value is sufficient to glue most adjacent symbols without gluing adjacent curved lines. Then we build the outer skeleton of the expanded image.

The next step is to delete branches of the skeleton that divide different parts of the same object. Such branches describe the borders of one symbol and are not relevant for the whole text line. We also delete branches of the skeleton that separate objects in the image from the border of the image. Figure 4 shows an example of such image preprocessing.

Figure 4. Image skeleton after preprocessing

9. Skeleton bones clusterization

After the outer skeleton of a document image has been built, we can divide the branches of the skeleton into two groups: branches that lie between objects in one text line and branches that lie between adjacent text lines. The main idea of the proposed algorithm is that such clusterization can be performed automatically for any document image.

First we sort out all skeleton branches that are shorter than some predefined threshold. Such branches appear when several long branches are connected in one point. Such short branches serve only for connectivity purposes; their angle is unpredictable, so they are not used during the clusterization process (see fig. 5).

Figure 5. Short branch that connects several long branches.

As a threshold value for short branches we use the empirical value of 0.05 inches, or 15 pixels for 300 dpi images. It is about half of the height of small letters for a standard font size, so we do not treat any of the intersymbol branches as short.

To clusterize long branches we define the parameter $A_{max}$, the maximal absolute value of the angle of an interlinear branch (as the angle of a skeleton branch we use the angle of the linear approximation of that branch). Experiments show that for each image skeleton it is possible to define this parameter in such a way that all long branches with angle greater than $A_{max}$ will be only intersymbol branches.
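The classification of a single branch by length and angle can be sketched as follows. The sketch assumes that a branch is given as a list of (x, y) node coordinates in pixels and that the linear approximation is a total least-squares line (the principal axis of the points); both are assumptions, since the text does not specify how the linear approximation is computed. The function names and the value passed as `a_max_deg` are hypothetical.

```python
import math

SHORT_BRANCH_PX = 15  # 0.05 inch at 300 dpi, the empirical value quoted above


def branch_length(points):
    """Total length of the polyline through the branch nodes."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def branch_angle_deg(points):
    """Angle of the principal axis of the points, in degrees from -90 to 90."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    # Orientation of the direction of largest variance (total least squares).
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))


def classify_branch(points, a_max_deg):
    """Label a branch as 'short', 'intersymbol' or 'interlinear'."""
    if branch_length(points) < SHORT_BRANCH_PX:
        return "short"
    if abs(branch_angle_deg(points)) > a_max_deg:
        return "intersymbol"
    return "interlinear"
```

A total least-squares fit is used here instead of an ordinary regression of y on x because it stays well defined for near-vertical branches, which are exactly the intersymbol branches to be detected.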

This idea can be confirmed by a graphic representation: if we draw the linear approximations of all skeleton branches on one plane so that they all begin at one point, then for a document image the obtained figure will look like a cross, the horizontal part of which is created by interlinear branches, while the vertical part is created by intersymbol branches (see fig. 6).

Figure 6. Branches of skeleton from figure 4 marked on one plane.

To define the parameter $A_{max}$ we use a simple automatic clusterization mechanism. Each possible value $t$ of the angle divides all branches into two classes, with angle greater and less than the given threshold. For each class we define $\mu$ as the mean value of the angle in this class and $\sigma$ as the standard deviation of the angle from the mean value. Using these two values we can define the separation factor $J(t)$ of the two classes in the form:

$$J(t) = \frac{\sigma_R + \sigma_L}{\mu_R - \mu_L},$$

where the subscripts $R$ and $L$ denote the classes with angle greater and less than $t$. Then we iterate over the angles, looking for the one with the minimum separation factor, using one degree as the size of the step (see fig. 7).

Figure 7. Histogram of branch angles from figure 6 with detected threshold

After clusterization we delete all long vertical branches from the skeleton (see fig. 8).

Figure 8. Skeleton of document image after clusterization of branches.

10. Building interlinear branches

After all vertical branches are deleted, the remaining branches are processed in a cycle according to the following rules:

If two nodes are connected by two non-intersecting branches (such a problem appears when the text language includes diacritics and an additional branch goes between a diacritic and a symbol (see fig. 9)), we delete the more curved of these two branches.

Figure 9. Two skeleton branches around a diacritic mark.

If three branches are connected in one point (such a problem appears because some short branches remain after all vertical branches were deleted (see fig. 10)), we delete the shortest of these branches.

Figure 10. Remainders of vertical branches.

If two long horizontal branches are connected near the border of an image (such a problem appears when two interline branches merge together outside the borders of a text block (see fig. 11)), we separate the connection node of these branches into two independent nodes.
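The automatic choice of the angle threshold from the clusterization step can be sketched as below. The sketch assumes that the separation factor has the form $J(t) = (\sigma_R + \sigma_L) / (\mu_R - \mu_L)$, where $R$ and $L$ are the classes of absolute branch angles above and below the threshold $t$; this is one reading of the formula that is consistent with searching for a minimum, not a confirmed statement of the authors' definition. The function names are hypothetical.

```python
import statistics


def separation_factor(angles, t):
    """Assumed J(t) = (sigma_R + sigma_L) / (mu_R - mu_L).

    angles: absolute branch angles in degrees; t: candidate threshold.
    Returns None when one of the two classes is empty.
    """
    left = [a for a in angles if a < t]
    right = [a for a in angles if a >= t]
    if not left or not right:
        return None
    mu_l, mu_r = statistics.fmean(left), statistics.fmean(right)
    sigma_l, sigma_r = statistics.pstdev(left), statistics.pstdev(right)
    # mu_r > mu_l always holds here: every left angle is below t and
    # every right angle is at least t, so the denominator is positive.
    return (sigma_r + sigma_l) / (mu_r - mu_l)


def find_a_max(branch_angles_deg):
    """Scan thresholds in one-degree steps and keep the smallest J(t)."""
    angles = [abs(a) for a in branch_angles_deg]
    best_t, best_j = None, None
    for t in range(1, 90):
        j = separation_factor(angles, t)
        if j is not None and (best_j is None or j < best_j):
            best_t, best_j = t, j
    return best_t
```

When the two groups of angles are well separated, every threshold inside the gap gives the same value of $J(t)$; this sketch simply keeps the first one it encounters.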

Figure 11. Two branches connected at the end of a text line.

After all these rules have been applied, only long horizontal branches that lie between adjacent text lines remain in the skeleton. We approximate them with cubic Bezier curves using the method of least-squares approximation.

11. Approximation of image deformation

After we get an approximation of each interlinear space in the image, we must approximate the deformation of the whole image.

Define the control points of the interlinear curves as $I_{ki}$, where $k$ is the index of a curve and $i$ is the index of a control point on this curve. For each set of points $\{I_{ki}\}_{k=0}^{n}$ (control points from all interline curves with the same index $i$) we build an approximation with a vertical Bezier curve. Let us define the control points of the obtained curves as $P_{ij}$, where $i$ is the index of the initial control points and $j$ is the index of the new control points on the created curve (see fig. 12).

Figure 12. Definition of control points of Bezier patch

After we get the set of points $P_{ij}$, we can build the whole image deformation using a Bezier patch. In other words, our approximation may be described by the following formula:

$$D(x, y) = \sum_{i=0}^{3} \sum_{j=0}^{3} P_{ij} \, b_{i,3}(x) \, b_{j,3}(y),$$

where $b_{r,3}(t)$ is the cubic Bernstein polynomial.

12. Bezier patch adjustment

Unfortunately, when we approximate interline spaces we cannot clearly define where each text line begins. Because of this, the vertical borders of the patch might be curved quite arbitrarily. To avoid such an effect we use the following adjustment procedure:

For each interline curve $C(x) = \sum_{j=0}^{3} A_j \, b_{j,3}(x)$ we search for the nearest curve in the Bezier patch. Define the obtained curve, for the corresponding fixed $y$, as

$$C'(x) = D(x, y) = \sum_{i=0}^{3} \sum_{j=0}^{3} P_{ij} \, b_{i,3}(x) \, b_{j,3}(y).$$

Define $t_0$ and $t_1$ as the parameters of the points on $C'$ nearest to the begin and end points of curve $C$:

$$t_0 = \arg\min_t \lVert C'(t) - C(0) \rVert, \qquad t_1 = \arg\min_t \lVert C'(t) - C(1) \rVert.$$

Then we build a curve $C''$ that is identical to $C$ but differs in parameterization (has shifted parameters), so that $C''(t_0) = C(0)$ and $C''(t_1) = C(1)$. In other words,

$$C''(t) = C\left(\frac{t - t_0}{t_1 - t_0}\right) = \sum_{j=0}^{3} A_j \, b_{j,3}\left(\frac{t - t_0}{t_1 - t_0}\right) = \sum_{j=0}^{3} B_j \, b_{j,3}(t).$$

Then we calculate the mean deviation $d$ between curves $C'$ and $C''$. If this deviation is greater than some predefined threshold, the original curve $C$ must be excluded from patch creation; otherwise the original curve $C$ must be replaced with $C''$.

After the processing of all initial curves is completed, we build a new Bezier patch using the updated set of curves. We repeat this procedure until the deviations of all initial curves from the curves of the Bezier patch are within some predefined threshold.

This adjustment procedure makes it possible to approximate the vertical borders of the text block and improves the deformation approximation of the whole page because erroneously created curves are excluded (see fig. 13).
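The Bezier patch used for the deformation approximation can be evaluated with a few lines of code. The following is a minimal sketch, assuming the standard tensor-product form $D(x, y) = \sum_i \sum_j P_{ij} \, b_{i,3}(x) \, b_{j,3}(y)$ with parameters in $[0, 1]$ and a 4x4 grid of 2-D control points; the function names and the storage layout `P[i][j]` are hypothetical.

```python
from math import comb


def bernstein3(r, t):
    """Cubic Bernstein polynomial b_{r,3}(t) = C(3, r) * t^r * (1 - t)^(3 - r)."""
    return comb(3, r) * t ** r * (1 - t) ** (3 - r)


def bezier_patch(P, x, y):
    """Evaluate D(x, y) = sum_i sum_j P[i][j] * b_{i,3}(x) * b_{j,3}(y).

    P: 4x4 grid of control points, each a pair (px, py).
    x, y: patch parameters in [0, 1].
    Returns the point (D_x, D_y).
    """
    d_x = d_y = 0.0
    for i in range(4):
        for j in range(4):
            weight = bernstein3(i, x) * bernstein3(j, y)
            d_x += weight * P[i][j][0]
            d_y += weight * P[i][j][1]
    return d_x, d_y
```

With control points placed on a regular grid, for example `P[i][j] = (100 * i, 100 * j)`, the patch reproduces an undeformed page: the Bernstein basis has linear precision, so `bezier_patch(P, x, y)` returns `(300 * x, 300 * y)` up to rounding.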

Figure 13. Image deformation approximation before and after Bezier patch adjustment.

13. Experimental results

To test the efficiency of our algorithm we take a set of 31 images. All images from this set satisfy the conditions described in section 3: they are black-and-white images without noise, which contain one big text block with deformed lines. We recognize all these images with one modern OCR system before and after the de-warping process.

For deformed images there were 2721 recognition errors on all pages (4.92% of all symbols). For de-warped images there were 830 recognition errors on all pages (1.50% of all symbols). Therefore, after the de-warping process 1891 errors were corrected (69.5% of the original errors). In addition, 14 lines were not found on the initial images because of their high deformation, and after de-warping all text lines were detected correctly.

The attained results show high efficiency of the proposed algorithm, but its quality is not maximal yet. Recognition quality for straight images is higher than 99.5% in modern OCR systems, while for de-warped images we obtain a quality of only 98.5%. The main reason for this gap is that our algorithm deforms symbols a little during de-warping, and that in turn causes errors in symbol recognition.

Figures 14-16 give an example of image de-warping for one of the images from our test set.

Figure 14. Initial deformed image

Figure 15. Image deformation approximated with Bezier patch.

Figure 16. De-warped image.

Our algorithm was also tested during the Document Image De-warping Contest that was held at CBDAR 2007 [12]. In the contest, de-warping algorithms were applied to a test base of 100 images (the test set is available for download at http://www.upr.org/downloads/data). Experiments showed that the mean edit distance for images de-warped by our algorithm was less than 1% on the contest data set. These results are statistically the same as those of the other two participants of the contest. On a quarter of the test images our algorithm showed the lowest edit distance.

14. Future works

The main direction of future work is to develop a better de-warping algorithm based on the obtained image deformation approximation. The de-warping algorithm that we use is very naive, which leads to some additional recognition mistakes on the de-warped images.
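The text does not describe the naive de-warping step itself. One minimal way to apply a deformation function, following the relation $I'(x, y) = I(D_x(x, y), D_y(x, y))$ from the problem definition, is nearest-neighbour resampling. The sketch below is an illustration under that assumption and is not the authors' implementation; the function name, the image layout (`image[y][x]`, with 0 for black and 1 for white) and the `deformation` callback are hypothetical.

```python
def dewarp(image, deformation, background=1):
    """Build I'(x, y) = I(D_x(x, y), D_y(x, y)) by nearest-neighbour sampling.

    image: list of rows of pixel values, indexed as image[y][x].
    deformation: callable (x, y) -> (source_x, source_y) in pixel coordinates.
    background: value used when the source position falls outside the image.
    """
    height, width = len(image), len(image[0])
    result = [[background] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            source_x, source_y = deformation(x, y)
            source_x, source_y = int(round(source_x)), int(round(source_y))
            if 0 <= source_x < width and 0 <= source_y < height:
                result[y][x] = image[source_y][source_x]
    return result
```

For example, a slanted stroke in a 3x3 image becomes horizontal when each column is shifted vertically by the amount the stroke deviates from the middle row:

```python
slanted = [[1, 1, 0],
           [1, 0, 1],
           [0, 1, 1]]
straight = dewarp(slanted, lambda x, y: (x, y + (1 - x)))
```

Nearest-neighbour sampling copies whole pixels without interpolation, which is one plausible source of the small symbol distortions mentioned above.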

A more accurate approximation of the deformation of the vertical borders of text blocks is also one of our priority tasks.

15. Conclusion

This article describes a novel technique for approximating the deformation of a text document image based on a continuous skeletal representation of the image. In our work we try to avoid the main disadvantages of existing de-warping solutions: use of separate symbol characteristics, use of a specific deformation model, and use of unpredictable numerical methods.

The main advantage of the proposed algorithm is that it does not rely on the quality of the initial text. Initial characters can be broken, flooded or erroneously binarized; the proposed algorithm does not depend on this.

This paper describes all main steps of the proposed algorithm: construction of a skeletal representation of an image, preprocessing of the image's skeleton, detection of interlinear branches of the skeleton, approximation of such branches, and final approximation of image deformation. Based on the proposed algorithm, a prototype of a fully automatic image de-warping system was built. Experimental results that demonstrate the efficiency of the proposed algorithm and its importance for recognition of deformed images are given.

Acknowledgments

This research was supported by a grant from the Russian Foundation for Basic Research (grant 05-01-00542).

Bibliography

[1] A.A. Masalovitch, L.M. Mestetskiy, Document Image Deformation Approximated by the Means of Continuous Skeletal Representation of the Image, Proceedings of international conference PRIP (Pattern Recognition and Information Processing), 2007, pp. 279-284.
[2] L.M. Mestetskiy, Skeletonization of polygonal figures based on the generalized Delaunay triangulation, Programming and Computer Software, 25(3), 1999, pp. 131-142.
[3] L.M. Mestetskiy, Skeleton of multiply connected polygonal figure, Proceedings of international conference Graphicon, 2005.
[4] S. Fortune, A sweepline algorithm for Voronoi diagrams, Algorithmica, 2, 1987, pp. 153-174.
[5] D.T. Lee, Medial axes transform of planar shape, IEEE Trans. Patt. Anal. Mach. Intell., PAMI-4, 1982, pp. 363-369.
[6] C.K. Yap, An O(n log n) algorithm for the Voronoi diagram of the set of simple curve segments, Discrete Comput. Geom., 2, 1987, pp. 365-393.
[7] Hironori Ezaki, Seiichi Uchida, Akira Asano, Hiroaki Sakoe, Dewarping of document images by global optimization, Proceedings of international conference ICDAR, 2005, pp. 302-306.
[8] Adrian Ulges, Christoph H. Lampert, Thomas M. Breuel, A Fast and Stable Approach for Restoration of Warped Document Images, Proceedings of international conference ICDAR, 2005, pp. 384-388.
[9] Li Zhang, Chew Lim Tan, Warped Image Restoration with Application to Digital Libraries, Proceedings of international conference ICDAR, 2005, pp. 192-196.
[10] A. Yamashita, A. Kawarago, T. Kaneko, K.T. Miura, Shape reconstruction and image restoration for non-flat surfaces of documents with a stereo vision system, Proceedings of international conference ICPR, 2004, pp. 482-485.
[11] M.S. Brown, W.B. Seales, Image Restoration of Arbitrarily Warped Documents, IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 26, Issue 10, 2004, pp. 1295-1306.
[12] Faisal Shafait, Thomas M. Breuel, Document Image Dewarping Contest, Proceedings of 2nd Int. Workshop on Camera-Based Document Analysis and Recognition, Curitiba, Brazil, Sep. 2007.