Editorial Manager(tm) for International Journal of Pattern Recognition and Artificial Intelligence
Manuscript Draft

Manuscript Number:
Title: TEXT LOCALIZATION IN COMPLEX COLOR DOCUMENTS
Article Type: Research Paper
Section/Category: Prof. X Jiang
Keywords: Color Reduction; Document Processing; Text Localization; Text Information Extraction; Page Layout Analysis.
Corresponding Author: Professor Nikos Papamarkos
Corresponding Author's Institution:
First Author: Nikos Nikolaou
Order of Authors: Nikos Nikolaou; Efthimios Badekas; Nikos Papamarkos; Charalambos Strouthopoulos

Suggested Reviewers:
Basilis Gatos, Associate Professor, Institute of Informatics and Telecommunications of the National Center for Scientific Research "Demokritos", Athens, Greece, bgat@t.demokrtos.gr
Apostolos Antonacopoulos, Senior Lecturer, School of Computing, Science and Engineering, University of Salford, UK, A.Antonacopoulos@prmaresearch.org
Michael Zervakis, Professor, Department of Electronics and Computer Engineering, Technical University of Crete, Greece, mchals@systems.tuc.gr

TEXT LOCALIZATION IN COMPLEX COLOR DOCUMENTS

N. Nikolaou *, E. Badekas *, N. Papamarkos * and C. Strouthopoulos **

* Image Processing and Multimedia Laboratory
Department of Electrical & Computer Engineering
Democritus University of Thrace, 67100 Xanthi, Greece
papamark@ee.duth.gr
http://pml.ee.duth.gr/~papamark/

** Department of Informatics and Communications
Technological Educational Institution of Serres
62123 Serres, Greece, strch@teser.gr

Mailing Address:
Professor Nikos Papamarkos
Democritus University of Thrace
Department of Electrical & Computer Engineering
Image Processing and Multimedia Laboratory
67100 Xanthi, Greece
Telephone: +30-25410-79585
FAX: +30-25410-79569
Email: papamark@ee.duth.gr

ABSTRACT

In this paper, a new method for text localization in complex color document images is presented. First, the colors of the document image are reduced to a small number using a suitable color reduction technique. Each color defines a color plane in which the connected components are extracted. In each color plane, a connected component labeling and filtering procedure is applied, followed by a local grouping procedure. At the end of this stage, groups of connected components are constructed, which are next refined by obtaining the Direction of Connection (DOC) property for each connected component. Using the DOC property, the connected component groups are classified as text or non-text regions. Finally, the text regions identified in all color planes are superimposed and the final text localization result is obtained. The proposed technique was extensively tested with various types of complex color documents.

Keywords: Color Reduction, Document Processing, Text Localization, Text Information Extraction, Page Layout Analysis

1. INTRODUCTION

Interest in exploiting text information in images and video has grown notably during the past years. The ability of text to provide a powerful description of the image content, the ease of distinguishing it from other image features and the importance of the information it carries justify this research interest. Content-based image retrieval, OCR, page segmentation, license plate location, address block location and compression are some example applications based on text information extraction from various types of images.

An important procedure in text information extraction systems is the localization of text regions. Text localization methods are mainly categorized into texture based techniques [1][2][3][4][5][6] and connected components (CCs) based techniques [7][8][9][10][11][12]. Some hybrid approaches have also been reported in the literature [13][14][15]. Texture based methods use the observation that text in images has distinct textural properties that distinguish it from the background. They are based on the use of Gabor filters, wavelets, FFT, spatial variance, edge information etc. The main drawbacks of texture based techniques are that they are time consuming and that they impose character size restrictions. Their main advantage over the connected components based techniques is the capability of detecting text in low resolution images, and for this reason they are mainly used in video text based applications [16][17][18][19][20][21]. On the other hand, CCs based techniques are fast and relatively simple to implement, and they exploit the fact that characters are segmented. Our approach belongs to this specific category of text information extraction techniques.

Most approaches for text localization refer to gray [22] or binary [23] document images. Only recently, some techniques have been proposed for text localization and extraction in color documents. Strouthopoulos et al. [7] proposed a method for text extraction in complex color documents.
It is based on a combination of a color reduction technique and a page layout analysis approach which uses a neural network block classifier in order to identify text blocks. Chen and Chen [8] proposed a method for text block localization on color technical journal cover images. Initially, the colors of the image are reduced using a YIQ color space based algorithm. With the Sobel operator and through a binarization process, "strong" edges are isolated. Primary blocks are then detected with the Run Length Smearing Algorithm and finally classified with the use of nine features that rely on fuzzy rules. Sobottka et al. [9] proposed an approach to extract text from colored book and journal covers. The image is quantized with an unsupervised clustering method and the text regions are then identified by combining a top-down and a bottom-up technique. An algorithm for character string extraction from color documents is presented by Hase et al. [10]. First, the number of representative colors of a document is determined. Potential character strings are then extracted from each color plane using a multi-stage relaxation approach. When all extracted elements are superimposed, a strategy which utilizes the likelihood of a character string and a conflict resolution is followed to produce the final result.

Zhong et al. [13] presented a hybrid system for text localization in complex color images. According to this system, a color segmentation stage is performed by identifying local maxima in the color histogram. Heuristic filters are applied on the CCs of the same color plane and non-character components are removed. A second approach for text line localization based on local spatial variance is also proposed. In the work of Jung [14], a hybrid approach for text extraction in complex color images is also presented. The proposed system uses a multi-layer perceptron to utilize the textural properties of the color images and initially locate the coarse areas of text. In the next step, a CCs based method removes the false alarms produced by the first step and, according to the image type (document or video), the text regions are identified. A similar hybrid approach is proposed in [15], where text is located with the use of Gabor filtering on the gray scale version of the initial color image. This result is combined with the CCs based text extraction in order to produce the final result.

In the recent work of Liu et al. [24], text extraction is based on the statistical modeling of neighboring characters, whose distribution is represented by a Gaussian Mixture Model. The drawback of this technique is that the image is first converted to binary. Cho [25] presents a specialized technique where text is detected in light distorted color images by eliminating the reflectance component. Although the authors state that their method can deal with images with complex backgrounds, the images in the examples have simple backgrounds. Also, in the techniques presented in [26] and [27], text is extracted from color images obtained from the web. This is a special category of low resolution images where characters are usually small, and therefore these techniques are not suitable for extracting text from color documents. A detailed review of various text information extraction techniques applied on video and document images is presented by Jung et al. [28].

The rest of this paper is organized as follows. Section 2 presents an overview of our method. Section 3 describes the color reduction technique and the color planes creation in detail. In Section 4, the connected component analysis and filtering procedure is presented. Sections 5 and 6 describe the process that forms the groups of the connected components and their classification into text and non-text classes, respectively. Finally, we provide the evaluation of the method and the experimental results in Section 7.

2. OVERVIEW OF THE PROPOSED TECHNIQUE

As already mentioned, in most cases a text localization technique for complex color documents includes a color reduction stage. The performance of this stage is crucial for the effectiveness of the entire text localization technique. The goal of this paper is to propose a new technique for text localization which overcomes the difficulties associated with mixed type color documents such as complex color cover pages. Specifically, in this type of color documents, text and graphics are highly mixed with the background and, moreover, in many cases the background cannot be clearly defined. Three examples of complex color documents are depicted in Fig. 1.

Figure 1. Examples of complex color documents (book covers and magazine covers).

The proposed technique efficiently integrates a color reduction procedure and a color plane analysis technique. That is, in order to handle varying colors of the text in the image, a combination strategy is implemented among the binary images (we call them color planes) obtained by the color reduction procedure. Specifically, after color reduction, each color defines a color plane in which the connected components (CCs) are extracted and a connected component (CC) filtering procedure is applied, followed by a local grouping procedure. At the end of this stage, groups of CCs are constructed, which are next refined by obtaining the Direction of Connection (DOC) property for each CC. Next, the groups of CCs are classified as text or non-text regions based on their DOC property. Finally, the text regions identified in all color planes are superimposed and the final text localization result is obtained.

The proposed text localization technique consists of the following main stages:

Stage 1: Color reduction and color planes creation
Stage 2: Connected components labeling and filtering
Stage 3: Initial component grouping
Stage 4: Final component grouping
Stage 5: Classification of groups
Stage 6: Color planes superimposition and final color text localization

Stages 3-6 are applied independently on each color plane produced in stage 1. The grouping of CCs into homogeneous sets (stages 3 and 4) is carried out in two phases. First, we create connections between CCs (component pairing) by searching for similar objects in an adaptively defined area. This leads to a general formation of groups. Based on features resulting from the connections, we assign to each CC a special property named Direction Of Connection (DOC), which indicates whether a CC is likely to belong to a horizontal or a vertical structure. This information is used to remove false connections between objects and to perform the final forming of element groups. The classification module labels the groups as text or non-text and, finally, the last stage superimposes the detected text regions from all color planes to give the result. The flowchart of the proposed text localization technique is presented in Fig. 2.

Figure 2. Flowchart of the proposed technique.

The proposed technique performs satisfactorily on the majority of mixed type color documents. However, it is preferable for the document to satisfy the following conditions, which are met by the majority of modern book covers and color documents in general:

- The color of the characters should not be a gradient.
- Text orientation is allowed to be horizontal and/or vertical with about 15 degrees of angle tolerance.
- The resolution of the document images must be at least 100 dpi.

The proposed technique is implemented in a visual environment and it has been extensively and successfully tested with a large number of color documents.

3. COLOR REDUCTION AND COLOR PLANES CREATION

The purpose of the color reduction stage is to create a simplified version of the initial color document image from which character elements can be extracted as connected components, in other words, to perform color document segmentation. This is a crucial stage because the text localization result depends to a large extent on the color segmentation result. If it fails to produce homogeneous text components, the task of identifying text areas becomes extremely difficult.

For every color of the color reduced image, a binary image is created which is called a color plane. These binary images are processed independently through the text localization process and the results from each one are superimposed in order to form the final text localization result. An example of color planes creation is given in Fig. 3.

Figure 3. Color planes creation demonstration. (a) Original color document, (b) color reduced image (3 colors), (c) color plane 1, (d) color plane 2, (e) color plane 3.

The color reduction technique adopted here is the work presented in [29][30]. It has the ability to segment the color document without causing oversegmentation of the characters or fusion with the background. Additionally, it merges low contrast non-text objects with their background and creates large compact areas. This results in a smaller number of connected components and in an improved document image quality, so that the text localization procedure performs better. The resulting number of colors is in most cases not greater than 10.
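As an illustration of the color plane definition above, the following minimal Python sketch splits an already color-reduced image into one binary plane per color. The function name `color_planes` and the representation of the image as nested lists of (R, G, B) tuples are assumptions made for this example; they are not part of the original implementation.

```python
def color_planes(reduced):
    """Return {color: binary plane} for a color-reduced image.

    `reduced` is a list of rows; each row is a list of (R, G, B) tuples.
    A plane holds 1 where the pixel has that color and 0 elsewhere.
    """
    planes = {}
    for y, row in enumerate(reduced):
        for x, color in enumerate(row):
            if color not in planes:
                # First time this color is seen: allocate an empty plane.
                planes[color] = [[0] * len(row) for _ in reduced]
            planes[color][y][x] = 1
    return planes
```

Each pixel belongs to exactly one plane, so superimposing all planes recovers the full image support.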

A brief description of the main stages of the color reduction technique used in the present work is given next.

Stage 1: Preprocessing - Edge preserving smoothing
Initially, the color document image is preprocessed by applying an Edge Preserving Smoothing Filter (EPSF) in order to remove the noise and also to be able to deal with textured backgrounds. The quality of the document image is significantly improved and this results in fewer classification errors in the extracted image.

Stage 2: Color edge detection & RGB color space sub-sampling
In the second step, the RGB color distribution of the image is approximated with the samples of the image which correspond to the local minima of the edge map. This ensures that the samples are not edge points, which guarantees that they are located in the interior region of the objects. Therefore, fuzzy points on the transition areas between objects are avoided. Additionally, the colors of all objects are represented in the obtained sampling set, regardless of their size.

Stage 3: Initial color reduction
In the next step, the method, based on the samples obtained from the previous step, reduces the colors to a relatively large number (usually no more than 100). The resulting image at this stage is oversegmented. That is, the objects consist of at least one connected component.

Stage 4: Mean-shift procedure
In the last stage, a mean-shift procedure [32][33][34] is applied on the color centers which resulted from the previous step and the final color centers in the RGB color space are extracted. The final number of colors is small and the final document image has solid characters and uniform local backgrounds.

Fig. 4 presents a color reduction example of a complex color document with textured background obtained by the technique discussed here.
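To make Stage 4 more concrete, here is a minimal Python sketch of a generic mean-shift iteration applied to a small set of RGB color centers. It is only an illustration of the general mean-shift idea: the flat (uniform) kernel, the `bandwidth` parameter and the rule that merges modes closer than half the bandwidth are assumptions of this example, since the kernel and parameters used in [29] are not given here.

```python
import math

def mean_shift_centers(centers, bandwidth, max_iter=100, tol=1e-3):
    """Flat-kernel mean shift on color centers (illustrative sketch)."""
    pts = [tuple(float(v) for v in c) for c in centers]
    modes = list(pts)
    for _ in range(max_iter):
        moved = 0.0
        new_modes = []
        for m in modes:
            # Window: all original centers within `bandwidth` of the mode.
            window = [p for p in pts if math.dist(m, p) <= bandwidth]
            mean = tuple(sum(axis) / len(window) for axis in zip(*window))
            moved = max(moved, math.dist(m, mean))
            new_modes.append(mean)
        modes = new_modes
        if moved < tol:
            break
    # Merge modes that converged to (almost) the same location.
    merged = []
    for m in modes:
        if all(math.dist(m, f) >= bandwidth / 2 for f in merged):
            merged.append(m)
    return merged
```

With a suitable bandwidth, nearby centers of an oversegmented image collapse to a single final color, which matches the small final number of colors described above.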

(a) 61433 colors (b) 4 colors
Figure 4. Color reduction example obtained with the method described in [29]. (a) Original color document, (b) color reduction result.

4. CONNECTED COMPONENTS LABELING AND FILTERING

In each color plane, connected components (CCs) are identified and labeled. The enclosing rectangle of a connected component (CC) is defined as its bounding box. Let $CC_i$ be a connected component. Every $CC_i$ is characterized by the following set of features:

- $BB(CC_i) = \{W_i, H_i\}$. The bounding box of $CC_i$; $W_i$ represents the width and $H_i$ the height.
- $Xl_i, Yl_i$. The x and y coordinates of the top left point of $BB(CC_i)$.
- $Xr_i, Yr_i$. The x and y coordinates of the bottom right point of $BB(CC_i)$.
- $Xc_i, Yc_i$. The x and y coordinates of the central point of $BB(CC_i)$.
- $psize(CC_i)$. The number of pixels which $CC_i$ consists of.
- $bsize(CC_i) = W_i \cdot H_i$. The size of $BB(CC_i)$.
- $dens(CC_i) = psize(CC_i) / bsize(CC_i)$. The density (or saturation) of $CC_i$.
- $elong(CC_i) = \min\{W_i, H_i\} / \max\{W_i, H_i\}$. The elongation of $CC_i$.
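The feature set listed above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the use of 8-connectivity, the breadth-first labeling and the dictionary keys are assumptions of this example, and the bounding box width and height are counted in pixels (so a single pixel has W = H = 1).

```python
from collections import deque

def features(xs, ys):
    """Feature dictionary of one CC given the coordinates of its pixels."""
    xl, yl, xr, yr = min(xs), min(ys), max(xs), max(ys)
    W, H = xr - xl + 1, yr - yl + 1
    psize, bsize = len(xs), W * H
    return {"Xl": xl, "Yl": yl, "Xr": xr, "Yr": yr,
            "Xc": (xl + xr) / 2, "Yc": (yl + yr) / 2,
            "W": W, "H": H, "psize": psize, "bsize": bsize,
            "dens": psize / bsize, "elong": min(W, H) / max(W, H)}

def connected_components(plane):
    """Label 8-connected regions of a binary plane (list of 0/1 rows)."""
    h, w = len(plane), len(plane[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y0 in range(h):
        for x0 in range(w):
            if not plane[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue, xs, ys = deque([(x0, y0)]), [], []
            while queue:
                x, y = queue.popleft()
                xs.append(x)
                ys.append(y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and plane[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            comps.append(features(xs, ys))
    return comps
```

Components are returned in raster-scan order of their first pixel, one feature dictionary per CC.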

4.1. Heuristic filtering

According to the previously defined features, $CC_i$ is considered a non-text object if:

- $psize(CC_i) < T_{psize}$, with $T_{psize} = 8$ pixels.
- $dens(CC_i) < T_{dens}$, with $T_{dens} = 0.08$. $CC_i$ must cover no less than 8% of $BB(CC_i)$.
- $elong(CC_i) < T_{elong}$, with $T_{elong} = 0.08$. This means that the width $W_i$ of a CC cannot be more than 12.5 times larger than $H_i$ (and vice versa).

These thresholds have been carefully selected after several tests in order not to reject character elements. This filtering procedure can be considered a preprocessing step and aims only at removing very noisy components that result from the color reduction procedure. In addition, it speeds up the text localization procedure since the number of CCs decreases significantly.

4.2. Inclusion based filtering

The objective of this type of filtering is the identification and removal of local backgrounds and graphic illustrations, which are very common in color document images. In order to achieve this, we propose a procedure based on the inclusion feature. For a connected component $CC_i$, the inclusion feature $Inc(CC_i)$ is defined as the number of connected components of the same color plane whose bounding boxes are fully included in the bounding box of $CC_i$ ($BB(CC_i)$). For example, a connected component $CC_j$ is included in $CC_i$ if

$$(Xl_j \geq Xl_i) \wedge (Yl_j \geq Yl_i) \wedge (Xr_j \leq Xr_i) \wedge (Yr_j \leq Yr_i) \tag{1}$$

The above condition is graphically depicted in Fig. 5, where $BB(CC_j)$ is fully included in $BB(CC_i)$.
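The two filters of this section can be sketched in Python as below. Several points are assumptions of this example rather than statements of the original method: a component is rejected when any one of the three heuristic tests fails, the inclusion count is taken over the components that survive the heuristic filter, a component is removed when its inclusion count exceeds a threshold `t_inc` whose value must be supplied by the caller, and `make_cc` is a hypothetical helper that builds a feature dictionary from a bounding box and a pixel count.

```python
T_PSIZE, T_DENS, T_ELONG = 8, 0.08, 0.08  # thresholds quoted in Section 4.1

def make_cc(xl, yl, xr, yr, psize):
    """Hypothetical helper: feature dictionary from a box and a pixel count."""
    W, H = xr - xl + 1, yr - yl + 1
    return {"Xl": xl, "Yl": yl, "Xr": xr, "Yr": yr, "W": W, "H": H,
            "psize": psize, "dens": psize / (W * H),
            "elong": min(W, H) / max(W, H)}

def is_noise(cc):
    """Heuristic filter; combining the tests with 'or' is an assumption."""
    return (cc["psize"] < T_PSIZE or cc["dens"] < T_DENS
            or cc["elong"] < T_ELONG)

def includes(outer, inner):
    """Bounding box of `inner` fully inside that of `outer`, as in eq. (1)."""
    return (inner["Xl"] >= outer["Xl"] and inner["Yl"] >= outer["Yl"]
            and inner["Xr"] <= outer["Xr"] and inner["Yr"] <= outer["Yr"])

def inclusion_count(cc, components):
    """Inc(CC): number of other components whose boxes lie inside cc's box."""
    return sum(1 for other in components
               if other is not cc and includes(cc, other))

def filter_components(components, t_inc):
    """Apply the heuristic filter, then the inclusion based filter."""
    kept = [cc for cc in components if not is_noise(cc)]
    return [cc for cc in kept if inclusion_count(cc, kept) <= t_inc]
```

In this sketch a large local background that encloses several character holes is dropped, while the enclosed components themselves are kept.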

Figure 5. Inclusion demonstration.

Local backgrounds and graphic illustrations tend to have many components fully included in their bounding boxes, especially in cases where characters are included in them. This happens mainly due to the character holes, as for example in the characters "a", "b", "d" etc. In contrast, it is not likely for character elements to include other components of the same color in their bounding boxes. Accordingly, we state that if

$$Inc(CC_i) > T_{inc} \tag{2}$$

then $CC_i$ is considered a non-text component and it is removed from the document. In the example of Fig. 6, a section of a document is depicted where the inclusion feature can be visualized. The bounding box of the background (black pixels) fully includes 7 bounding boxes of other connected components (character holes) of the same color.

Figure 6. Local background identification example. (a) Color reduced document with 2 colors, (b) bounding boxes of the black connected components.

5. COMPONENT GROUPS CREATION

In this section we present the methodology on which the technique is based in order to organize the filtered connected components into homogeneous groups. It consists of two steps, the initial (draft) grouping and the final grouping step, where a special property (Direction of Connection - DOC) is assigned to every connected component. This property is used in the classification stage, where the formed groups are characterized as text or non-text component groups.

5.1. Initial grouping of CCs

Let $CC_i$ be a connected component. We define a dynamic region $R(CC_i)$ as the set of all pixels $(x, y)$ satisfying the following condition (Fig. 7(a)):

$$d_{\min} \leq d \leq d_{\max} \tag{3}$$

where $d$ is the Euclidean distance of pixel $(x, y)$ from the central point $(Xc_i, Yc_i)$ of $BB(CC_i)$. Objects whose bounding box central point is located inside $R(CC_i)$ are labeled and considered as candidate objects for creating a connection (link) with $CC_i$. $d_{\min}$ is a small constant value (usually 5 pixels) and the purpose of its use is to avoid the creation of connections between very small CCs. $d_{\max}$ is defined by the following equation:

$$d_{\max} = c_d \cdot \max\{W_i, H_i\} \tag{4}$$

The size of $R(CC_i)$ is dynamically adapted to the size of $CC_i$, since it depends on the maximum of the width and the height of the bounding box $BB(CC_i)$. Additionally, the coefficient $c_d$ also adjusts the size of $R(CC_i)$, and if it is taken equal to 4, the resulting $R(CC_i)$ contains a large amount of information regarding the connected components adjacent to $CC_i$. This information is used by the algorithm in order to specify whether $CC_i$ belongs to a horizontal or to a vertical structure block.

Connected components whose central point is located inside $R(CC_i)$ are linked with $CC_i$ if a certain condition is satisfied. This condition is related to a distance measure defined in the work of Simon et al. [31]; it is adopted in our work but it is used in a different way. Let $CC_i$ and $CC_j$ be two connected components. The Horizontal Block Distance ($HBD(i, j)$) and the Vertical Block Distance ($VBD(i, j)$) between $CC_i$ and $CC_j$ are defined as:

$$HBD(i, j) = \max\{Xl_i, Xl_j\} - \min\{Xr_i, Xr_j\} \tag{5}$$

$$VBD(i, j) = \max\{Yl_i, Yl_j\} - \min\{Yr_i, Yr_j\} \tag{6}$$

Equations (5) and (6) are graphically depicted in Fig. 8.

Figure 7. Connected components connections. (a) Definition of $R(CC_i)$, (b) $R(CC_i)$ and example connections between CCs shown in a binary document.

Figure 8. Definition of $HBD(i, j)$ and $VBD(i, j)$.

When $HBD(i, j) < 0$, $CC_i$ and $CC_j$ overlap in the vertical direction, and when $VBD(i, j) < 0$, they overlap in the horizontal direction (as in Fig. 8). In order to create a connection between $CC_i$ and $CC_j$, a certain amount of overlapping between the two components must exist in one of the two directions, horizontal or vertical. The following relations represent the condition that must be satisfied for the link to be established:

$$-VBD(i, j) \geq T_r \cdot \max\{H_i, H_j\}, \quad \text{if } \max\{-HBD(i, j), -VBD(i, j)\} = -VBD(i, j) \tag{7}$$

$$-HBD(i, j) \geq T_r \cdot \max\{W_i, W_j\}, \quad \text{if } \max\{-HBD(i, j), -VBD(i, j)\} = -HBD(i, j) \tag{8}$$

where $T_r \in [0, 1]$. In words, the method considers the larger of the horizontal and the vertical overlapping amounts. If this amount covers at least a certain percentage (controlled by $T_r$) of the height or the width of the component, the connection between the two components is created. Connections are established per direction, that is, the connection conditions are checked for $CC_i$ towards $CC_j$ and, separately, for $CC_j$ towards $CC_i$. This can be seen clearly in Fig. 7(b), where character "y" connects with character "s" but "s" does not connect with "y".

Figure 9. The initial grouping procedure. (a) Connections between connected components ($T_r = 0.4$), (b) a component with 11 connections, (c) histogram of the number of connections of (a).

The result of the initial grouping process is the creation of a connection set associated with each CC:

$$C(CC_i) = \{c^{1}, \ldots, c^{cn_i}\} \tag{9}$$

where $cn_i$ is the number of connections for $CC_i$. $C(CC_i) = \emptyset$ means that no matching component for $CC_i$ was found in $R(CC_i)$. Any CC having this specific property is considered an isolated non-text component and it is excluded from the following stages of the method. Thereby, further filtering of non-text objects is achieved. An example of the initial grouping procedure is shown in Fig. 9. In most cases, characters are assigned at least four connections, as shown in the histogram of Fig. 9(c). This helps in gathering more information about adjacent CCs than taking into account only the closest neighbors.

5.2. Final component grouping

Based on the results of the initial grouping stage, the method continues with the characterization of the CCs with a property named Direction Of Connection (DOC). The purpose of this strategy is to refine the groups of components that the previous procedure created and, additionally, to supply the classification module with the information on which it will be based to finally extract the text blocks. The refined groups will be homogeneous sets, that is, text and non-text component groups will be spatially discriminated. To define the DOC property, the following two metrics are introduced:

$$H_o(CC_i) = -\sum_{j=1}^{cn_i} VBD(i, j) \tag{10}$$

$$V_o(CC_i) = -\sum_{j=1}^{cn_i} HBD(i, j) \tag{11}$$

$H_o(CC_i)$ and $V_o(CC_i)$ measure the total amount of overlapping of the connected component $CC_i$ with the components it is connected with, in the horizontal and the vertical direction, respectively. The DOC property is defined as:

$$DOC(CC_i) = \begin{cases} 1, & \text{if } (H_o \geq T_o \cdot V_o) \wedge (H_o \geq T \cdot H_i) \\ 2, & \text{if } (V_o \geq T_o \cdot H_o) \wedge (V_o \geq T \cdot W_i) \\ 0, & \text{otherwise} \end{cases} \tag{12}$$
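The linking and DOC rules of this section can be sketched in Python as follows. This is a hedged illustration: the overlap amount in a direction is taken as the negated block distance (so a gap counts as a negative overlap), connections are computed per direction from $CC_i$ towards its candidates, and `make_cc`, the dictionary keys and all function names are hypothetical. The thresholds `t_o` and `t` of the DOC rule are passed explicitly by the caller.

```python
import math

def make_cc(xl, yl, xr, yr):
    """Hypothetical helper: a CC described only by its bounding box."""
    return {"Xl": xl, "Yl": yl, "Xr": xr, "Yr": yr,
            "Xc": (xl + xr) / 2, "Yc": (yl + yr) / 2,
            "W": xr - xl + 1, "H": yr - yl + 1}

def hbd(a, b):
    """Horizontal Block Distance, equation (5)."""
    return max(a["Xl"], b["Xl"]) - min(a["Xr"], b["Xr"])

def vbd(a, b):
    """Vertical Block Distance, equation (6)."""
    return max(a["Yl"], b["Yl"]) - min(a["Yr"], b["Yr"])

def in_region(a, b, c_d, d_min):
    """Is the central point of b inside R(a)? Equations (3) and (4)."""
    d = math.hypot(a["Xc"] - b["Xc"], a["Yc"] - b["Yc"])
    return d_min <= d <= c_d * max(a["W"], a["H"])

def overlap_ok(a, b, t_r):
    """Equations (7) and (8): test the direction with the larger overlap."""
    y_overlap, x_overlap = -vbd(a, b), -hbd(a, b)  # negative means a gap
    if y_overlap >= x_overlap:
        return y_overlap >= t_r * max(a["H"], b["H"])
    return x_overlap >= t_r * max(a["W"], b["W"])

def connections(i, comps, t_r=0.4, c_d=4, d_min=5):
    """Connection set of comps[i], computed in the direction i -> j."""
    a = comps[i]
    return [b for k, b in enumerate(comps)
            if k != i and in_region(a, b, c_d, d_min) and overlap_ok(a, b, t_r)]

def doc(cc, linked, t_o, t):
    """Direction Of Connection, equations (10) to (12)."""
    h_o = -sum(vbd(cc, other) for other in linked)
    v_o = -sum(hbd(cc, other) for other in linked)
    if h_o >= t_o * v_o and h_o >= t * cc["H"]:
        return 1
    if v_o >= t_o * h_o and v_o >= t * cc["W"]:
        return 2
    return 0
```

For three boxes placed side by side on one line, the middle box links to both neighbors and receives DOC 1; the same boxes stacked in a column give DOC 2, and a component without connections gets DOC 0.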

$DOC(CC_i) = 1$ indicates that $CC_i$ belongs to a horizontal structure and $DOC(CC_i) = 2$ that it belongs to a vertical structure. Threshold $T_o$ controls the minimum difference between the horizontal and the vertical overlapping amounts that a CC must have in order to be characterized. When a component belongs to a horizontal structure of components, the horizontal overlapping amount is expected to be much larger than the vertical overlapping amount. For example, in Fig. 9(b), the "S" character has 11 connections, 9 of which are located in the horizontal direction of the document. The same assumption applies to the vertical structure case. Also, the overlapping amount must be at least $T$ times the height $H_i$ of the component for the horizontal case and $T$ times the width $W_i$ for the vertical case. For text with a skew of about 15 degrees (in either the horizontal or the vertical direction), a significant amount of overlapping between CCs remains and thus these blocks can also be characterized.

Figure 10. Component groups creation example. (a) Initial grouping, (b) final grouping.

Depending on the results of the DOC property assignment procedure, the method removes all invalid connections of CCs, aiming at the formation of the final blocks. Specifically, in the case where $DOC(CC_i) = 1$ and $VBD(i, j) > 0$, $CC_i$ may not preserve a connection with $CC_j$ because no horizontal overlapping exists between them. In other words, $CC_i$ belongs to a horizontal structure but no horizontal overlapping exists with $CC_j$, a fact which is a contradiction. The same rule applies when $DOC(CC_i) = 2$ and $HBD(i, j) > 0$. Following this procedure, textual component groups are spatially discriminated from non-textual groups and the final classification stage can be applied. Also, text is now formed in the sense of text lines. Fig. 10 demonstrates the initial and the final component grouping procedure in a binary document where horizontal and vertical text coexist. As can be observed, components are grouped in the sense of text lines and, additionally, text with different orientation is discriminated.

6. CLASSIFICATION OF GROUPS

In this final stage, a classification procedure is applied which classifies the formed connected component groups into two classes, the textual and the non-textual class. Due to the fact that text components are very likely to be assigned DOC values 1 or 2, a statistical metric is used to reflect this. Let $B_j$ be a structure block containing $N$ CCs, and let

$$BH_j = \{CC_i \in B_j : DOC(CC_i) \neq 0,\ i = 1, \ldots, k\} \tag{13}$$

be the subset containing all the $CC_i$ of the structure block $B_j$ that have $DOC(CC_i) \neq 0$. The entire structure block is considered to be a horizontal text block if

$$k \geq T_p \cdot N \tag{14}$$

where $T_p \in [0.5, 1]$. Text components are very likely to be assigned DOC values 1 or 2, because they overlap in the two specified directions with other collinear components. This can be easily visualized in Fig. 10.

Fig. 11 shows a detailed example of a simple text localization procedure performed by the proposed technique. The color document of Fig. 11(a) is first processed by the color reduction algorithm and the resulting image is shown in Fig. 11(b). It has 3 colors which define 3 color planes. Two of them, color plane 1 and color plane 2, contain text. Figs. 11(c)-11(d) depict these two color planes. The initial grouping procedure for color planes 1 and 2 is shown in Figs. 11(e)-11(f), respectively. The result of the final component grouping procedure for the two color planes is depicted in Figs. 11(g)-11(h). The final text localization result is extracted after the classification of the component groups and the superimposition of all color planes.
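The connection pruning rule of Section 5.2 and the group classification rule of equation (14) can be sketched together in Python. The function names are hypothetical, the strict comparison with zero in the pruning rule is an assumption of this example, and the default `t_p = 0.75` is the value quoted in the figure captions of the experiments.

```python
def keep_link(doc_i, hbd_ij, vbd_ij):
    """Decide whether CC_i keeps its connection with CC_j (Section 5.2)."""
    if doc_i == 1 and vbd_ij > 0:   # horizontal structure, no horizontal overlap
        return False
    if doc_i == 2 and hbd_ij > 0:   # vertical structure, no vertical overlap
        return False
    return True

def is_text_block(doc_values, t_p=0.75):
    """Equation (14): k >= T_p * N, with k the number of CCs with DOC != 0."""
    n = len(doc_values)
    if n == 0:
        return False
    k = sum(1 for d in doc_values if d != 0)
    return k >= t_p * n
```

For example, a group whose members have DOC values 1, 1, 1 and 0 gives k = 3 and N = 4, so it is accepted with `t_p = 0.75`, while a group with values 1, 0, 0, 0 is rejected.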

The final result is shown in Fig. 11(i), where bounding boxes indicate the extracted locations of the text areas.

Figure 11. Text localization example. (a) Original color document, (b) color reduced image (3 colors), (c) color plane 1, (d) color plane 2, (e), (f) initial grouping, (g), (h) false connections removal, (i) final text localization after superimposition of all color planes.

7. EXPERIMENTAL RESULTS

Experiment 1

The first experimental result is presented in Fig. 12, where the proposed technique is applied to a complex color document image of a Greek book cover. As can be observed, the document image consists of characters with different colors which are situated in a very complex background. The initial color document and the document after the color reduction process (5 colors) are shown in Fig. 12(a) and Fig. 12(b), respectively. The initial and final grouping of the CCs of the color plane which contains inhomogeneous text are shown in Fig. 12(c) and Fig. 12(d). The final result (Fig. 12(e)) shows the successful extraction of text regions in the form of text line blocks.

Figure 12. Text localization example. (a) Original image, (b) image after color reduction (5 colors), (c) initial grouping, (d) final grouping, (e) text localization results after classification of groups and superimposition of all color planes ($T_p = 0.75$).

Figure 13. Text localization example. (a) Original image, (b) image after color reduction (4 colors), (c) initial grouping, (d) final grouping, (e) text localization result after classification of groups and superimposition of all color planes ($T_p = 0.75$).

Experiment 2

In Fig. 13 we present an experimental result of text localization on a color document where vertical and horizontal text coexist. Fig. 13(b) shows the result of the application of the color reduction procedure, which leads to an image with only 4 colors. Fig. 13(c) depicts the result of the initial grouping stage described in Section 5.1. The final grouping stage, described in Section 5.2, is shown in Fig. 13(d), where the groups of the initial grouping procedure are refined in order to form the final groups. In the next stage, these groups are classified with the use of the information obtained from the DOC of each CC. Fig. 13(e) shows the final extracted text areas, which are constructed by combining the text regions obtained in all color planes.

Experiment 3 - Comparison with a commercial OCR

Another type of evaluation is performed in this experiment by comparing the results of our technique against the results of a commercial OCR software [35]. The comparison was based on three complex color documents that include vertical and horizontal text and complex non-uniform backgrounds. The original documents are shown in Figs. 14(a), (d) and (g). As can be seen in Figs. 14(b), (e) and (h), the OCR software fails to handle this type of complex documents. On the other hand, the proposed technique leads to the text localization results shown in Figs. 14(c), (f) and (i). On these documents the proposed technique gives superior results, which shows that it can be used with this type of documents.

Figure 14. Comparison of text localization results between our method and a commercial OCR software. (a), (d), (g) Original color documents, (b), (e), (h) text blocks obtained from the commercial OCR software, (c), (f), (i) text blocks obtained by the proposed technique.

Experiment 4

In this experiment we evaluate the proposed text localization technique using a set of 100 color documents. This set includes English and Greek cover pages from books and magazines whose resolution lies between 150 and 300 dpi. The well known text Block Precision Rate ($BPR$) and text Block Recall Rate ($BRR$) were adopted as metrics. These metrics are described by the relations

$$BPR = \frac{N_c}{N_e}, \qquad BRR = \frac{N_c}{N_a} \tag{15}$$

where $N_c$ is the number of text blocks that were correctly extracted, $N_e$ the total number of extracted blocks, and $N_a$ the number of actual (Ground Truth) text blocks. The decision whether a block has been detected correctly or not is made by visual inspection of the results. We have measured a mean value of $BPR = 81.55\%$ and $BRR = 93.48\%$. The parameters used for this experiment are $c_d = 4$, $T_r = 0.4$ and $T_p = 0.75$, together with the values 2.5 and 1.5 for the two thresholds of the DOC rule in equation (12).

Experiment 5 - Various text localization results

This last experiment is focused on six characteristic applications of the proposed technique. We use complex color documents having characters of different sizes and colors and non-uniform color backgrounds. As shown in Fig. 15, in all cases the proposed technique leads to satisfactory text localization results.
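For completeness, the block-level rates of equation (15) used in Experiment 4 reduce to two ratios, as in the short Python sketch below. The function name and the example counts are hypothetical and are not taken from the experiments reported here.

```python
def block_rates(n_correct, n_extracted, n_actual):
    """Return (BPR, BRR) as defined in equation (15)."""
    bpr = n_correct / n_extracted   # precision: correct / extracted blocks
    brr = n_correct / n_actual      # recall: correct / ground truth blocks
    return bpr, brr
```

With hypothetical counts of 8 correctly extracted blocks out of 10 extracted and 16 actual blocks, the function returns a precision of 0.8 and a recall of 0.5.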

Figure 15. (a)-(f) Various examples of text localization results extracted by the proposed method.

8. CONCLUSIONS

We have presented a new technique for text localization, suitable for application to complex color documents. In this type of document, text and graphics are highly mixed with the background, and therefore text localization is not an easy task. The proposed technique efficiently combines a color quantization procedure and a color plane text localization technique. The proposed technique is robust and has the following characteristics:

• It splits the color document image into a number of binary images, called color planes, corresponding to the dominant colors obtained.
• In every color plane, CCs are spatially formed into groups with the use of local information in an adaptively defined area.
• The information used to classify each CC involves not only its closest neighbors but a large number of similar CCs.
• It can detect both horizontal and vertical text regions.

ACKNOWLEDGEMENTS

This work is co-funded by the European Social Fund and National Resources (EPEAEK-II) ARXIMHDHS 1, TEI Serron.

REFERENCES

[1] A.K. Jain, Y. Zhong, "Page Segmentation Using Texture Analysis", Pattern Recognition 29 (5), (1996) 743-770.
[2] A.K. Jain, S. Bhattacharjee, "Text segmentation using Gabor Filters for automatic document processing", Mach. Vision Appl. 5, (1992) 169-184.
[3] B. Wang, X.-F. Li, F. Liu, F.-Q. Hu, "Color text image binarization based on binary texture analysis", Pattern Recognition Letters 26 (11), (2005) 1650-1657.
[4] V. Wu, R. Manmatha, "TextFinder: an automatic system to detect and recognize text in images", IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (11), (1999) 1224-1229.
[5] S. Deng, S. Latifi, E. Regentova, "Document segmentation using polynomial spline wavelets", Pattern Recognition 34 (12), (2001) 2533-2545.
[6] B. Sin, S. Kim, B. Cho, "Locating characters in scene images using frequency features", Proceedings of International Conference on Pattern Recognition (3), (2002) 489-492.
[7] C. Strouthopoulos, N. Papamarkos, A. Atsalakis, "Text extraction in complex color documents", Pattern Recognition 35 (8), (2002) 1743-1758.
[8] W.Y. Chen, S.Y. Chen, "Adaptive page segmentation for color technical journals cover images", Image and Vision Computing 16 (12-13), (1998) 855-877.
[9] K. Sobottka, H. Kronenberg, T. Perroud, H. Bunke, "Text Extraction from Colored Book and Journal Covers", International Journal on Document Analysis and Recognition 2 (4), (2000) 163-176.
[10] H. Hase, T. Shinokawa, M. Yoneda, C.Y. Suen, "Character string extraction from color documents", Pattern Recognition 34 (7), (2001) 1349-1365.
[11] L. O'Gorman, "The Document Spectrum for Page Layout Analysis", IEEE Trans. PAMI 15 (11), (1993) 1162-1173.
[12] L. Fletcher, R. Kasturi, "A robust algorithm for text string separation from mixed text/graphics images", IEEE Trans. PAMI 10 (6), (1988) 910-918.
[13] Y. Zhong, K. Karu, A.K. Jain, "Locating text in complex color images", Pattern Recognition 28 (10), (1995) 1523-1535.
[14] K. Jung, J. Han, "Hybrid approach to efficient text extraction in complex color images", Pattern Recognition Letters 25 (6), (2004) 679-699.
[15] S.S. Raju, P.B. Pati, A.G. Ramakrishnan, "Text localization and extraction from complex color images", Lecture Notes in Computer Science 3804 LNCS, (2005) 486-493.
[16] E.K. Wong, M. Chen, "A new robust algorithm for video text extraction", Pattern Recognition 36 (6), (2003) 1397-1406.
[17] D. Chen, J.-M. Odobez, H. Bourlard, "Text detection and recognition in images and video frames", Pattern Recognition 37 (3), (2004) 595-608.

[18] Q. Ye, Q. Huang, W. Gao, D. Zhao, "Fast and robust text detection in images and video frames", Image and Vision Computing 23 (6), (2004) 565-576.
[19] M.R. Lyu, J. Song, M. Cai, "A comprehensive method for multilingual video text detection, localization, and extraction", IEEE Transactions on Circuits and Systems for Video Technology 15 (2), (2005) 243-255.
[20] X. Qian, G. Liu, H. Wang, R. Su, "Text detection, localization, and tracking in compressed video", Signal Processing: Image Communication 22 (9), (2005) 752-768.
[21] L. Xu, K. Wang, "Extracting text information for content-based video retrieval", Lecture Notes in Computer Science 4903, (2008) 58-69.
[22] Y.-L. Chen, B.-F. Wu, "Text extraction from complex document images using the multi-plane segmentation technique", Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics 4, art. no. 4274432, (2007) 3540-3547.
[23] C. Strouthopoulos, N. Papamarkos, C. Chamzas, "PLA using RLSA and a neural network", Engineering Applications of Artificial Intelligence 12 (2), (1999) 119-138.
[24] X. Liu, H. Fu, Y. Jia, "Gaussian mixture modeling and learning of neighboring characters for multilingual text extraction in images", Pattern Recognition 41 (2), (2008) 484-493.
[25] M. Choi, H. Choi, "Efficient text detection in color images by eliminating reflectance component", Lecture Notes in Computer Science 4707, (2007) 1179-1186.
[26] D. Karatzas, A. Antonacopoulos, "Colour text segmentation in web images based on human perception", Image and Vision Computing 25 (5), (2007) 564-577.
[27] S.J. Perantonis, B. Gatos, V. Maragos, V. Karkaletsis, G. Petasis, "Text area identification in web images", Lecture Notes in Artificial Intelligence 3025, (2004) 82-92.
[28] K. Jung, K.I. Kim, A.K. Jain, "Text information extraction in images and video: A survey", Pattern Recognition 37 (5), (2004) 977-997.
[29] N. Nikolaou, N. Papamarkos, "Color segmentation of complex document images", International Conference on Computer Vision Theory and Applications, VISAPP 2006, Setúbal, Portugal, (2006) 220-227.
[30] N. Nikolaou, N. Papamarkos, "Color Segmentation of Complex Document Images", Advances in Computer Graphics and Computer Vision, Springer-Verlag, (2007) 251-263.
[31] A. Simon, J.C. Pret, A.P. Johnson, "A Fast Algorithm for Bottom-Up Layout Analysis", IEEE Trans. Pattern Analysis and Machine Intelligence 19 (3), (1997) 273-277.
[32] K. Fukunaga, L.D. Hostetler, "The Estimation of the Gradient of a Density Function, with Applications in Pattern Recognition", IEEE Trans. Information Theory 21, (1975) 32-40.

[33] Y. Cheng, "Mean Shift, Mode Seeking, and Clustering", IEEE Trans. Pattern Analysis and Machine Intelligence 17 (8), (1995) 790-799.
[34] D. Comaniciu, P. Meer, "Mean shift: A robust approach toward feature space analysis", IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (5), (2002) 603-619.
[35] ABBYY FineReader. http://www.abbyy.com/finereader_ocr/, 2007.