Semantic Image Retrieval Using Region Based Inverted File

Dengsheng Zhang 1, Md Monirul Islam 1, Guojun Lu 1 and Jin Hou 2
1 Gippsland School of Information Technology, Monash University, Churchill, VIC 3842, Australia
E-mail: {Dengsheng.Zhang, Md.Monirul.Islam, Guojun.Lu}@infotech.monash.edu.au
2 School of Information Science and Technology, Southwest Jiaotong University, Chengdu, Sichuan 610031, China
Email: hou@home.swjtu.edu.cn

Abstract

Image data is as common as textual data in this digital world. There is an urgent demand for image management tools as efficient as text search engines. Decades of research on image retrieval have found that there is a significant gap between existing content based image retrieval and the semantic interpretation of images by humans. As a result, recent research on image retrieval has shifted to semantic image retrieval. Many semantic image retrieval models have been proposed; however, these methods are still alienated from the widely accepted text based retrieval method. In this paper, we propose to unite the semantic image retrieval model with text based retrieval using a novel region based inverted file indexing method. For this purpose, images are translated into textual documents, which are then indexed and retrieved the same way as in conventional text based search. Results show that our method not only provides text based search efficiency, but also better performance than conventional low level image retrieval.

Keywords: CBIR, semantic image retrieval, inverted file, decision tree.

I. INTRODUCTION

With the advancement of digital image capturing devices and low cost electronic memory, huge amounts of images are being created every day in different areas. The need for efficient and effective methodologies to manage large image databases for retrieval is urgent. In the past, many content based image retrieval (CBIR) techniques have been proposed [1-2]. However, there is a significant gap between low level image features and high level semantics.
Recent research in image retrieval has shifted from CBIR to semantic based image retrieval, or SBIR for short. SBIR focuses on learning semantics from image content, or automatic image annotation [1, 3]. The idea is to associate the image content with semantics using machine learning techniques. Many SBIR techniques have been proposed in the literature [4-13]. The representative works in this area include the co-occurrence model [4], the cross-media relevance model [5], the translation model [6], the Gaussian mixture model [7-8], the refinement model [9-10] and the latent Dirichlet allocation model [11-12]. Most of these techniques are based on classical Bayesian theory. In these methods, a semantic model is learnt from a collection of region samples of each image category. Once the semantic models are available, they are used for both image annotation and image retrieval. Given an unknown image, its features are compared with the learnt semantic models, and the model with the closest match to the image is used as the annotation of the image. Given a keyword query, images in the database are ranked and retrieved according to the probability of annotating each image with the query word. It can be seen that image retrieval is the same process as annotation, and the models are equivalent to the similarity measure in traditional CBIR. As a result, image retrieval in these cases is equivalent to the region-to-region matching of traditional region based CBIR. This is impractical when the model is applied to a large database, like the Internet.

At this moment, semantic image retrieval is quite different from textual document retrieval, which separates annotations or keywords from documents and indexes a textual database with keywords instead of documents. Inspired by the textual document retrieval technique, we adopt a different approach from existing semantic image retrieval methods. Specifically, we first build a semantic dictionary and translate each image document into a set of textual keywords. We then index the image database using an inverted file, so that image retrieval is done the same way as textual document retrieval.
However, building an inverted file for an image database is not straightforward. Unlike keywords in textual documents, semantic keywords in image documents carry different weights. This is because each semantic keyword is associated with regions of different size in each image. Therefore, during indexing, the importance of each semantic keyword needs to be determined based on the size of the region the keyword is associated with. Furthermore, the spatial relationship between the image regions to which the semantic keywords are associated also needs to be considered during indexing.

Our approach differs from existing semantic image retrieval approaches in two key aspects. First, we separate the semantic keywords from the image documents and do image annotation and indexing offline. Second, we build a region based inverted file to index image documents the same way as textual documents, so that image retrieval is done the same way as textual document retrieval.

The rest of the paper is organized as follows. In Section 2, we briefly describe how an image document is translated into a textual document. In Section 3, the proposed region based inverted file indexing method is described in detail. We present experimental results in Section 4. The paper is concluded in Section 5.

II. SEMANTIC LEARNING

The basic idea of semantic learning is to break down images into regions, learn semantics from training regions and translate images in the database into textual documents. This is illustrated in Fig. 1. In the first step, images in the database are broken down into regions using an automatic image segmentation algorithm such as the JSEG tool [15]. Once images are segmented into regions, the regions are represented with both colour and texture features, such as dominant colour descriptors (DCD) and Gabor texture features [13, 14].

After images are segmented into regions and represented using colour and texture features, two visual dictionaries are created: a colour dictionary and a texture dictionary. A visual dictionary is basically a set of representative feature vectors; it is analogous to a monolingual dictionary like a Chinese dictionary or an English dictionary. It is generated from a set of training regions by clustering the sample regions in the training sets.
We have developed an adaptive vector quantization (AVQ) algorithm to generate the visual dictionary. Once the visual dictionaries are generated, the colour feature vector of a region is replaced by the index of the closest representative feature (codeword) of the colour dictionary. A similar approach is applied to discretize the texture feature vectors.

Once the visual dictionaries are created, a mapping between a semantic concept and codewords from the different visual dictionaries needs to be established. The set of all such mappings forms the semantic dictionary, which is analogous to a bilingual dictionary like a Chinese-English dictionary or an English-Chinese dictionary. In this paper, we use a decision tree (DT) to establish these mappings [13]. Regions can be better separated if the colour and texture dictionaries are combined; therefore, both dictionaries are used in the semantic learning.

Annotation using the DT consists of two stages. The first stage is the training stage, where the DT is trained using labelled regions in the training dataset, with labels obtained from ground truth. The DT is built upon the basic C4.5 algorithm with an additional pruning process. C4.5 has been chosen because its attribute selection process has been shown to be better than that of ID3 [13]. Colour and texture attributes are first converted to nominal values using the visual dictionaries. Next, the nominal values of the training sets are input to the DT for induction. In the second stage, once the DT is built and trained with the training set, it is used as the semantic dictionary to annotate unknown regions of an image. After the annotation, each image in the database is translated into a set of keywords for indexing.

Figure 1. Block diagram of translating images into textual documents.
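The discretisation step described above (replacing a region's feature vector by the index of its closest codeword) can be sketched as a nearest-neighbour lookup. This is only a minimal sketch under stated assumptions: it assumes fixed-length feature vectors and Euclidean distance, it does not reproduce the AVQ algorithm that builds the dictionaries, and the function names and toy dictionaries are hypothetical.

```python
import math

def nearest_codeword(feature, dictionary):
    """Return the index of the codeword closest to `feature`.

    Assumption: Euclidean distance over fixed-length vectors. The paper's
    colour descriptor has variable dimension and would need its own measure.
    """
    best_index, best_dist = -1, math.inf
    for index, codeword in enumerate(dictionary):
        dist = math.dist(feature, codeword)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

# Hypothetical toy dictionaries; real ones would come from clustering
# the training regions.
colour_dictionary = [(255.0, 0.0, 0.0), (0.0, 255.0, 0.0), (0.0, 0.0, 255.0)]
texture_dictionary = [(0.1, 0.9), (0.8, 0.2)]

def discretise_region(colour_feature, texture_feature):
    """Map a region to a pair of nominal values (codeword indices)."""
    return (
        nearest_codeword(colour_feature, colour_dictionary),
        nearest_codeword(texture_feature, texture_dictionary),
    )
```

The resulting pair of indices plays the role of the nominal colour and texture attributes that are fed to the decision tree.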

III. SEMANTIC IMAGE INDEXING AND RETRIEVAL USING INVERTED FILE

Once the images are translated into textual documents, they can be indexed and retrieved using the efficient inverted file technique. In this section, we first briefly describe the conventional text based inverted file indexing and retrieval technique. We then describe the proposed region based inverted file for semantic image indexing and retrieval.

A. Text Based Inverted File

An inverted file is a data structure where documents are indexed by a term-documents structure instead of the conventional document-terms structure. Typically, an inverted file is a collection of lists, one list for each term. To form an inverted file, all the documents of a database are parsed to find the candidate keywords or terms. Suppose the total numbers of documents and terms are N and M (M << N), respectively. For each term, $term_i$, the following information is collected and stored with the term in the corresponding list:

Document frequency $df_i$: the number of documents containing the term $term_i$
Document ID $doc_j$: indicating which document contains the term $term_i$
Term frequency $tf_{j,i}$: the number of times $term_i$ appears in the j-th document

Table I shows an inverted file for textual documents. It is understood that if a term appears frequently in a document, the term is highly relevant to that document. Therefore, $tf_{j,i}$ ensures that high frequency terms have more weight than low frequency terms. However, if a term appears in many documents, this term has little relevance to any particular document, and it should have less weight than others.
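This is defined by the inverse document frequency $idf_i$, which is calculated as

$idf_i = \log\left(\frac{N}{df_i}\right)$ (1)

$tf_{j,i}$ and $idf_i$ are used to calculate the weight (importance) of $term_i$ in the j-th document,

$tw_{j,i} = tf_{j,i} \times idf_i$ (2)

The j-th document is then represented with an M dimensional feature vector,

$F_j = (tw_{j,1}, tw_{j,2}, tw_{j,3}, \ldots, tw_{j,M})$ (3)

The query text is also converted into a feature vector of dimension M,

$Q = (q_1, q_2, q_3, \ldots, q_M)$ (4)

where $q_i$ takes binary values and is defined as

$q_i = \begin{cases} 1, & \text{if } term_i \text{ appears in the query text } Q \\ 0, & \text{otherwise} \end{cases}$ (5)

TABLE I: INVERTED FILE FOR TEXTUAL DOCUMENTS.
Term | Document frequency | Inverted list (documents and term frequencies)
$term_1$ | $df_1$ | $\langle doc_1, tf_{1,1} \rangle, \langle doc_2, tf_{2,1} \rangle, \ldots, \langle doc_{df_1}, tf_{df_1,1} \rangle$
... | ... | ...
$term_i$ | $df_i$ | $\langle doc_1, tf_{1,i} \rangle, \langle doc_2, tf_{2,i} \rangle, \ldots, \langle doc_{df_i}, tf_{df_i,i} \rangle$
... | ... | ...
$term_M$ | $df_M$ | $\langle doc_1, tf_{1,M} \rangle, \langle doc_2, tf_{2,M} \rangle, \ldots, \langle doc_{df_M}, tf_{df_M,M} \rangle$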
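The structure of Table I and the weights of Equations (1) and (2) can be illustrated with a few lines of Python. This is a sketch rather than the authors' implementation: it assumes whitespace tokenisation and the natural logarithm (the paper does not state the base), and the function names and the three toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict

def build_inverted_file(documents):
    """Map each term to its postings list of (doc_id, term_frequency).

    The document frequency df of a term is the length of its postings list.
    """
    inverted = defaultdict(list)
    for doc_id, text in enumerate(documents):
        for term, tf in Counter(text.lower().split()).items():
            inverted[term].append((doc_id, tf))
    return dict(inverted)

def term_weights(inverted, num_docs):
    """Return {doc_id: {term: tw}} with tw = tf * log(N / df)."""
    weights = defaultdict(dict)
    for term, postings in inverted.items():
        idf = math.log(num_docs / len(postings))  # natural log is an assumption
        for doc_id, tf in postings:
            weights[doc_id][term] = tf * idf
    return dict(weights)

# Hypothetical toy collection.
docs = ["sky bird sky", "grass tiger", "sky water"]
inverted = build_inverted_file(docs)
weights = term_weights(inverted, len(docs))
```

Storing only the non-zero weights per document keeps the M dimensional vector of Equation (3) sparse, which is what makes the inverted file efficient when M is large.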

The similarity between a query text Q and the j-th document $D_j$ is measured as

$Similarity(D_j, Q) = \frac{\sum_{i=1}^{M} (tw_{j,i} \times q_i)}{\sqrt{\sum_{i=1}^{M} tw_{j,i}^2} \times \sqrt{\sum_{i=1}^{M} q_i^2}}$ (6)

The documents are then ranked according to the similarity scores and returned to users.

B. Image Indexing Using Inverted File

Since images have been translated into textual documents, images can be indexed and retrieved the same way as textual documents using the inverted file data structure. However, due to the difference between textual and visual words, the term frequency and term weight of textual documents need to be redesigned for images. Furthermore, additional information, like the spatial information of images, should also be included in the inverted file to increase retrieval accuracy.

In textual documents, each occurrence of a term in a document gives the same information as any other occurrence of the same term; therefore, the weight of a term is well determined by the term frequency. For images, however, each occurrence of a term (concept) may represent a region with a different size from another occurrence of the same term. For example, in Figure 2, three regions with different sizes belong to the same concept, flower. If the term frequency 3 were used, it would not capture the importance of the large flower in the centre, because another image may have a much smaller background flower that is also segmented into 3 regions. Therefore, instead of using just the term frequency, the areas of the regions associated with the same term are summed together and a term area is calculated to represent the importance of the term flower. This term area is the term weight of flower in this particular image. As a result, the term weight $tw_{j,i}$ is redefined as the sum of the areas of all regions associated with $term_i$ in the j-th image.

In addition to term frequency, there are other differences between images and text documents which affect the calculation of the term weight. Images consist of pixels in a 2D coordinate system, so they contain rich spatial information.
For example, an animal usually appears in the middle, water in the bottom and sky in the upper part of an image, as shown in Figure 3. If a region is annotated as an animal region, the accuracy of the retrieval can be improved if the region is found in the centre of the image. Therefore, an animal region appearing in the centre should have more weight than an animal region appearing elsewhere in the image. The position weight of a region $reg_k$ in the j-th image document is calculated using Equation (7) as follows:

$pos\_weight_k = 2 - \left(\frac{d}{d_{max}}\right)$ (7)

where d is the distance of the region centroid from the centre of its normal position in an image and $d_{max}$ is its maximum possible distance.

Figure 2: Multiple regions with the flower concept.

Figure 3: Examples of spatial positions. Sky in the top, water in the bottom and animals in the centre.

Figure 4 shows typical examples of how to calculate d and $d_{max}$ for an animal region, a cloud region and a water region. Equation (7) ensures that the farther the region is from its normal position, the smaller the position weight. d and $d_{max}$ for other concepts are calculated in a similar way. In Table II, we define how d and $d_{max}$ are calculated for 3 concepts used in this paper.
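The position weight can be sketched in code as follows. The sketch rests on several assumptions that are not confirmed by the text: Equation (7) is read as pos_weight = 2 - d/d_max, image coordinates place the origin at the top-left corner with y increasing downwards, and the "top centre" and "bottom centre" reference points lie on the image border. The names `NORMAL_POSITION`, `reference_point` and `position_weight` are hypothetical.

```python
import math

# Assumed normal positions for a few concepts (a subset of the concept list).
NORMAL_POSITION = {"animal": "centre", "sky": "top", "water": "bottom"}

def reference_point(position, width, height):
    """Centre of the assumed normal position, with (0, 0) at the top-left."""
    if position == "centre":
        return (width / 2, height / 2)
    if position == "top":
        return (width / 2, 0.0)
    if position == "bottom":
        return (width / 2, float(height))
    raise ValueError(f"unknown position: {position}")

def position_weight(concept, centroid, width, height):
    """Position weight under the assumed reading 2 - d / d_max.

    d is the distance from the region centroid to the reference point and
    d_max is the distance from the reference point to the furthest corner.
    """
    ref = reference_point(NORMAL_POSITION[concept], width, height)
    corners = [(0, 0), (width, 0), (0, height), (width, height)]
    d = math.dist(centroid, ref)
    d_max = max(math.dist(ref, corner) for corner in corners)
    return 2.0 - d / d_max
```

Under this reading the weight ranges from 2 (region centroid exactly at its normal position) down to 1 (centroid at the furthest corner), so a misplaced region is down-weighted but never removed from the index.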

Figure 4: Calculation of d and $d_{max}$ for (a) animal, (b) sky and (c) grass regions.

TABLE II: DEFINITIONS OF d AND $d_{max}$ FOR DIFFERENT CONCEPTS.
Concept of the region | Assumption of normal position | d (distance between) | $d_{max}$ (distance between)
animal, car, fruit, flower | Centre | Image centroid and region centroid | Image centroid and furthest corner
firework, sky, bird | Top | Top centre and region centroid | Top centre and furthest corner
grass, sand, water | Bottom | Bottom centre and region centroid | Bottom centre and furthest corner

Another important piece of prior information is the co-occurrence of objects in many image categories. In images, certain objects often appear together, for example tiger and grass, bird and sky, beach and sea, etc., as shown in Figure 5. If a region is labelled as a bird region, the accuracy of retrieval can be improved if it appears with a sky region in the same image. Therefore, the weight of a bird region is increased if it appears with a sky region. This idea is implemented by introducing the relationship weight of a region, $rel\_weight_k$, which is calculated as the frequency of co-occurring regions for the region $reg_k$ in the j-th image. For example, if a region is labelled as bird and two other regions are labelled as sky in the same image, the relationship weight of the bird region will be 3 (= 1 + 2). We have studied the image database and have made a list of co-occurring objects. Table III shows some of the co-occurring objects used in the semantic retrieval.

Because of the above mentioned additional information, an inverted file for image documents needs to store more information than an inverted file for conventional textual documents. For example, area, pos_weight and rel_weight are stored for each region. Table IV shows the structure of an inverted file for image documents.
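The relationship weight and the per-region records can be sketched as below. The co-occurrence dictionary is a hypothetical two-entry stand-in for the authors' list, the "1 plus the number of co-occurring regions" rule is inferred from the bird and sky example, and combining the stored (area, pos_weight, rel_weight) records by a sum of products is an assumption of this sketch.

```python
# Hypothetical co-occurrence list: concept -> concepts it often appears with.
CO_OCCURRING = {"bird": {"sky"}, "tiger": {"grass"}}

def relationship_weights(region_labels):
    """Return one relationship weight per region of a single image.

    Each weight is 1 plus the number of other regions in the same image
    whose label is listed as co-occurring with the region's own label.
    """
    weights = []
    for k, label in enumerate(region_labels):
        partners = CO_OCCURRING.get(label, set())
        count = sum(
            1
            for m, other in enumerate(region_labels)
            if m != k and other in partners
        )
        weights.append(1 + count)
    return weights

def weighted_term_area(region_records):
    """Sum area * pos_weight * rel_weight over the regions of one term.

    `region_records` is a list of (area, pos_weight, rel_weight) tuples.
    The product-sum form is an assumption; the result would still need to
    be scaled by the inverse document frequency of the term.
    """
    return sum(area * pos * rel for area, pos, rel in region_records)
```

For an image whose regions are labelled bird, sky and sky, `relationship_weights` gives the bird region a weight of 3, matching the worked example in the text, while the two sky regions keep a weight of 1 because this toy list has no entry for sky.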
Analogous to Table I, each concept $term_i$ is associated with the following set of information:

Document frequency $df_i$: the number of images containing one or more regions labelled with $term_i$ (or concept i)
Image ID $im_j$: indicating the image which contains one or more regions labelled with $term_i$
Term frequency $tf_{j,i}$: the number of regions labelled with $term_i$ in the j-th image
Region information list $reginfo_{j,i}$: containing $tf_{j,i}$ region records as

$reginfo_{j,i} = \{(a_1, p_1, r_1), (a_2, p_2, r_2), \ldots, (a_{tf_{j,i}}, p_{tf_{j,i}}, r_{tf_{j,i}})\}$

where a, p and r are respectively the area, pos_weight and rel_weight of a region. The information from Table IV is used to compute $tw_{j,i}$, which is done offline during indexing:

$tw_{j,i} = idf_i \times \sum_{k=1}^{tf_{j,i}} (area_k \times pos\_weight_k \times rel\_weight_k)$ (8)

The inverse document frequency $idf_i$ is calculated in the same way as in Equation (1). The modified term weight $tw_{j,i}$ of Equation (8) replaces that of Equation (2) when forming the feature vector of Equation (3) for each image. All images are represented and indexed using these feature vectors.

During retrieval, a query text is given with one or more keywords. The query text is converted into a query feature vector as shown in Equation (4). The similarities between the query feature vector and the feature vectors of all database images are calculated using Equation (6). Images are ranked according to the similarity values and are displayed to the users.

Figure 5: Examples of co-occurring objects in images. Top: sky and bird. Bottom: tiger and grass.

TABLE III: LIST OF CO-OCCURRING CONCEPTS.
Concept name | Frequently occurring concepts
Black Ape | Forest, Brown, Green
Black Elephant | Brown, Green, Black Water, Blue Water
Brown Horse | Tree, Forest, Brown, Green
Building | Blue Sky, Cloudy Sky, Other Sky
Butterfly | Tree, Forest, Brown, Green, Flower
Camel | Green, Sand
Cauliflower | Green, Green Leaf
Corn | Green, Green Leaf
Deer | Brown, Green
Eagle | Blue Sky, Cloudy Sky, Other Sky
Eggplant | Green, Green Leaf
Fighter Plane | Blue Sky, Cloudy Sky, Other Sky
Golden Fish | Black Water, Blue Water
Greyhound Dog | Green, Green Leaf, Building, House
House | Blue Sky, Cloudy Sky, Other Sky

TABLE IV: INVERTED FILE FOR IMAGE DOCUMENTS.
Term | Image frequency | Inverted list
$term_1$ | $df_1$ | $\langle im_1, tf_{1,1}, reginfo_{1,1} \rangle, \langle im_2, tf_{2,1}, reginfo_{2,1} \rangle, \ldots, \langle im_{df_1}, tf_{df_1,1}, reginfo_{df_1,1} \rangle$
... | ... | ...
$term_i$ | $df_i$ | $\langle im_1, tf_{1,i}, reginfo_{1,i} \rangle, \langle im_2, tf_{2,i}, reginfo_{2,i} \rangle, \ldots, \langle im_{df_i}, tf_{df_i,i}, reginfo_{df_i,i} \rangle$
... | ... | ...
$term_M$ | $df_M$ | $\langle im_1, tf_{1,M}, reginfo_{1,M} \rangle, \langle im_2, tf_{2,M}, reginfo_{2,M} \rangle, \ldots, \langle im_{df_M}, tf_{df_M,M}, reginfo_{df_M,M} \rangle$

IV. EXPERIMENTAL RESULTS

In this section, we test the performance of semantic indexing and retrieval using the inverted file. The DT is trained using a set of labelled regions. We collect 7,600 images. 5,100 images are from the Corel dataset. The remaining 2,500 images are downloaded from the Internet by searching Google and Yahoo with 50 different keywords. All images are segmented into regions using the JSEG segmentation algorithm. In total, 55,870 regions are obtained. Each region is represented with a colour descriptor and a texture descriptor. The dominant colour descriptor (DCD) is used as the colour descriptor, and Gabor features are used as the texture descriptor.
In total, there are 102 concepts in the image database.

To train and test the DT, we select 6,088 regions belonging to different concepts. The number of regions is different for each concept. For each concept, we randomly select 50% of the regions for training and the other 50% for testing. This process generates a training set and a testing set, both consisting of 3,044 regions. Each region in the training and testing sets is described by two attributes (colour and texture) and a class label. All the attributes are multidimensional vectors. Among them, the colour attribute has variable dimension and the texture attribute has fixed dimension. They are discretized using the AVQ algorithm and fed into the DT algorithm described in Section 2. With the DT annotation, we are able to achieve 67% annotation accuracy compared with 5% accuracy by the Bayesian method widely used in the literature (Fig. 6(a)).

We then compare semantic image retrieval with low level image retrieval. In low level retrieval, images are segmented into regions the same way as above. However, instead of a keyword based representation, each region in an image is represented by low level colour and texture features. To be fair, all the colour and texture features used in the annotation stage are used to represent each region. Images are indexed as {image → region} and image retrieval is done by image-by-image matching. For low level retrieval, 102 images are selected as queries, one query image for each concept. We select those query images which give the best low level retrieval performance. As the number of regions differs between images, we use the widely used earth mover's distance (EMD) to find the distance between two images [3]. Given a query image, all database images are ranked based on their EMD distance from the query. The retrieval accuracy is reported using precision on the top k retrieved images, where k takes the values 10, 20, 30, ..., 100. For each query, precision is calculated at each level of k. The precisions of all queries are then averaged. For semantic retrieval, we have 102 different queries representing the 102 concepts. The retrieval precisions are averaged over the 102 queries.

Fig. 6(b) shows the comparison of the two retrieval methods. The significant difference between the two types of retrieval demonstrates the effectiveness of learning. In low level retrieval, due to feature inaccuracy, similar images are often placed far apart and different images are often placed close to each other in the high dimensional feature space. Therefore, low level retrieval often returns many irrelevant images. This problem is overcome in SBIR by learning, which incorporates human knowledge. For example, two similar images may be placed far away from each other in feature space; in SBIR, however, they are placed in the same cluster due to learning. Therefore, they are regarded as relevant images in SBIR. As a result, the retrieval performance increases.

Figure 6. (a) Comparison of annotation accuracy (%) between DT and Bayesian annotation; (b) comparison of retrieval accuracy, plotted as precision (%) against the top K retrieved images, for semantic retrieval and low level retrieval.
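The evaluation measure described in this section, precision on the top k retrieved images averaged over all queries, can be sketched as follows. The function names and the toy rankings in the usage note are hypothetical and do not reproduce the reported results.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top k retrieved images that are relevant."""
    top_k = ranked_ids[:k]
    hits = sum(1 for image_id in top_k if image_id in relevant_ids)
    return hits / k

def mean_precision_curve(runs, cutoffs):
    """Average precision at each cutoff over all queries.

    `runs` is a list of (ranked_ids, relevant_ids) pairs, one per query.
    """
    return {
        k: sum(precision_at_k(ranked, relevant, k) for ranked, relevant in runs)
        / len(runs)
        for k in cutoffs
    }
```

For instance, `mean_precision_curve([([1, 2, 3, 4], {1, 2}), ([5, 6, 7, 8], {8})], cutoffs=(2, 4))` averages the two per-query precisions at k = 2 and k = 4; in the experimental setting the cutoffs would be 10, 20, ..., 100.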

V. CONCLUSIONS

We have proposed a method to translate images into textual documents and index images using a region based inverted file. We demonstrate that there is a natural relationship between image data and textual data. The novelty of our approach is that, instead of doing annotation online like existing methods in the literature, we separate semantics from image documents and index images with semantic terms offline. This is equivalent to classifying the images in a database into semantic classes and then retrieving images class by class. This is much more efficient than existing methods when the retrieval model is applied to a large scale database like the Internet, because in that case it is impractical to navigate through all the images in the database; it is only practical to select the images relevant to the query for ranking. Another novelty of our approach is the inclusion of important prior information in the inverted file indexing. Results show that our semantic learning has an advantage over the Bayesian annotation method widely used in the literature and has higher retrieval performance than the traditional CBIR method.

In future, we will investigate the various metadata sources and combine image content data with metadata where they are available. The current concept set will also be expanded to adapt the system to larger databases.

REFERENCES

[1] R. Datta, D. Joshi, J. Li and J. Z. Wang, "Image retrieval: ideas, influences, and trends of the new age", ACM Computing Surveys, 40(2):5:1-60, 2008.
[2] F. Long, H. J. Zhang and D. D. Feng, "Fundamentals of content-based image retrieval", Multimedia Information Retrieval and Management, D. Feng Eds, Springer, 2003.
[3] Y. Liu, D. S. Zhang and G. Lu, "A survey of content-based image retrieval with high-level semantics", Pattern Recognition, 40(1):262-282, 2007.
[4] Y. Mori, H. Takahashi and R. Oka, "Image-to-word transformation based on dividing and vector quantizing images with words", Proc. of International Workshop on Multimedia Intelligent Storage and Retrieval Management, 1999.
[5] J. Jeon, V. Lavrenko and R. Manmatha, "Automatic image annotation and retrieval using cross-media relevance models", Proc. of ACM SIGIR03, pp. 119-126, 2003.
[6] P. Duygulu, K. Barnard, J. F. G. de Freitas and D. A. Forsyth, "Object recognition as machine translation: learning a lexicon for a fixed image vocabulary", Proc. of ECCV02, pp. 97-112, 2002.
[7] J. Li and J. Z. Wang, "Real-time computerized annotation of pictures", IEEE PAMI, 30(6):985-1002, 2008.
[8] G. Carneiro, A. B. Chan, P. J. Moreno and N. Vasconcelos, "Supervised learning of semantic classes for image annotation and retrieval", IEEE PAMI, 29(3):394-410, 2007.
[9] Y. Liu, D. Zhang and G. Lu, "Sieve-search images effectively through visual elimination", Lecture Notes in Computer Science, 4577:381-390, 2007.
[10] C. Wang, F. Jing, L. Zhang and H.-J. Zhang, "Content-based image annotation refinement", Proc. of CVPR, 2007.
[11] L. Fei-Fei and P. Perona, "A Bayesian hierarchy model for learning natural scene categories", Proc. of CVPR, 2005.
[12] D. Blei, A. Ng and M. Jordan, "Latent Dirichlet allocation", Journal of Machine Learning Research, 3:993-1022, 2003.
[13] Y. Liu, D. Zhang and G. Lu, "Region-based image retrieval with high-level semantics using decision tree learning", Pattern Recognition, 41(8):2554-2570, 2008.
[14] D. Zhang, A. Wong, M. Indrawan and G. Lu, "Content-based image retrieval using Gabor texture features", Proc. of First IEEE Pacific-Rim Conference on Multimedia (PCM'00), pp. 392-395, Sydney, Australia, 2000.
[15] Y. Deng and B. S. Manjunath, "Unsupervised segmentation of color-texture regions in images and video", IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), 23(8):800-810, 2001.