Learning an Image Manifold for Retrieval

Xiaofei He*, Wei-Ying Ma, and Hong-Jiang Zhang
Microsoft Research Asia, Beijing, China, 100080
{wyma,hjzhang}@microsoft.com
*Department of Computer Science, The University of Chicago
xiaofei@cs.uchicago.edu
(*This work was done while Xiaofei He was a summer intern at Microsoft Research Asia.)

ABSTRACT
We consider the problem of learning a mapping function from a low-level feature space to a high-level semantic space. Under the assumption that the data lie on a submanifold embedded in a high-dimensional Euclidean space, we propose a relevance feedback scheme which is naturally conducted only on the image manifold in question rather than the total ambient space. While images are typically represented by feature vectors in R^n, the natural distance is often different from the distance induced by the ambient space R^n. The geodesic distances on the manifold are used to measure the similarities between images. However, when the number of data points is small, it is hard to discover the intrinsic manifold structure. Based on user interactions in a relevance-feedback-driven query-by-example system, the intrinsic similarities between images can be accurately estimated. We then develop an algorithmic framework to approximate the optimal mapping function by a radial basis function (RBF) neural network. The semantics of a new image can then be inferred by the RBF neural network. Experimental results show that our approach is effective in improving the performance of content-based image retrieval systems.

Categories and Subject Descriptors
H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing - Algorithms, Indexing methods.

General Terms
Algorithms, Management, Performance, Experimentation.

Keywords
Image Retrieval, Semantic Space, Manifold Learning, Dimensionality Reduction, Riemannian Structure

1. INTRODUCTION
Content-Based Image Retrieval (CBIR) [3][9][12][14][21] is a long-standing research problem in computer vision and information retrieval. Most previous image retrieval techniques build on the assumption that the image space is Euclidean. However, in many cases the image space might be a nonlinear submanifold embedded in the ambient space. Intrinsically, there are two fundamental problems in image retrieval: 1) How do we represent an image? 2) How do we judge similarity?

One possible solution to these two problems is to learn a mapping function from the low-level feature space to the high-level semantic space. The former is not always consistent with human perception, while the latter is what an image retrieval system desires to have. Specifically, if two images are semantically similar, then they should be close to each other in the semantic space. In this paper, our approach is to recover the semantic structures hidden in the image feature space (color, texture, etc.). In recent years, much has been written about relevance feedback in content-based image retrieval from the perspective of machine learning [16][17][18][19][20], yet most learning methods only take into account the current query session, and the knowledge obtained from past user interactions with the system is forgotten.
To compare the effects of different learning techniques, a useful distinction can be made between short-term learning within a single query session and long-term learning over the course of many query sessions [6]. Both short- and long-term learning processes are necessary for an image retrieval system, though the former has been the primary focus of research so far. We present a long-term learning method which learns a radial basis function neural network for mapping the low-level image features to high-level semantic features, based on user interactions in a relevance-feedback-driven query-by-example system.

As we point out, the choice of the similarity measure is a deep question that lies at the core of image retrieval. In recent years, manifold learning [1][4][11][13][15] has received a lot of attention and has been applied to face recognition [7], graphics [10], document representation [5], etc. These research efforts show that manifold structure is more powerful than Euclidean structure for data representation, even though there is no convincing evidence that such manifold structure is accurately present. Based on the assumption that the images reside on a low-dimensional submanifold, a geometrically motivated relevance feedback scheme is proposed for image ranking, which is naturally conducted only on the image manifold in question rather than the total ambient space.

It is worthwhile to highlight several aspects of the framework of analysis presented here:
(1) Throughout this paper, we denote by image space the set of all images. Different from most previous geometry-based works, which assume that the image space is a Euclidean space [8][12], in this paper we make a much weaker assumption: that the image space is a Riemannian manifold embedded in the feature space. In particular, we call it the image manifold. Generally, the image manifold has a lower dimensionality

than the feature space. The metric structure of the image manifold is induced by, but different from, the metric structure of the feature space. Thus, a new algorithm for image retrieval which takes into account the intrinsic metric structure of the image manifold is needed.
(2) Given enough images, it is possible to recover the image manifold. However, if the number of images is too small, any algorithm can hardly discover the intrinsic metric structure of the image manifold. Fortunately, in image retrieval we can make use of user-provided information to learn a semantic space that is locally isometric to the image manifold. This semantic space is Euclidean, and hence the geodesic distances on the image manifold can be approximated by the Euclidean distances in this semantic space. This intuition will be strengthened in our experiments.
(3) There are two key algorithms in this framework. One is the retrieval algorithm on the image manifold, and the other is an algorithm for learning a mapping function from the feature space (color, texture, etc.) to the high-level semantic space. The learning algorithm gradually flattens the image manifold and makes it more consistent with human perception. That is, if two images are close to each other (in the sense of the Euclidean metric), they are semantically similar to each other. Here, by "flat" we mean that the image manifold will ultimately be equipped with a flat Riemannian metric defined on it, at which time we call it the semantic space.

The rest of this paper is organized as follows: Section 2 describes the proposed retrieval algorithm on the image manifold. Section 3 describes the proposed framework for learning a semantic space to represent the underlying image manifold. The experimental results are shown in Section 4. We give concluding remarks in Section 5.

2. RELEVANCE FEEDBACK ON IMAGE MANIFOLD
In many cases, images may be visualized as points drawn on a low-dimensional manifold embedded in a high-dimensional Euclidean space. In this paper, our objective is to discover the image manifold by a locality-preserving mapping for image retrieval. We propose a geometrically motivated relevance feedback scheme for image ranking, which is conducted on the image manifold rather than the total ambient space.

2.1 The Algorithm
Let Ω denote the image database and R denote the set of query images and relevant images provided by the user. Our algorithm can be described as follows (a sketch in code is given after the steps):
1. Candidate generation. For each image x_i ∈ R, we find its k nearest neighbors C_i = {y_1, y_2, ..., y_k}, y_j ∈ Ω (images already in R are excluded from selection). Let C = C_1 ∪ C_2 ∪ ... ∪ C_|R|. We call C the candidate image set. Note that R ∩ C = ∅.
2. Construct subgraph. Construct a graph G whose vertex set is V = R ∪ C. The distance between any two images x_i, x_j ∈ V is measured as follows:
dist(x_i, x_j) = ||x_i − x_j|| if ||x_i − x_j|| < ε, and dist(x_i, x_j) = ∞ otherwise,
where ε is a suitable constant. The choice of ε reflects our definition of locality. We put an edge between x_i and x_j if dist(x_i, x_j) < ∞. Since the images in R are supposed to share some common semantics, we set their distances to zero, i.e., dist(x_i, x_j) = 0 for x_i, x_j ∈ R. The constructed graph models the local geometrical structure of the image manifold.
3. Distance measure on the image manifold. To model the geodesic distances between all pairs of image points on the image manifold M, we find the shortest-path distances in the graph G. The length of a path in G is defined to be the sum of the link weights along that path. We then compute the geodesic distance dist_G(x_i, x_j), i.e. the shortest-path length between vertices i and j in G, for all pairs of vertices, using Floyd's O(|V|^3) algorithm.
4. Retrieval based on geodesic distance. To retrieve the images most similar to the query, we simply sort them according to their geodesic distances to the query. The top N images are presented to the user.
5. Update the query example set. Add the relevant images provided by the user to R. Go back to step 1 until the user is satisfied.
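The following Python sketch illustrates one round of steps 1-4 under stated assumptions: X is an array of low-level feature vectors, R_idx indexes the query and relevant images, and k, eps and top_n are illustrative parameters (the paper does not fix their values). It uses a vectorized Floyd-Warshall pass for the all-pairs shortest paths, matching step 3.

```python
import numpy as np
from scipy.spatial.distance import cdist

def rank_on_manifold(X, R_idx, k=5, eps=10.0, top_n=20):
    """One round of steps 1-4: X holds the low-level feature vectors (one row per
    database image), R_idx indexes the query/relevant images, eps is the locality
    threshold and k the neighbourhood size (both illustrative, not from the paper)."""
    R_idx = np.asarray(R_idx)
    # Step 1: candidate generation -- k nearest neighbours of every image in R.
    d_R = cdist(X[R_idx], X)
    d_R[:, R_idx] = np.inf                      # images already in R are excluded
    C_idx = np.unique(np.argsort(d_R, axis=1)[:, :k].ravel())
    V = np.concatenate([R_idx, C_idx])          # vertex set V = R ∪ C
    nR = len(R_idx)

    # Step 2: local subgraph -- edge weight ||x_i - x_j|| if below eps, else no edge (inf).
    D = cdist(X[V], X[V])
    D[D >= eps] = np.inf
    D[:nR, :nR] = 0.0                           # images in R share semantics: distance zero
    np.fill_diagonal(D, 0.0)

    # Step 3: geodesic distances via Floyd-Warshall on the subgraph.
    G = D.copy()
    for m in range(len(V)):
        G = np.minimum(G, G[:, [m]] + G[[m], :])

    # Step 4: sort the candidates by geodesic distance to the query (first image in R).
    order = np.argsort(G[0, nR:])
    return V[nR + order[:top_n]]                # database indices of the top N images
```

In practice this routine would be called once per feedback iteration (step 5), with the user's newly marked relevant images appended to R_idx between calls.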
2.2 Geometrical Justification
Our algorithm deals with finite data sets of points in R^n which are assumed to lie on a smooth submanifold M of low dimensionality. The algorithm attempts to recover M given only the data points. A crucial stage in the algorithm involves estimating the unknown geodesic distance in M between data points in terms of the graph distance with respect to some graph G constructed on the data points. The natural Riemannian structure on M (induced from the Euclidean metric on R^n) gives rise to a manifold metric d_M defined by:
d_M(x, y) = inf_r { length(r) }
where r varies over the set of (piecewise) smooth arcs connecting x to y in M. Note that d_M(x, y) is generally different from the Euclidean distance ||x − y||.

Our algorithm makes use of a graph G on the data points. Given such a graph we can define a metric just on the set of data points. Let x, y belong to the set {x_i}. We define:
d_G(x, y) = the length of the shortest path in G between x and y.
Given the data points and the graph G, one can compute d_G without knowledge of the manifold M. Bernstein et al. [2] show that the two distance metrics (d_M and d_G) approximate each other arbitrarily closely as the density of data points tends to infinity.

Here, we give a simple example to show the advantage of geodesic distances on the manifold over the Euclidean distance, and the advantage of the semantic space over the low-level image feature space. Figure 1 shows a spiral on a plane. Consider that the images of our concern are sampled from the spiral. Clearly, it is a one-dimensional manifold. Figure 1(a) shows the Euclidean distance between data points A and B. Figure 1(b) shows the geodesic distance along the spiral. In this example, the intrinsic geometrical structure can only be characterized by the geodesic distance. In many real-world applications, one is often confronted with the problem that the number of sample points is too small to describe the underlying topology of the data. In this case, the geodesic distance on the image manifold cannot be accurately estimated, as can be seen from Figure 1(c).
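As a hedged illustration of the gap between the two metrics (not from the paper), the snippet below samples the spiral of Figure 1 densely and compares the ambient Euclidean distance with the graph geodesic d_G. The sampling density, the locality threshold and the chosen end points are arbitrary.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Densely sample a planar spiral (a one-dimensional manifold embedded in R^2).
t = np.linspace(0.5, 3.5 * np.pi, 500)
X = np.c_[t * np.cos(t), t * np.sin(t)]
A, B = 0, len(t) - 1                          # two points far apart along the spiral

euclidean = np.linalg.norm(X[A] - X[B])       # ambient (straight-line) distance

# Graph geodesic d_G: connect points closer than eps, then take shortest paths.
D = cdist(X, X)
G = np.where(D < 0.4, D, np.inf)              # eps = 0.4 is an arbitrary locality choice
np.fill_diagonal(G, 0.0)
for m in range(len(X)):                       # Floyd's algorithm, as in Section 2.1
    G = np.minimum(G, G[:, [m]] + G[[m], :])

print(f"Euclidean distance: {euclidean:.1f}, graph geodesic d_G: {G[A, B]:.1f}")
# With dense sampling, d_G tracks the arc length along the spiral (far larger than the
# Euclidean distance). With only a handful of samples, the neighbourhood graph falls
# apart and the geodesic cannot be estimated -- the situation depicted in Figure 1(c).
```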

Figure 1. (a) Euclidean distance between data points A and B. (b) Geodesic distance between data points A and B. (c) Seven data points sampled from the spiral; the geodesic distances between them cannot be accurately estimated. (d) 1-D representation of the spiral.

Fortunately, in image retrieval the user-provided information can help us recover the underlying structure of the image manifold. In the next section, we describe how to learn a continuous function which maps the data points (images) into a semantic space in which the Euclidean distances between images are consistent with human perception, as illustrated in Figure 1(d). The nonlinear Riemannian structure of the manifold (Figure 1(c)) can then be inferred from the linear Euclidean structure of the semantic space (Figure 1(d)).

It might be more interesting to consider this example in the image retrieval domain. Suppose point A is the query image and the other six points denote the images in the database. If we conduct the retrieval in the low-level feature space (Figure 1(c)), point B will be selected as a relevant image, no matter what distance metric we use, Euclidean or geodesic. This is because the intrinsic Riemannian structure of the image manifold cannot be accurately detected due to the lack of sufficient sample points. However, if the retrieval is conducted in the semantic space (Figure 1(d)), point B will never be selected as a relevant image, because by incorporating user-provided information the intrinsic Riemannian structure of the image manifold can be accurately detected. Clearly, retrieval in the semantic space is more consistent with human perception.

3. USING MANIFOLD STRUCTURE FOR IMAGE REPRESENTATION
In the previous section, we described an algorithm to retrieve the user-desired images by modeling the underlying geometrical structure of the image manifold. One problem with this algorithm is that, if the number of sample images is very small, it is difficult to recover the image manifold. In this case, we propose a long-term learning approach to discover the true topology of the image manifold using user interactions. To be specific, we aim at mapping each image into a semantic space in which the distances between the images are consistent with human perception.

The problem we are going to solve can be simply stated below. Let S denote the low-level feature space and T denote the semantic space. Learn a nonlinear mapping function from S to T,
f : x → z  (x ∈ S, z ∈ T)
which preserves the local Riemannian structure of the low-level feature space.

Our proposed solution consists of three steps:
1. Inferring a semantic matrix B_{m×m} from user interactions, whose entries are the distances between pairs of images in the semantic space T; m is the number of images in the database.
2. Finding m points {z_1, z_2, ..., z_m} ⊂ R^k which preserve the pairwise distances specified in B_{m×m}. Laplacian Eigenmaps [1] is used to find such an embedding. The space in which the m points {z_1, z_2, ..., z_m} are embedded is called the LE semantic space in the rest of the paper. The user-provided information is incorporated into the LE semantic space. Note that the LE semantic space is only defined on the image database; in other words, for a new image outside the database, it is unclear how to evaluate its coordinates in the LE semantic space.
3. Given m pairs of vectors (x_i, z_i) (i = 1, 2, ..., m), where x_i is the image representation in the low-level feature space and z_i is the image representation in the LE semantic space, training a radial basis function (RBF) neural network f that accurately predicts the z value for a given x. Hence f(x) is a semantic representation of x. The space obtained by f is called the RBFNN semantic space.
Note that f(x_i) ≈ z_i; that is, the RBFNN semantic space is an approximation of the LE semantic space. However, the RBFNN semantic space is defined everywhere: for any image, either inside or outside the database, its semantic representation can be obtained from the mapping function. We describe the details of these steps in the following.

3.1 Inferring a Distance Matrix in Semantic Space from User Interactions
In this section, we describe how to infer a distance matrix in semantic space from user interactions. Some previous work can be found in [6]. Here, we present a simple method to update the distance matrix gradually. Let B denote the distance matrix, B_{ij} = ||x_i − x_j||. Intuitively, the images marked by the user as positive examples in a query session share some common semantics. Therefore, we can shorten the distances between them. Let S denote the set of positive examples, S = {s_1, s_2, ..., s_k}. We adjust the distance matrix as follows:
B_{s_i s_j} ← B_{s_i s_j} / α  (s_i, s_j ∈ S)
where α is a suitable constant greater than 1. Similarly, we can lengthen the distances between the positive examples and the negative examples:
B_{s_i t_j} ← β · B_{s_i t_j}  (s_i ∈ S, t_j ∈ T)
where T = {t_1, t_2, ..., t_k} is the set of negative examples, and β is a suitable constant greater than 1.
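A minimal sketch of the two update rules, assuming B is a symmetric numpy array of pairwise distances; the values of α and β are illustrative, since the paper only requires them to be greater than 1.

```python
import numpy as np

def update_semantic_distances(B, positive, negative, alpha=2.0, beta=2.0):
    """One feedback round on the semantic distance matrix B (symmetric, m x m):
    shrink distances among positive examples, stretch positive-negative distances.
    alpha and beta are illustrative values; the paper only requires them to be > 1."""
    B = B.copy()
    pos, neg = np.asarray(positive), np.asarray(negative)
    B[np.ix_(pos, pos)] /= alpha          # B[s_i, s_j] <- B[s_i, s_j] / alpha
    B[np.ix_(pos, neg)] *= beta           # B[s_i, t_j] <- beta * B[s_i, t_j]
    B[np.ix_(neg, pos)] *= beta           # mirror update keeps B symmetric
    return B
```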

As the user interacts with the retrieval system, the distance matrix gradually comes to reflect the distances between the images in a semantic space that is consistent with human perception.

3.2 Using Manifold Structure for Image Representation
In the above subsection, we obtained a distance matrix in semantic space. In this subsection, we discuss how to find the semantic representation for each image in the database while these distances are preserved. Recently, there has been renewed interest [1][15][11] in the problem of developing low-dimensional representations when data arise from sampling a probability distribution on a manifold. To choose a proper mapping algorithm, the following two requirements should be satisfied: 1) Since the image distribution in feature space is highly irregular and inconsistent with human perception, the mapping algorithm must have the locality-preserving property. 2) The mapping algorithm should explicitly take into account the manifold structure. Based on these two considerations, we use Laplacian Eigenmaps [1] to find such a mapping. We first compute the similarity matrix as follows:
W_{ij} = exp(−B_{ij}² / t) if B_{ij} < ε, and W_{ij} = 0 otherwise,
where t and ε are suitable constants and B is the distance matrix obtained in the previous subsection. Note that the weight matrix has the locality-preserving property, which is the key feature of Laplacian Eigenmaps.

Suppose y = {y_1, y_2, ..., y_m} is a one-dimensional map of {x_1, x_2, ..., x_m} in the LE semantic space. A reasonable criterion for choosing a good map is to minimize the following objective function under appropriate constraints:
min Σ_{i,j} (y_i − y_j)² W_{ij}
The objective function with our choice of weights W_{ij} incurs a heavy penalty if neighboring points x_i and x_j are mapped far apart. Therefore, minimizing it is an attempt to ensure that if x_i and x_j are close, then y_i and y_j are close as well. Minimizing this objective function is equivalent to solving the following eigenvector problem:
L y = λ D y
where D is a diagonal matrix whose entries are the column sums (also the row sums, since W is symmetric) of W, D_{ii} = Σ_j W_{ij}, and L = D − W is the Laplacian matrix. Let y^(0), y^(1), ..., y^(n) be the solutions of the above eigenvector problem, ordered according to their eigenvalues λ_0 ≤ λ_1 ≤ ... ≤ λ_n. It is easy to show that λ_0 = 0 and y^(0) = (1, ..., 1). We leave out y^(0) and use the next k eigenvectors for the embedding in k-dimensional Euclidean space:
x_i → z_i = (y_i^(1), y_i^(2), ..., y_i^(k))
where y_i^(j) is the i-th entry of the eigenvector y^(j); z_i is the k-dimensional map of image x_i in the LE semantic space. In summary, our goal is to find a vector representation (map) in semantic space for each image in the database. Dimensionality reduction itself is not our goal, though we can make the dimensionality of the LE semantic space much lower than that of the feature space.
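The embedding step can be sketched as follows, assuming the semantic distance matrix B from Section 3.1. The values of t and ε, the small ridge added to D, and scipy's dense generalized eigensolver are assumptions made for the sketch, not choices stated by the authors.

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmap(B, k=30, t=1.0, eps=10.0):
    """Embed the m database images into R^k from the semantic distance matrix B."""
    W = np.where(B < eps, np.exp(-B**2 / t), 0.0)   # heat-kernel weights on local pairs only
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1) + 1e-12                       # tiny ridge keeps D positive definite
    D = np.diag(d)
    L = D - W                                       # graph Laplacian L = D - W
    vals, vecs = eigh(L, D)                         # generalized problem L y = lambda D y
    # Eigenvalues come back in ascending order; drop the constant eigenvector y^(0)
    # (lambda_0 = 0) and keep the next k as the embedding coordinates.
    Z = vecs[:, 1:k + 1]                            # row i is z_i, the LE map of image x_i
    return Z
```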
3.3 Learning the Optimal Mapping Function
In the above section, every image in the database is mapped into the semantic space. The problem now is that, for a new image outside the image database, it is unclear how to evaluate its map in the LE semantic space, since we do not have a mapping function. Here we present an approach that applies a neural network to approximate the optimal mapping function, which intrinsically distinguishes our framework from previous work [6]. The optimal mapping function f* is given by minimizing the following cost function:
f* = arg min_f Σ_{i=1..m} ||f(x_i) − z_i||²
where m is the number of images in the database. Clearly, this is a multivariate nonparametric regression problem, since there is no a priori knowledge about the form of the true mapping function being estimated.

In this work, we use radial basis function (RBF) networks, and standard gradient descent is used as the search technique. The mapping function learned by an RBF network can be represented by
f(x) = Σ_{j=1..h} ω_j G_j(x)
where h is the number of hidden-layer neurons and ω_j ∈ R are the weights. G_j is the radial function defined as follows:
G_j(x) = exp(−||x − c_j||² / σ_j²)
where c_j is the center of G_j and σ_j is the basis function width. The k-dimensional mapping into the semantic space can be represented as
x → f(x) = (f_1(x), f_2(x), ..., f_k(x))
where f = [f_1, f_2, ..., f_k] is the mapping function. Since the mapping function is approximated by the RBFNN (radial basis function neural network), we call this semantic space the RBFNN semantic space.

In summary, the RBF neural network approximates the optimal mapping function from the low-level feature space to the semantic space. It is trained off-line with the training samples {x_i, z_i}. The computational complexity of the retrieval process is reduced, as the dimensionality of the semantic space is reduced. The image representation f(x_i) in the RBFNN semantic space is an approximation of the image representation z_i in the LE semantic space, i.e., f(x_i) ≈ z_i. A previously unseen image can simply be mapped into the RBFNN semantic space by the mapping function f.
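A sketch of the mapping function f. The paper trains the RBF weights with standard gradient descent; for brevity this version picks the centers as a random subset of the training images and fits the output weights by least squares, so the center count, the width σ and the fitting procedure are assumptions rather than the authors' exact training scheme.

```python
import numpy as np
from scipy.spatial.distance import cdist

class RBFMapper:
    """f: low-level feature space -> k-dimensional semantic space with Gaussian RBF units.
    Centers are a random subset of the training images and the output weights are fit by
    least squares; the paper instead trains the weights with standard gradient descent."""

    def __init__(self, n_hidden=200, sigma=1.0, seed=0):
        self.n_hidden, self.sigma, self.seed = n_hidden, sigma, seed

    def _hidden(self, X):
        # G_j(x) = exp(-||x - c_j||^2 / sigma^2)
        return np.exp(-cdist(X, self.centers, "sqeuclidean") / self.sigma**2)

    def fit(self, X, Z):
        """X: (m, n) low-level features; Z: (m, k) LE semantic coordinates."""
        rng = np.random.default_rng(self.seed)
        idx = rng.choice(len(X), size=min(self.n_hidden, len(X)), replace=False)
        self.centers = X[idx]
        G = self._hidden(X)                                   # (m, h) hidden activations
        self.weights, *_ = np.linalg.lstsq(G, Z, rcond=None)  # omega: (h, k)
        return self

    def transform(self, X):
        return self._hidden(X) @ self.weights                 # f(x) ~ z, defined for any image
```

Because transform() only evaluates the learned Gaussian units, it applies equally to images inside or outside the database, which is the property the RBFNN semantic space is introduced for.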

4. EXPERIMENTAL RESULTS
In this paper, we focus on image retrieval based on the user's relevance feedback to improve the system's short-term and long-term performance. The user can submit a query image either inside or outside the database. The system first computes the low-level features of the query image and then maps it into the semantic space using the learned mapping function. The system retrieves and ranks the images in the database. Then, the user provides his judgment of the relevance of the retrieved results. With the user's relevance feedback, the system refines the search result iteratively until the user is satisfied. The accumulated relevance feedback is used to construct and update the semantic space, as described in Section 3.

We performed several experiments to evaluate the effectiveness of our proposed approaches over a large image dataset. The image dataset we use consists of 3,000 images from 30 semantic categories of the Corel dataset. Each semantic category contains 100 images. The 3,000 images are divided into two subsets. The first subset consists of 2,700 images, and each semantic category contains 90 images. The second subset consists of 300 images, and each semantic category contains 10 images. The first subset is used as the training set for learning the optimal mapping function. The second subset is used for evaluating the generalization capability of our learning framework. A retrieved image is considered correct if it belongs to the same category as the query image. Three types of color features (color histogram, color moment, color coherence) and three types of texture features (Tamura coarseness histogram, Tamura directionality, pyramid wavelet texture) are used in our system. The combined feature vector is 435-dimensional.

We designed an automatic feedback scheme to model the short-term retrieval process. We only require the user to provide positive examples. At each iteration, the system selects at most 5 correct images as positive examples (positive examples from the previous iterations are excluded from the selection). These automatically generated feedbacks are used as training data to perform short-term learning. To model the long-term learning, we randomly select images from each category as the queries. For each query, a short-term learning process is performed and the feedbacks are used to construct the semantic space. The retrieval accuracy is defined as follows:
Accuracy = (number of relevant images retrieved in the top N returns) / N
Four experiments are conducted. The experiment with the new retrieval algorithm on the image manifold is discussed in Section 4.1. In Section 4.2, we show the image retrieval performance in the learned semantic spaces; the generalization capability is also evaluated. In Section 4.3 we further test the system's performance in semantic spaces with different dimensionalities. We compare our new algorithm with Rui's algorithm [12] in semantic space in Section 4.4.
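The automatic feedback protocol and the accuracy measure described above can be summarized in the following sketch; rank_fn is any ranking routine mapping a query/relevant set to an ordered list of database indices (for instance, the manifold ranking sketch in Section 2.1), and the iteration and selection limits mirror the description above.

```python
import numpy as np

def simulate_short_term_session(rank_fn, labels, query, n_iters=8, top_n=20, max_pos=5):
    """Automatic feedback loop: rank_fn(R_idx) returns database indices ordered by
    similarity (e.g. the manifold ranking sketch in Section 2.1); labels[i] is the
    semantic category of image i; query is the index of the query image."""
    R = [query]                                     # query example set R
    accuracies = []
    for _ in range(n_iters + 1):                    # iteration 0 = no feedback yet
        top = np.asarray(rank_fn(np.asarray(R)))[:top_n]
        accuracies.append(np.mean(labels[top] == labels[query]))   # relevant in top N / N
        # at most `max_pos` correct images not used as positive examples before
        new_pos = [i for i in top if labels[i] == labels[query] and i not in R][:max_pos]
        if not new_pos:
            break
        R.extend(new_pos)
    return accuracies
```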
4.1 Retrieval on Image Manifold
We compare the performance of our proposed retrieval algorithm on the image manifold with the relevance feedback approach described by Rui [12]. We did not compare it to other image retrieval methods because our primary purpose is to analyze the geometrical structure of the image space. Specifically, we aim at comparing the Euclidean structure and the manifold structure for data representation in image retrieval. The comparison was made in the low-level feature space with no semantic information involved. Figure 2 shows the experimental result by looking at the top 20 retrievals. As can be seen, our algorithm outperforms Rui's approach. One reason is that the image manifold is possibly highly nonlinear, while Rui's approach can only discover linear structure.

Figure 2. Comparison of retrieval on the image manifold with Rui's algorithm (retrieval accuracy in the top 20 results vs. number of iterations).

4.2 Retrieval in Semantic Space

Figure 3. Image retrieval performance in the low-level feature space, the RBFNN semantic space and the LE semantic space (retrieval accuracy in the top 20 results vs. number of iterations). The query images are from the image database (training set).

4.2.1 Query Image Inside the Database
As we discussed in Section 3, there are two different semantic spaces, the LE semantic space and the RBFNN semantic space. One limitation of the LE semantic space is that it only contains the images in the database, i.e., the training set. It is unclear how to evaluate the map in the LE semantic space for new test data. To overcome this limitation, a mapping function f from the low-level feature space to the high-level semantic space (LE semantic space) is learned by an RBF neural network. That is, the image representation in the LE semantic space, z_i, is approximated by f(x_i), which is the image representation in the RBFNN semantic space. Intuitively, the retrieval performance in the RBFNN semantic space should not be better than that in the LE semantic space, since the RBFNN semantic space is an approximation of the LE semantic space.

Figure 3 shows the retrieval performance in the low-level feature space, the LE semantic space and the RBFNN semantic space. Our new retrieval algorithm on the image manifold is used. We use the training set (2,700 images) as the image database. We first conduct the experiment in the low-level feature space. As in the previous experiment in Section 4.1, we randomly choose 20% of the images in each semantic class as queries to perform the retrieval. The user's relevance feedback is used to learn an LE semantic space, an RBFNN semantic space, as well as a mapping function from the 435-dimensional low-level feature space to the 30-dimensional high-level semantic space.

We will discuss how to determine the intrinsic dimensionality of the semantic space in the next section. As can be seen, the retrieval performance in the semantic spaces is much better than that in the low-level feature space. The performance difference is especially significant at the 0th retrieval, when no user relevance feedback is provided, which is our primary goal. That is, the semantic representation of the query image can be learned by the RBF neural network f, which is trained by previous users' interactions with the system. In fact, in the real world, if the initial retrieval result is too bad, the user might lose interest in providing feedback. Another observation is that the retrieval performances in the LE semantic space and the RBFNN semantic space are almost the same. This means that the optimal mapping function f* can be accurately approximated by the RBF neural network f.

4.2.2 Query Image Outside the Database --- Generalization Capability Evaluation
When using an RBF neural network to solve the regression problem, a key issue is its generalization capability. Generalization refers to the neural network producing reasonable outputs for inputs not encountered during training. To evaluate the generalization capability of our model, the 300 images (testing set) are used as queries outside the image database (training set). These images have no semantic representations in the LE semantic space, but we can obtain their semantic representations in the RBFNN semantic space by the mapping function f. Since our intention is to evaluate the generalization capability of our model, the initial retrieval result when no feedback is provided is especially important. The precision-scope curves are shown in Figure 4. As can be seen, the retrieval in the RBFNN semantic space outperforms that in the low-level feature space. This means that the semantic representations of previously unseen images can be accurately learned by the RBF neural network.

Figure 4. Image retrieval performance in the low-level feature space and the RBFNN semantic space. The query images are outside the database.

4.3 Retrieval in Semantic Space with Different Dimensionalities
One issue in learning a semantic space is how to estimate its intrinsic dimensionality. Even though the dimensionality of the low-level feature space is normally very high, the dimensionality of the semantic space is much lower. In this section, we evaluate the retrieval performance in semantic spaces with different dimensionalities. As before, the 300 images outside the image database are used as the query images. Both the images in the database and the query images are mapped into the RBFNN semantic space by the mapping function. Figure 5 shows the results. As can be seen, the optimal dimensionality is closely related to the number of semantic classes in the database. This observation coincides with that obtained in [6]. If the image database administrator has a priori knowledge about this number, it can be used as a guideline to control the dimensionality of the semantic space. The system reaches the best performance (in terms of accuracy and efficiency) when the dimensionality of the semantic space is close to the number of semantic classes. Further compression of the semantic space will start to cause information loss and decrease the retrieval accuracy.

Figure 5. The initial retrieval accuracy (no feedback is provided) in the RBFNN semantic space with different dimensionalities (retrieval accuracy in the top 20 results vs. dimensionality, ranging from 10 to 360).
4.4 Comparing Different Retrieval Algorithms in Semantic Space
In the previous two subsections, we evaluated the retrieval performance in semantic space using our retrieval algorithm. It is interesting to see how Rui's algorithm [12] performs in semantic space. Figure 6 shows the retrieval results using our retrieval algorithm and Rui's algorithm. We use the same image database and the same query images as in Section 4.2.1. The retrieval is conducted in the RBFNN semantic space rather than the feature space. As can be seen, Rui's algorithm works almost the same as our algorithm. It is important to note that the baseline performance in semantic space is much higher than that in the low-level feature space. This observation confirms our earlier intuition that the semantic space becomes more and more regular (flat and linear) as the user's relevance feedback is incorporated. To be specific, in the semantic space the geodesic distances are almost equal to the Euclidean distances (see Figure 1(d)). Hence the Riemannian structure of the image manifold can be inferred from the Euclidean structure of the semantic space.

Figure 6. The comparison of our retrieval algorithm with Rui's algorithm in semantic space (retrieval accuracy in the top 20 results vs. number of iterations). The performances of the two algorithms are very close, which shows that the semantic space is Euclidean (flat).

5. CONCLUSIONS
In this paper, under the assumption that the data lie on a submanifold hidden in a high-dimensional feature space, we developed an algorithmic framework to learn the mapping between low-level image features and high-level semantics. It utilizes relevance feedback to enhance the performance of the image retrieval system from both short- and long-term perspectives. This framework gives a solution to the two fundamental problems in image retrieval: how to judge similarity and how to represent an image. To solve the first problem, the proposed retrieval algorithm on the image manifold uses the geodesic distance rather than the Euclidean distance as the similarity measure between images. It takes into account the Riemannian structure of the image manifold on which the data may possibly reside. To solve the second problem, two semantic spaces, the LE semantic space and the RBFNN semantic space, are learned from the user's relevance feedback. A mapping function is approximated by an RBF neural network. The semantic space gives a Euclidean representation of the Riemannian image manifold.

Several questions remain unclear:
1. We do not know how often, and in which particular empirical contexts, the manifold properties are crucial to account for the underlying topology of image data. While the results in this paper provide some indirect evidence for this, there still seems to be no convincing proof that such manifold structures are actually present.
2. Secondly, and most intriguingly, while the notion of a semantic space is a very appealing one, the properties of the true mapping from the low-level feature space to the high-level semantic space remain unclear. It is unclear whether the true mapping is one-to-one or many-to-one, since intuitively two different images might have exactly the same semantics. The mapping function is learned in a statistical sense; though the experiments show its strong generalization capability, it still remains unclear in theory.

REFERENCES
[1] M. Belkin and P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Advances in Neural Information Processing Systems, 2001.
[2] B. Bernstein, V. de Silva, J. C. Langford, and J. B. Tenenbaum, Graph approximations to geodesics on embedded manifolds, Technical report, Stanford University, December 2000.
[3] E. Chang, K. Goh, G. Sychay, and G. Wu, CBSA: Content-Based Soft Annotation for Multimodal Image Retrieval Using Bayes Point Machine, IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, no. 1, Jan. 2003.
[4] X. He and P. Niyogi, Locality Preserving Projections, in Advances in Neural Information Processing Systems 16, Vancouver, Canada, 2003.
[5] X. He, D. Cai, H. Liu and W.-Y. Ma, Locality Preserving Indexing for Document Representation, in ACM SIGIR Conference on Information Retrieval, Sheffield, 2004.
[6] X. He, O. King, W.-Y. Ma, M.-J. Li, and H.-J. Zhang, Learning a semantic space from user's relevance feedback for image retrieval, IEEE Trans. on Circuits and Systems for Video Technology, Jan. 2003.
[7] X. He, S. Yan, Y. Hu and H.-J. Zhang, Learning a locality preserving subspace for visual recognition, in Proc. IEEE Conf. on Computer Vision, Nice, France, 2003.
[8] Y. Ishikawa, R. Subramanya and C. Faloutsos, MindReader: query databases through multiple examples, 24th Conf. on Very Large Data Bases, New York, 1998.
[9] W.-Y. Ma and B. S. Manjunath, Netra: A toolbox for navigating large image databases, Multimedia Systems Journal, vol. 7, pp. 184-198, 1999.
[10] W. Matusik, H. Pfister, M. Brand, and L. McMillan, A data-driven reflectance model, in Proc. of SIGGRAPH, 2003.
[11] S. T. Roweis and L. K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science, vol. 290, December 2000.
[12] Y. Rui and T. S. Huang, Optimizing learning in image retrieval, in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, Hilton Head, SC, June 2000.
[13] H. S. Seung and D. Lee, The manifold ways of perception, Science, vol. 290, December 2000.
[14] J. Smith and S.-F. Chang, VisualSEEk: A fully automatic content-based image query system, ACM Multimedia, 1996.
[15] J. B. Tenenbaum, V. de Silva, and J. C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science, vol. 290, December 2000.
[16] K. Tieu and P. Viola, Boosting image retrieval, in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Hilton Head, SC, June 2000.
[17] S. Tong and E. Chang, Support vector machine active learning for image retrieval, in Proc. ACM Multimedia 2001, Ottawa, Canada, 2001.
[18] N. Vasconcelos and A. Lippman, Learning from user feedback in image retrieval systems, Advances in Neural Information Processing Systems, Denver, Colorado, 1999.
[19] J. Wang and J. Li, Learning-based linguistic indexing of pictures with 2-D MHMMs, in Proc. ACM Multimedia, pp. 436-445, Juan les Pins, France, 2002.
[20] X. S. Zhou and T. S. Huang, Comparing Discriminating Transformations and SVM for Learning during Multimedia Retrieval, in Proc. ACM Multimedia 2001, Ottawa, 2001.
[21] L. Zhu, A. Rao and A. Zhang, A theory of keyblock-based image retrieval, ACM Trans. on Information Systems, vol. 20, no. 2, pp. 224-257, 2002.