IntentSearch: Capturing User Intention for One-Click Internet Image Search


JOURNAL OF LATEX CLASS FILES, VOL. 6, NO. 1, JANUARY 2010

Xiaoou Tang, Fellow, IEEE, Ke Liu, Jingyu Cui, Student Member, IEEE, Fang Wen, Member, IEEE, and Xiaogang Wang, Member, IEEE

Abstract: Web-scale image search engines (e.g., Google Image Search, Bing Image Search) mostly rely on surrounding text features. It is difficult for them to interpret users' search intention only by query keywords, and this leads to ambiguous and noisy search results which are far from satisfactory. It is important to use visual information in order to solve the ambiguity in text-based image retrieval. In this paper, we propose a novel Internet image search approach. It only requires the user to click on one query image with the minimum effort, and images from a pool retrieved by text-based search are re-ranked based on both visual and textual content. Our key contribution is to capture the users' search intention from this one-click query image in four steps. (1) The query image is categorized into one of the predefined adaptive weight categories, which reflect users' search intention at a coarse level. Inside each category, a specific weight schema is used to combine visual features adaptive to this kind of images to better re-rank the text-based search result. (2) Based on the visual content of the query image selected by the user and through image clustering, query keywords are expanded to capture user intention. (3) Expanded keywords are used to enlarge the image pool to contain more relevant images. (4) Expanded keywords are also used to expand the query image to multiple positive visual examples from which new query-specific visual and textual similarity metrics are learned to further improve content-based image re-ranking. All these steps are automatic without extra effort from the user. This is critically important for any commercial web-based image search engine, where the user interface has to be extremely simple. Besides this key contribution, a set of visual features which are both effective and efficient in Internet image search are designed.
Experimental evaluation shows that our approach significantly improves the precision of top-ranked images and also the user experience.

Index Terms: Image search, Intention, Image re-ranking, Adaptive similarity, Keyword expansion

1 INTRODUCTION

Many commercial Internet-scale image search engines use only keywords as queries. Users type query keywords in the hope of finding a certain type of images. The search engine returns thousands of images ranked by the keywords extracted from the surrounding text. It is well known that text-based image search suffers from the ambiguity of query keywords. The keywords provided by users tend to be short. For example, the average query length of the top 1,000 queries of Picsearch is [...] words, and 97% of them contain only one or two words [1]. They cannot describe the content of images accurately. The search results are noisy and consist of images with quite different semantic meanings. Figure 1 shows the top-ranked images from Bing image search using "apple" as query. They belong to different categories, such as "green apple", "red apple", "apple logo", and "iphone", because of the ambiguity of the word "apple".

Fig. 1. Top ranked images returned from Bing image search using "apple" as query.

The ambiguity issue occurs for several reasons. First, the query keywords' meanings may be richer than users' expectations. For example, the meanings of the word "apple" include apple fruit, apple computer, and apple ipod. Second, the user may not have enough knowledge on the textual description of target images. For example, if users do not know "gloomy bear" as the name of a cartoon character (shown in Figure 2(a)), they have to input "bear" as query to search images of "gloomy bear".

Author affiliations: X. Tang and K. Liu are with the Department of Information Engineering, the Chinese University of Hong Kong, Hong Kong. J. Cui is with the Department of Electrical Engineering, Stanford University, USA. F. Wen is with Microsoft Research Asia, China. X. Wang is with the Department of Electronic Engineering, the Chinese University of Hong Kong, Hong Kong.
Lastly and most importantly, in many cases it is hard for users to describe the visual content of target images using keywords accurately.

In order to solve the ambiguity, additional information has to be used to capture users' search intention. One way is text-based keyword expansion, making the textual description of the query more detailed. Existing linguistically-related methods find either synonyms or other linguistically related words from a thesaurus, or find words frequently co-occurring with the query keywords.

Fig. 2. (a) Images of "gloomy bear". (b) Google Related Searches of query "bear".

For example, Google image search provides the "Related Searches" feature to suggest likely keyword expansions. However, even with the same query keywords, the intention of users can be highly diverse and cannot be accurately captured by these expansions. As shown in Figure 2(b), "gloomy bear" is not among the keyword expansions suggested by Google "Related Searches".

Another way is content-based image retrieval with relevance feedback. Users label multiple positive and negative image examples. A query-specific visual similarity metric is learned from the selected examples and used to rank images. The requirement of more user effort makes it unsuitable for web-scale commercial systems like Bing image search and Google image search, in which users' feedback has to be minimized.

We do believe that adding visual information to image search is important. However, the interaction has to be as simple as possible. The absolute minimum is One-Click.

In this paper, we propose a novel Internet image search approach. It requires the user to give only one click on a query image, and images from a pool retrieved by text-based search are re-ranked based on their visual and textual similarities to the query image. We believe that users will tolerate one-click interaction, which has been used by many popular text-based search engines. For example, Google requires a user to select a suggested textual query expansion by one click to get additional results. The key problem to be solved in this paper is how to capture user intention from this one-click query image. Four steps are proposed as follows.

(1) Adaptive similarity. We design a set of visual features to describe different aspects of images. How to integrate various visual features to compute the similarities between the query image and other images is an important problem. In this paper, an Adaptive Similarity is proposed, motivated by the idea that a user always has a specific intention when submitting a query image.
For example, if the user submits a picture with a big face in the middle, most probably he/she wants images with similar faces, and using face-related features is more appropriate. In our approach, the query image is first categorized into one of the predefined adaptive weight categories, such as "portrait" and "scenery". Inside each category, a specific pre-trained weight schema is used to combine visual features adapting to this kind of images to better re-rank the text-based search result. This correspondence between the query image and its proper similarity measurement reflects the user intention. This initial re-ranking result is not good enough and will be improved by the following steps.

(2) Keyword expansion. Query keywords input by users tend to be short, and some important keywords may be missed because of users' lack of knowledge on the textual description of target images. In our approach, query keywords are expanded to capture users' search intention, inferred from the visual content of query images, which is not considered in traditional keyword expansion approaches. A word $w$ is suggested as an expansion of the query if a cluster of images are visually similar to the query image and all contain the same word $w$ (footnote 1). The expanded keywords better capture users' search intention since the consistency of both visual content and textual description is ensured.

(3) Image pool expansion. The image pool retrieved by text-based search accommodates images with a large variety of semantic meanings, and the number of images related to the query image is small. In this case, re-ranking images in the pool is not very effective. Thus a more accurate query by keywords is needed to narrow the intention and retrieve more relevant images. A naive way is to ask the user to click on one of the suggested keywords given by traditional approaches using only text information and to expand query results as in Google "Related Searches". This increases users' burden. Moreover, the suggested keywords based on text information only are not accurate enough to describe users' intention.
Keyword expansions suggested by our approach using both visual and textual information better capture users' intention. They are automatically added into the text query and enlarge the image pool to include more relevant images. Feedback from users is not required. Our experiments show that it significantly improves the precision of top-ranked images.

(4) Visual query expansion. One query image is not diverse enough to capture the user's intention. In Step (2), a cluster of images all containing the same expanded keywords and visually similar to the query image is found. They are selected as expanded positive examples to learn visual and textual similarity metrics, which are more robust and more specific to the query, for image re-ranking. Compared with the weight schema in Step (1), these similarity metrics reflect users' intention at a finer level since every query image has different metrics. Different from relevance feedback, this visual expansion does not require users' feedback.

All these four steps are automatic with only one click in the first step, without increasing users' burden. This makes it possible for Internet-scale image search by both textual and visual content with a very simple user interface. Our one-click intentional modeling in Step (1) has been proven successful in industrial applications [2], [3] and is now used in the Bing image search engine [4] (footnote 2). This work extends the approach with Steps (2)-(4) to further improve the performance greatly.

Footnote 1: The word $w$ does not have to be contained by the query image.

2 RELATED WORK

2.1 Image Search and Visual Expansion

Many Internet-scale image search methods [5]-[9] are text-based and are limited by the fact that query keywords cannot describe image content accurately. Content-based image retrieval [10] uses visual features to evaluate image similarity. Many visual features [11]-[17] were developed for image search in recent years. Some were global features such as GIST [11] and HOG [12]. Some quantized local features, such as SIFT [13], into visual words, and represented images as bags-of-visual-words (BoV) [14]. In order to preserve the geometry of visual words, spatial information was encoded into the BoV model in multiple ways. For example, Zhang et al. [17] proposed geometry-preserving visual phrases which captured the local and long-range spatial layouts of visual words.

One of the major challenges of content-based image retrieval is to learn visual similarities which reflect the semantic relevance of images well. Image similarities can be learned from a large training set where the relevance of pairs of images is known [18]. Deng et al. [19] learned visual similarities from a hierarchical structure defined on semantic attributes of training images. Since web images are highly diversified, defining a set of attributes with hierarchical relationships for them is challenging. In general, learning a universal visual similarity metric for generic images is still an open problem.

Some visual features may be more effective for certain query images than others. In order to make the visual similarity metrics more specific to the query, relevance feedback [20]-[26] was widely used to expand visual examples.
The user was asked to select multiple relevant and irrelevant image examples from the image pool. A query-specific similarity metric was learned from the selected examples. For example, in [20]-[22], [24], [25], discriminative models were learned from the examples labeled by users using support vector machines or boosting, and classified the relevant and irrelevant images. In [26] the weights of combining different types of features were adjusted according to users' feedback. Since the number of user-labeled images is small for supervised learning methods, Huang et al. [27] proposed probabilistic hypergraph ranking under the semi-supervised learning framework. It utilized both labeled and unlabeled images in the learning procedure. Relevance feedback required more user effort. For a web-scale commercial system, users' feedback has to be limited to the minimum, such as one-click feedback.

Footnote 2: The "similar images" function of http://

In order to reduce users' burden, pseudo relevance feedback [28], [29] expanded the query image by taking the top N images visually most similar to the query image as positive examples. However, due to the well-known semantic gap, the top N images may not all be semantically consistent with the query image. This may reduce the performance of pseudo relevance feedback. Chum et al. [30] used RANSAC to verify the spatial configurations of local visual features and to purify the expanded image examples. However, it was only applicable to object retrieval. It required users to draw the image region of the object to be retrieved and assumed that relevant images contained the same object. Under the framework of pseudo relevance feedback, Ah-Pine et al. [31] proposed trans-media similarities which combined both textual and visual features. Krapac et al. [32] proposed query-relative classifiers, which combined visual and textual information, to re-rank images retrieved by an initial text-only search.
However, since users were not required to select query images, the users' intention could not be accurately captured when the semantic meanings of the query keywords had large diversity.

A preliminary study of combining text and image content for image search on the Internet was conducted in [33], where simple visual features and clustering algorithms were used. Following our intent image search work in [2] and [3], a visual query suggestion method was developed in [34]. Its difference from [2] and [3] is that, instead of asking the user to click on a query image for re-ranking, the system asks users to click on a list of keyword-image pairs generated off-line using a dataset from Flickr, and searches images on the web based on the selected keyword. The problem with this approach is twofold. On one hand, the dataset from Flickr is too small compared with the entire Internet and thus cannot cover the unlimited possibility of Internet images. On the other hand, the keyword-image suggestions for any input query are generated from the millions of images of the whole dataset, and thus are expensive to compute and may produce a large number of unrelated keyword-image pairs.

Besides visual query expansion, some approaches [35], [36] used concept-based query expansions through mapping textual query keywords or visual query examples to high-level semantic concepts. They needed pre-defined concept lexicons whose detectors were learned off-line from fixed training sets. These approaches were suitable for closed databases but not for web-based image search, since the limited number of concepts cannot cover the numerous images on the Internet. The idea of learning an example-specific visual similarity metric was explored in previous work [37], [38]. However, they required training a specific visual similarity for every example in the image pool, which is assumed to be fixed. This is impractical in our application, where the image pool returned by text-based search constantly changes for different query keywords. Moreover, text information, which can significantly improve visual similarity learning, was not considered in previous work.

2.2 Keyword Expansion

In our approach, keyword expansion is used to expand the retrieved image pool and to expand positive examples. Keyword expansion was mainly used in document retrieval. Thesaurus-based methods [39], [40] expanded query keywords with their linguistically related words such as synonyms and hypernyms. Corpus-based methods, such as the well-known term clustering [41] and Latent Semantic Indexing [42], measured the similarity of words based on their co-occurrences in documents. Words most similar to the query keywords were chosen as textual query expansion.

Some image search engines have the feature of expanded keyword suggestion. They mostly use surrounding text. Some algorithms [43], [44] generated tag suggestions or annotations based on visual content for input images. Their goal is not to improve the performance of image re-ranking. Although they can be viewed as options for keyword expansion, some difficulties prevent them from being directly applied to our problem. Most of them assumed fixed keyword sets, which are hard to obtain for image re-ranking in the open and dynamic web environment. Some annotation methods required supervised training, which is also difficult for our problem. Different from image annotation, our method provides extra image clusters during the procedure of keyword expansion, and such image clusters can be used as visual expansions to further improve the performance of image re-ranking.

3 METHOD

3.1 Overview

The flowchart of our approach is shown in Figure 3. The user first submits query keywords $q$. A pool of images is retrieved by text-based search (footnote 3) (Figure 3a). Then the user is asked to select a query image from the image pool. The query image is classified as one of the predefined adaptive weight categories.
Images in the pool are re-ranked (Figure 3b) based on their visual similarities to the query image, and the similarities are computed using the weight schema (Figure 3c, described in Section 3.3) specified by the category to combine visual features (Section 3.2).

In the keyword expansion step (Figure 3d, described in Section 3.4), words are extracted from the textual descriptions (such as image file names and surrounding texts in the html pages) of the top $k$ images most similar to the query image, and the tf-idf method [45] is used to rank these words. To save computational cost, only the top $m$ words are reserved as candidates for further processing. However, because the initial image re-ranking result is still ambiguous and noisy, the top $k$ images may have a large diversity of semantic meanings and cannot be used as visual query expansion. The word with the highest tf-idf score computed from the top $k$ images is not reliable enough to be chosen as keyword expansion either. In our approach, reliable keyword expansions are found through further image clustering. For each candidate word $w_i$, we find all the images containing $w_i$, and group them into different clusters $\{c_{i,1}, c_{i,2}, \ldots, c_{i,t_i}\}$ based on visual content. As shown in Figure 3d, images with the same candidate word may have a large diversity in visual content. Images assigned to the same cluster have higher semantic consistency since they have high visual similarity to one another and contain the same candidate word. Among all the clusters of different candidate words, the cluster $c_{i,j}$ with the largest visual similarity to the query image is selected as visual query expansion (Figure 3e, described in Section 3.5), and its corresponding word $w_i$ is selected to form the keyword expansion $q' = q + w_i$.

A query-specific visual similarity metric (Section 3.5) and a query-specific textual similarity metric (Section 3.7) are learned from both the query image and the visual query expansion.

Footnote 3: In this paper, it is from Bing image search [4].
The image pool is enlarged through combining the original image pool retrieved by the query keywords $q$ provided by the user and an additional image pool retrieved by the expanded keywords $q'$ (Figure 3f, described in Section 3.6). Images in the enlarged pool are re-ranked using the learned query-specific visual and textual similarity metrics (Figure 3g). The size of the image cluster selected as visual query expansion and its similarity to the query image indicate the confidence that the expansion captures the user's search intention. If they are below certain thresholds, expansion is not used in image re-ranking.

Fig. 3. An example to illustrate our algorithm. Panels: (a) Text-based Search Result of Query "apple"; (b) Re-ranking without Query Expansion; (c) Adaptive Weighting Schemes; (d) Keyword Expansion; (e) Visual Query Expansion; (f) Image Pool Expansion; (g) Re-ranking Result. The details of steps (c)-(f) are given in Sections 3.3 to 3.6.

3.2 Visual Feature Design

We design and adopt a set of features that are both effective in describing the visual content of images from different aspects, and efficient in their computational and storage complexity. Some of them are existing features proposed in recent years. Some new features are first proposed by us or are extensions of existing features. It takes an average of 0.01 ms to compute the similarity between two features on a machine with a 3.0 GHz CPU. The total space to store all features for an image is 12 KB. More advanced visual features developed in recent years or in the future can also be incorporated into our framework.

Existing Features

Gist. Gist [11] characterizes the holistic appearance of an image, and works well for scenery images.

SIFT. We adopt 128-dimension SIFT [13] to describe regions around Harris interest points. SIFT descriptors are quantized according to a codebook of 450 words.

Daubechies Wavelet. We use the 2nd order moments of wavelet coefficients in various frequency bands (DWave) to characterize the texture properties in the image [46].

Histogram of Gradient (HoG). HoG [12] reflects distributions of edges over different parts of an image, and is especially effective for images with strong long edges.

New Features

Attention Guided Color Signature. Color signature [47] describes the color composition of an image. After clustering colors of pixels in the LAB color space, cluster centers and their relative proportions are taken as the signature. We propose a new Attention Guided Color Signature (ASig) as a color signature that accounts for the varying importance of different parts of an image. We use an attention detector [48] to compute a saliency map for the image, and then perform k-means clustering weighted by this map. The distance between two ASigs can be calculated efficiently using the Earth Mover Distance algorithm [47].

Color Spatialet. We design a novel feature, Color Spatialet, to characterize the spatial distribution of colors. An image is divided into $n \times n$ patches. Within each patch, we calculate its main color as the largest cluster after k-means clustering. The image is characterized by Color Spatialet (CSpa), a vector of $n^2$ color values. In our experiments, we take $n = 9$. We account for some spatial shifting and resizing of objects in the images when calculating the distance of two CSpas $A$ and $B$:

\[ d(A, B) = \sum_{i=1}^{n} \sum_{j=1}^{n} \min\left[ d\left(A_{i,j}, B_{i \pm 1, j \pm 1}\right) \right], \]

where $A_{i,j}$ denotes the main color of the $(i,j)$th block in the image.
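The CSpa distance above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-patch color distance is taken to be Euclidean, the $\pm 1$ neighborhood is read as "within one patch in each direction, including no shift", and neighbors falling outside the grid are ignored. The function name `cspa_distance` and the array layout are hypothetical.

```python
import numpy as np


def cspa_distance(A, B):
    """Shift-tolerant distance between two Color Spatialets.

    A, B: arrays of shape (n, n, 3) holding the main color of each patch
    (assumption: colors are compared with the Euclidean norm).
    """
    n = A.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(n):
            best = np.inf
            # Compare patch (i, j) of A with the 3x3 neighborhood in B
            # (assumption: offset 0 is included, out-of-grid cells skipped).
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < n and 0 <= jj < n:
                        dist = float(np.linalg.norm(A[i, j] - B[ii, jj]))
                        best = min(best, dist)
            total += best
    return total
```

Because the unshifted patch is always among the candidates, this distance never exceeds the plain patch-by-patch distance, which is what gives the feature its tolerance to small misalignments.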
Color Spatialet describes color spatial configuration. By capturing only the main color, it is robust to slight color changes due to lighting, white balance, and imaging noise. Since local shift is allowed when calculating the distance between two Color Spatialets, the feature is shift-invariant to some extent and has resilience to misalignment.

Multi-Layer Rotation Invariant EOH. Edge Orientation Histogram (EOH) [49] describes the histogram of edge orientations. We incorporate rotation invariance when comparing two EOHs, rotating one of them to best match the other. This results in a Multi-Layer Rotation Invariant EOH (MRI-EOH). Also, when calculating MRI-EOH, a threshold parameter is required to filter out the weak edges. We use multiple thresholds to get multiple EOHs to characterize image edge distributions at different scales.

Facial Feature. The existence of faces and their appearances give clear semantic interpretations of the image. We apply a face detection algorithm [50] to each image, and obtain the number of faces, face sizes and positions as features to describe the image from a "facial" perspective.

3.3 Adaptive Weight Schema

Humans can easily categorize images into high-level semantic classes, such as scene, people, or object. We observed that images inside these categories usually agree on the relative importance of features for similarity calculations. Inspired by this observation, we assign the query images into several typical categories, and adaptively adjust feature weights within each category. Suppose an image $i$ from query category $Q_q$ is characterized using $F$ visual features. The adaptive similarity between images $i$ and $j$ is defined as

\[ s^q(i, j) = \sum_{m=1}^{F} \alpha^q_m \, s_m(i, j), \]

where $s_m(i, j)$ is the similarity between images $i$ and $j$ on feature $m$, and $\alpha^q_m$ expresses the importance of feature $m$ for measuring similarities for query images from category $Q_q$. We further constrain $\alpha^q_m \ge 0$ and $\sum_m \alpha^q_m = 1$.

Query Categorization

The query categories we considered are: General Object, Object with Simple Background, Scenery Images, Portrait, and People.
We use 500 manually labeled images, 100 for each category, to train a C4.5 decision tree for query categorization.
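Once the query image has been assigned to a category, the adaptive similarity of Section 3.3 reduces to a convex combination of per-feature similarities. The sketch below illustrates that fusion and the resulting re-ranking; the function names and the array layout are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def adaptive_similarity(feature_sims, alpha):
    """Combine per-feature similarities with category-specific weights.

    feature_sims: array of shape (F, N); row m holds s_m(query, j) for the
                  N images in the pool (hypothetical layout).
    alpha:        length-F weights for the query's category; must be
                  non-negative and sum to 1, as the text requires.
    """
    feature_sims = np.asarray(feature_sims, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    if np.any(alpha < 0) or not np.isclose(alpha.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    # s^q(i, j) = sum_m alpha_m * s_m(i, j)
    return alpha @ feature_sims


def rerank(feature_sims, alpha):
    """Return pool indices sorted by decreasing adaptive similarity."""
    scores = adaptive_similarity(feature_sims, alpha)
    return np.argsort(-scores, kind="stable")
```

For instance, with two features, three pool images, and made-up weights `[0.75, 0.25]`, `rerank([[0.9, 0.1, 0.5], [0.2, 0.8, 0.5]], [0.75, 0.25])` orders the images by the scores 0.725, 0.275 and 0.5.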

The features we used for query categorization are: existence of faces, the number of faces in the image, the percentage of the image frame taken up by the face region, the coordinate of the face center relative to the center of the image, Directionality (kurtosis of the Edge Orientation Histogram, Section 3.2), Color Spatial Homogeneousness (variance of values in different blocks of Color Spatialet, Section 3.2), total energy of the edge map obtained from the Canny operator, and Edge Spatial Distribution (the variance of edge energy in a $3 \times 3$ regular block of the image, characterizing whether edge energy is mainly distributed at the image center).

Feature Fusion

In each query category $Q_q$, we pre-train a set of optimal weights $\alpha^q_m$ based on the RankBoost framework [51]. For a query image $i$, a real-valued feedback function $\Phi_i(j, k)$ is defined to denote preference between images $j$ and $k$. We set $\Phi_i(j, k) > 0$ if image $k$ should be ranked above image $j$, and 0 otherwise. The feedback function $\Phi_i$ induces a distribution over all pairs of images $(j, k)$:

\[ D_i(j, k) = \frac{\Phi_i(j, k)}{\sum_{j,k} \Phi_i(j, k)}, \]

and the ranking loss for query image $i$ using similarity measurement $s^q(i, \cdot)$ is

\[ L_i = \Pr_{(j,k) \sim D_i}\left[ s^q(i, k) \le s^q(i, j) \right]. \]

We use an adaptation of the RankBoost framework (Algorithm 1) to minimize the loss function with respect to $\alpha^q_m$, which is the optimal weight schema for category $Q_q$. Steps 2, 3, 4 and 8 differ from the original RankBoost algorithm to accommodate our application, and are detailed below.

Algorithm 1. Feature weight learning for a certain query category

1. Input: Initial weight $D_i$ for all query images $i$ in the current intention category $Q_q$, similarity matrices $s_m(i, \cdot)$ for all query images $i$ and features $m$;
2. Initialize: Set step $t = 1$, set $D_i^1 = D_i$ for all $i$;
   while not converged do
     for each query image $i \in Q_q$ do
3.     Select the best feature $m_t$ and the corresponding similarity $s_{m_t}(i, \cdot)$ for the current re-ranking problem under weight $D_i^t$;
4.     Calculate ensemble weight $\alpha_t$ according to Equation 1;
5.     Adjust weight $D_i^{t+1}(j, k) \propto D_i^t(j, k) \exp\{\alpha_t [s_{m_t}(i, j) - s_{m_t}(i, k)]\}$;
6.     Normalize $D_i^{t+1}$ to make it a distribution;
7.     $t$++;
     end for
   end while
8. Output: Final optimal similarity measure for the current intention category: $s^q(\cdot, \cdot) = \frac{\sum_t \alpha_t s_{m_t}(\cdot, \cdot)}{\sum_t \alpha_t}$, and the weight for feature $m$: $\alpha^q_m = \frac{\sum_{t: m_t = m} \alpha_t}{\sum_t \alpha_t}$.

Step 2: Initialization. The training images are categorized into the five main classes. Images within each main class are further categorized into sub-classes. Images in each sub-class are visually similar. Besides, a few images are labeled as noise (irrelevant images) or neglect (hard to judge relevancy). Given a query image $i$, we define four image sets: $S^i_1$ includes images within the same sub-class as $i$, $S^i_2$ includes images within the same main class as $i$, excluding those in $S^i_1$, $S^i_3$ includes images labeled as "neglect", and $S^i_4$ includes images labeled as "noise". For any image $j \in S^i_1$ and any image $k \in S^i_2 \cup S^i_4$, we set $\Phi_i(k, j) = 1$. In all other cases, we set $\Phi_i(k, j) = 0$.

Step 3: Select best feature. We need to select the feature that performs best under the current weight $D_i^t$ for query image $i$. This step is very efficient since we constrain our weak ranker to be one of the $F$ similarity measurements $s_m(\cdot, \cdot)$. The best feature $m_t$ is found by enumerating all $F$ features.

Step 4: Calculate ensemble weight. It is proven in [51] that minimizing

\[ \hat{Z} = \frac{1 - r}{2} e^{\alpha_t} + \frac{1 + r}{2} e^{-\alpha_t} \]

in each step of boosting is approximately equivalent to minimizing the upper bound of the rank loss, where

\[ r = \sum_{j,k} D_i^t(j, k) \left[ s_{m_t}(i, k) - s_{m_t}(i, j) \right]. \]

Since we are looking for a single weighting scheme for each category, variations in $\alpha_t$ obtained for different query images are penalized by an additional smoothness term. The objective becomes

\[ \hat{Z} = \frac{1 - r}{2} e^{\alpha_t} + \frac{1 + r}{2} e^{-\alpha_t} + \frac{\lambda}{2} \left( e^{\alpha_t - \alpha_{t-1}} + e^{\alpha_{t-1} - \alpha_t} \right), \]

where $\lambda$ is a hyperparameter to balance the new term and the old terms. In our implementation, $\lambda = 1$ is used.
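The smoothed objective can be checked numerically. The sketch below evaluates $\hat{Z}$ and compares a brute-force minimizer with the stationary point obtained by setting the derivative with respect to $\alpha_t$ to zero, which for general $\lambda$ works out to $\alpha_t = \frac{1}{2}\ln\frac{1 + r + \lambda e^{\alpha_{t-1}}}{1 - r + \lambda e^{-\alpha_{t-1}}}$. The function names are hypothetical, and the general-$\lambda$ form is our own algebra rather than a formula quoted from the paper.

```python
import math


def z_hat(alpha, r, alpha_prev, lam=1.0):
    """Smoothed RankBoost objective for one boosting step."""
    base = ((1 - r) / 2) * math.exp(alpha) + ((1 + r) / 2) * math.exp(-alpha)
    # Smoothness term: minimal exactly when alpha == alpha_prev.
    smooth = (lam / 2) * (math.exp(alpha - alpha_prev) + math.exp(alpha_prev - alpha))
    return base + smooth


def alpha_closed_form(r, alpha_prev, lam=1.0):
    """Stationary point of z_hat in alpha (requires -1 < r < 1 or lam > 0)."""
    num = 1 + r + lam * math.exp(alpha_prev)
    den = 1 - r + lam * math.exp(-alpha_prev)
    return 0.5 * math.log(num / den)


def alpha_grid_search(r, alpha_prev, lam=1.0, lo=-3.0, hi=3.0, steps=6000):
    """Brute-force minimizer over a uniform grid, for comparison only."""
    grid = [lo + (hi - lo) * s / steps for s in range(steps + 1)]
    return min(grid, key=lambda a: z_hat(a, r, alpha_prev, lam))
```

With `lam=0` the closed form falls back to the usual RankBoost weight $\frac{1}{2}\ln\frac{1+r}{1-r}$, which is a quick sanity check on the reconstruction.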
Note that the third term takes its minimum value if and only if $\alpha_t = \alpha_{t-1}$, so by imposing this new term, we are looking for a common $\alpha^q_m$ for all query images $i$ in the current intention category, while trying to reduce all the losses $L_i$, $i \in Q_q$. Letting $\frac{\partial \hat{Z}}{\partial \alpha_t} = 0$ (with $\lambda = 1$), we know that $\hat{Z}$ is minimized when

\[ \alpha_t = \frac{1}{2} \ln \left( \frac{1 + r + e^{\alpha_{t-1}}}{1 - r + e^{-\alpha_{t-1}}} \right). \tag{1} \]

Step 8: Output final weight for feature fusion. The final output of the new RankBoost algorithm is a linear combination of all the base rankers generated in each step. However, since there are actually only $F$ base rankers, the output is equivalent to a weighted combination of the $F$ similarity measurements.

3.4 Keyword Expansion

Once the top $k$ images most similar to the query image are found according to the visual similarity metric introduced in Sections 3.2 and 3.3, words from their textual descriptions (footnote 4) are extracted and ranked, using the term frequency-inverse document frequency (tf-idf) [45] method. The top $m$ ($m = 5$ in our experiments) words are reserved as candidates for query expansion.

Footnote 4: The file names and the surrounding texts of images.
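The candidate-word step can be sketched as follows. The paper does not spell out which tf-idf variant is used, so this example assumes one common choice: term frequency summed over the descriptions of the top $k$ images, and inverse document frequency computed over the whole image pool. The function name and the token-list input format are hypothetical.

```python
import math
from collections import Counter


def top_tfidf_words(top_k_docs, pool_docs, m=5):
    """Rank words from the top-k image descriptions by tf-idf.

    top_k_docs: list of token lists, one per top-k image description.
    pool_docs:  list of token lists for every image in the pool
                (assumption: the top-k images are part of the pool).
    Returns the m highest-scoring words, ties broken alphabetically.
    """
    n_docs = len(pool_docs)
    doc_freq = Counter()
    for doc in pool_docs:
        doc_freq.update(set(doc))  # count each word once per document
    term_freq = Counter()
    for doc in top_k_docs:
        term_freq.update(doc)
    scores = {
        w: term_freq[w] * math.log(n_docs / doc_freq[w])
        for w in term_freq
        if doc_freq[w] > 0
    }
    return sorted(scores, key=lambda w: (-scores[w], w))[:m]
```

A word that occurs in every description in the pool, such as the original query keyword, gets a score of zero under this variant and so drops to the bottom of the ranking.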


Fig. 4. An example of content-based image ranking result with many irrelevant images among top-ranked images. The query keyword is "palm" and the query image is the top leftmost image of "palm tree".

Fig. 5. Examples of images containing the same word "palm tree" but with different visual content.

Because of the semantic diversity of the top $k$ images, the word with the highest tf-idf score may not capture the user's search intention. Some image annotation algorithms utilized the visual content of the top $k$ images for word expansion. For instance, Wang et al. [44] gave each image $i$ a weight $\mathrm{weight}(i)$ according to its visual distance $d(i)$ to the query image,

\[ \mathrm{weight}(i) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-d^2(i)/2\sigma^2}, \tag{2} \]

and the scores of words were calculated as the weighted sum of tf-idf values. If there are many irrelevant images among the top $k$ images, the performance of these methods is degraded. Figure 4 shows such an example. The query keyword is "palm" and the query image is the top leftmost image of "palm tree". Its top-ranked images using the adaptive weight schema are shown from left to right and from top to bottom. They include images of "palm tree" (marked by blue rectangles), "palm treo" (marked by red rectangles), "palm leaves" and "palm reading". There are more images of "palm treo" than those of "palm tree", and some images of "palm tree" are ranked in low positions. Thus the word "treo" gets the highest score, whether calculated by tf-idf values or by tf-idf values weighted by visual distance.

We do keyword expansion through image clustering. For each candidate word $w_i$, all the images containing $w_i$ in the image pool are found. However, they cannot be directly used as the visual representations of $w_i$ for two reasons. First, there may be a number of noisy images irrelevant to $w_i$. Second, even if these images are relevant to $w_i$ semantically, they may have quite different visual content. Figure 5 shows such an example. In order to find images with similar visual content as the query example and remove noisy images, we divide these images into different clusters using k-means.
The number of clusters is empirically set to be n/6, where n is the number of images to cluster. Each word $w_i$ has $t_i$ clusters $C(w_i) = \{c_{i,1}, \ldots, c_{i,t_i}\}$. The visual distance between the query image and a cluster c is calculated as the mean of the distances between the query image and the images in c. The cluster $c_{i,j}$ with the minimal distance is chosen as the visual query expansion, and its corresponding word $w_i$, combined with the original keyword query q, is chosen as the keyword expansion q'. See the example in Figure 3. If the distance between the closest cluster and the query image is larger than a threshold ρ, there is no suitable image cluster and word to expand the query, and thus query expansion will not be used.

3.5 Visual Query Expansion

So far we only have one positive image example, which is the query image. The goal of visual query expansion is to obtain multiple positive example images to learn a visual similarity metric which is more robust and more specific to the query image. An example in Figure 6 explains the motivation. The query keyword is "Paris" and the query image is an image of the Eiffel Tower. The image re-ranking result based on visual similarities without visual expansion is shown in Figure 6(a), and there are many irrelevant images among the top-ranked images. This is because the visual similarity metric learned from one query example image is not robust enough. By adding more positive examples to learn a more robust similarity metric, such irrelevant images can be filtered out. Traditionally, additional positive examples were obtained through relevance feedback, which placed more labeling burden on users. We aim at developing an image re-ranking method which only requires one click on the query image, and thus positive examples have to be obtained automatically. The cluster of images chosen in Section 3.4 has the closest visual distance to the query example and has consistent semantic meanings. Thus its images are used as additional positive examples for visual query expansion.
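The cluster selection of Section 3.4, whose winning cluster also supplies the additional positive examples, can be sketched as follows. This is a minimal illustration under stated assumptions: the clusters are taken as already computed (k-means itself is omitted), each cluster is represented only by the list of visual distances from its images to the query image, and the function name and toy data are hypothetical.

```python
from statistics import mean


def expand_query(query, clusters, rho):
    """Pick the cluster closest to the query image and expand the keywords.

    clusters: {candidate word: [list of query-to-image distances per cluster]}
    rho: distance threshold above which no expansion is performed.
    Returns (keyword query, (word, cluster index)) or (query, None).
    """
    best = None  # (mean distance, word, cluster index)
    for word, word_clusters in clusters.items():
        for j, distances in enumerate(word_clusters):
            if not distances:
                continue  # skip empty clusters
            d = mean(distances)  # distance between the query image and the cluster
            if best is None or d < best[0]:
                best = (d, word, j)
    if best is None or best[0] > rho:
        return query, None  # no suitable cluster: keep the original query
    _, word, j = best
    return f"{query} {word}", (word, j)


# Hypothetical distances for two candidate words of the query "palm".
toy_clusters = {
    "tree": [[0.1, 0.2], [0.8, 0.9]],
    "treo": [[0.5, 0.6]],
}
expanded, chosen = expand_query("palm", toy_clusters, rho=0.3)
unchanged, nothing = expand_query("palm", toy_clusters, rho=0.1)
```

Because a cluster is scored by its mean distance to the query image rather than by how often its word occurs, a small but visually consistent cluster can win over a frequent but visually different one.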
We adopt the one-class SVM [22] to refine the visual similarity in Section 3.3. The one-class SVM classifier is trained from the additional positive examples obtained by visual query expansion. It requires defining the kernel between images, and the kernel is computed from the similarity introduced in Section 3.3. An image to be re-ranked is input to the one-class SVM classifier and the output is used as the similarity ($sim_V$) to the query image. Notice that the effect of this step is similar to relevance feedback [52]. However, the key difference is that instead of asking users to add the positive samples manually, our method is fully automatic.

3.6 Image Pool Expansion

Considering efficiency, image search engines, such as Bing image search, only re-rank the top N images of the text-based image search result. If the query keywords do not capture the user's search intention accurately, there are only a small number of relevant images with the same semantic meanings as the query image in the image pool. This can significantly degrade the ranking performance. In Section 3.3, we re-rank the top N images retrieved by the original keyword query based on their visual similarities to the query image. We remove the N/2 images with the lowest ranks from the image pool. Using the expanded keywords as query, the top N/2 retrieved images are added to the image pool. We believe that there are more relevant images in the image pool with the help of expanded query keywords. The re-ranking result obtained by extending the image pool and the positive example images is shown in Figure 6(b), which is significantly improved compared with Figure 6(a).

Fig. 6. An example of image re-ranking using "Paris" as query keyword and an image of the Eiffel Tower as query image. Irrelevant images are marked by red rectangles.

3.7 Combining Visual and Textual Similarities

Learning a query-specific textual similarity metric from the positive examples $E = \{e_1, \ldots, e_j\}$ obtained by visual query expansion and combining it with the query-specific visual similarity metric introduced in Section 3.5 can further improve the performance of image re-ranking. For a selected query image, a word probability model is trained from E and used to compute the textual distance $dis_T$. We adopt the approach in [31]. Let θ be the parameter of a discrete distribution of words over the dictionary. Each image i is regarded as a document $d_i$, where the words are extracted from its textual descriptions (see the definition in Section 3.4). θ is learned by maximizing the observed probability

$$\prod_{e_i \in E} \prod_{w \in d_i} \bigl(\lambda\, p(w \mid \theta) + (1 - \lambda)\, p(w \mid C)\bigr)^{n_w^i},$$

where λ is a fixed parameter set to be 0.5, w is a word, and $n_w^i$ is the frequency of w in $d_i$. $p(w \mid C)$ is the word probability built upon the whole repository C:

$$p(w \mid C) = \frac{\sum_{d_i} n_w^i}{|C|}.$$

θ can be learned by the Expectation-Maximization algorithm. Once θ is learned, for an image k its textual distance to the positive examples is defined by the cross-entropy function:

$$dis_T(k) = -\sum_{w} p(w \mid d_k) \log p(w \mid \theta).$$

Here $p(w \mid d_k) = n_w^k / |d_k|$. At last, this textual distance can be combined with the visual similarity $sim_V$ obtained in Section 3.5 to re-rank images:

$$-\alpha \cdot sim_V(k) + (1 - \alpha) \cdot dis_T(k),$$

where α is a fixed parameter and set as … .

3.8 Summary

The goal of the proposed framework is to capture user intention, which is achieved in multiple steps. The user intention is first roughly captured by classifying the query image into one of the coarse semantic categories and choosing a proper weight schema accordingly. The adaptive visual similarity obtained from the selected weight schema is used in all the following steps. Then, according to the query keywords and the query image provided by the user, the user intention is further captured in two aspects: (1) finding more query keywords (called keyword expansion) describing user intention more accurately; and (2) in the meanwhile finding a cluster of images (called visual query expansion) which are both visually and semantically consistent with the query image. The keyword expansion frequently co-occurs with the query keywords and the visual expansion is visually similar to the query image. Moreover, it is required that all the images in the cluster of visual query expansion contain the same keyword expansion. Therefore, the keyword expansion and visual expansion support each other and are obtained simultaneously. In the later steps, the keyword expansion is used to expand the image pool to include more images relevant to user intention, and the visual query expansion is used to learn visual and textual similarity metrics which better reflect user intention.

4 EXPERIMENTAL EVALUATION

In the first experiment, 300,000 web images are manually labeled into different classes (images that are semantically similar) as ground truth. Precisions of different approaches are compared. The semantic meanings of images are closely related to users' intention. However, they are not exactly the same. Images of the same class (thus with similar semantic meanings) can be visually quite different. Thus, a user study is conducted in the second experiment to evaluate whether the search results capture users' intention well. Running on a machine with a 3 GHz CPU and without optimizing the code, it needs less than half a second of computation for each query.

4.1 Experiment One: Evaluation with Ground Truth

Fifty query keywords are chosen for evaluation. Using each keyword as query, the top 1000 images are crawled from Bing image search. These images are manually labeled into different classes. For example, for the query "apple", its images are labeled as "red apple", "apple ipod", "apple pie", etc. There are 700 classes in total for all the fifty query keywords. Another 500 images are crawled from Bing image search using each keyword expansion as query. These images are also manually labeled. There are around 300,000 images in total in our data set. A small portion of them are outliers and are not assigned to any category (e.g., some images are irrelevant to the query keywords). The threshold ρ (in Section 3.4) is chosen as 0.3 through cross-validation and is fixed in all the experiments. The performance is stable when ρ varies between 0.25 and … .

4.1.1 Precisions on Different Steps of Our Framework

Top m precision, the proportion of relevant images among the top m ranked images, is used to evaluate the performance of image re-ranking.
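As a concrete reading of the top m precision measure, here is a minimal sketch. It assumes that each ranked image carries a ground-truth class label and that an image counts as relevant when its label equals that of the query image; the function name and the toy labels are hypothetical.

```python
def top_m_precision(ranked_labels, query_label, m):
    """Proportion of relevant images among the top m ranked images.

    ranked_labels: class labels of the re-ranked images, best first.
    query_label: class label of the query image.
    """
    top = ranked_labels[:m]
    if not top:
        return 0.0  # nothing was ranked, so nothing relevant was returned
    relevant = sum(1 for label in top if label == query_label)
    return relevant / len(top)


# Hypothetical ranking for a query image labeled "red apple".
ranking = ["red apple", "apple pie", "red apple", "apple ipod"]
p_at_1 = top_m_precision(ranking, "red apple", 1)
p_at_2 = top_m_precision(ranking, "red apple", 2)
p_at_4 = top_m_precision(ranking, "red apple", 4)
```

Averaging this value over many query images and keywords would give one point of a top m precision curve.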
Images are considered to be relevant if they are labeled as the same class. For each query keyword, image re-ranking is repeated many times by choosing different query images. Except for the outlier images not assigned to any category, every image returned by the keyword query has been chosen as the query image. In order to evaluate the effectiveness of the different steps of our proposed image re-ranking framework, we compare the following approaches. From (1) to (8), more and more steps in Figure 3 are added in.

(1) Text-based: text-based search from Bing. It is used as the baseline.
(2) GW: image re-ranking using global weights to combine visual features (Global Weight).
(3) AW: image re-ranking using the adaptive weight schema to combine visual features (Section 3.3).
(4) ExEg: image re-ranking by extending positive examples only, from which the query-specific visual similarity metric is learned and used (Section 3.5).
(5) GW+Pool: image re-ranking by extending the image pool only (Section 3.6) while using global weights to combine visual features.
(6) ExPool: similar to GW+Pool, but using the adaptive weight schema to combine visual features.
(7) ExBoth(V): image re-ranking by extending both the image pool and positive example images. Only the query-specific visual similarity metric is used.
(8) ExBoth(V+T): similar to ExBoth(V), but combining query-specific visual and textual similarity metrics (Section 3.7). This is the complete approach proposed by us.

Fig. 7. Comparison of averaged top m precisions on different steps.

The averaged top m precisions are shown in Figure 7. Approaches (2)-(7) only use visual similarity. Approach (8) uses both visual and textual similarities. Approaches (2) and (3) are initial image re-ranking based on the text-based search results in (1). They differ in how the visual features are combined. We can see that by using a single query image we can significantly improve the text-based image search result.
The proposed adaptive weight schema, which reflects user intention at a coarse level, outperforms the global weight. After initial re-ranking using the adaptive weight, the top 50 precision of text-based search is improved from 19.5% to 32.9%. Approaches (4), (6), (7) and (8) are based on the initial image re-ranking result in (3). We can clearly see the effectiveness of expanding the image pool and expanding positive image examples through keyword expansion and image clustering. These steps capture user intention at a finer level, since for every query image the image pool and positive examples are expanded differently. Using expansions, the top 50 precision of initial re-ranking using the adaptive weight is improved from 32.9% to 51.9%.

Our keyword expansion step (Section 3.4) could be replaced by other equivalent methods, such as image annotation methods. As discussed in Section 2.2, many existing image annotation methods cannot be directly applied to our problem. We compare with two image annotation approaches, which do not require a fixed set of keywords, by replacing the keyword expansion step with them. One is to choose the word with the highest tf-idf score as keyword expansion to extend the image pool (ExPoolByTfIdf). The other uses the method proposed in [44], which weights the tf-idf score by the images' visual similarities, to extend the image pool (ExPoolByWTfIdf). Figure 8 shows the result. Our ExPool has a better performance. Moreover, image annotation only aims at ranking words and cannot automatically provide visual query expansion as our method does. Therefore, our ExBoth has an even better performance than the other two.

Fig. 8. Comparison of averaged top m precisions of keyword expansion through image clustering with the other two methods. Curves: (1) ExBoth(V), (2) ExPool, (3) ExPoolByTfIdf, (4) ExPoolByWTfIdf.

Fig. 9. Comparison of averaged top m precisions with existing methods. Curves: (1) ExBoth(V+T), (2) CrossMedia, (3) NPRF, (4) PRF.

TABLE 1
Average number of irrelevant images

                  No visual expansions    With visual expansions
Top 10 images     4.26                    2.33
Top 5 images      1.67                    0.91
Top 3 images      0.74                    0.43

Fig. 10. Percentages of cases when users think the result of visual expansions is much better, somewhat better, similar, somewhat worse and much worse than the result of only using the intention weight schema.

4.1.2 Comparison with Other Methods

In this section, we compare with several existing approaches [29], [31], [53] which can be applied to image re-ranking with only one-click feedback, as discussed in Section 2.2.

(1) ExBoth(V+T): our approach.
(2) CrossMedia: image re-ranking by the trans-media distances defined in [31]. It combined both visual and textual features under the pseudo-relevance feedback framework.
(3) NPRF: image re-ranking by the pseudo-relevance feedback approach proposed in [29]. It used top-ranked images as positive examples and bottom-ranked images as negative examples to train an SVM.
(4) PRF: image re-ranking by the pseudo-relevance feedback approach proposed in [53]. It used top-ranked images as positive examples to train a one-class SVM.

Figure 9 shows the result. Our algorithm outperforms the others, especially when m is large.

4.2 Experiment Two: User Study

The purpose of the user study is to evaluate the effectiveness of visual expansions (expanding both the image pool and positive visual examples) in capturing user intention. Forty users are invited. For each query keyword, the user is asked to browse the images and to randomly select an image of interest as a query example. We show them the initial image re-ranking results using the adaptive weight schema (Section 3.3) and the results of extending both the image pool and positive example images. The users are asked to do the following:

• Mark irrelevant images among the top 10 images.
• Compare the top 50 retrieved images of both results, and choose whether the final result with visual expansions is much better, somewhat better, similar, somewhat worse or much worse than that of initial re-ranking using the adaptive weight schema.

Each user is assigned 5 query keywords from all the 50 keywords. Given each query keyword, the user is asked to choose 30 different query images and compare their re-ranking results. As shown in Table 1, visual expansions significantly reduce the average number of irrelevant images among the top 10 images. Figure 10 shows that in most cases (> 67%) the users think visual expansions improve the result.

4.3 Discussion

In our approach, the keyword expansion (Section 3.4), visual query expansion (Section 3.5) and image pool expansion (Section 3.6) all depend on the quality of the initial image re-ranking result (Section 3.3). According to our experimental evaluation and user study, if the quality of the initial image re-ranking is reasonable, which means that there are a few relevant examples among the top-ranked images, the following expansion steps can significantly improve the re-ranking performance. Inappropriate expansions which significantly deteriorate the performance happen in the cases when the initial re-ranking result is very poor. The chance is lower than 2% according to our user study in Figure 10.

In this paper, it is assumed that an image captures user intention when it is both semantically and visually similar to the query image. However, in some cases user intention cannot be well expressed by a single query image. For instance, the user may be interested in only part of the image. In those cases, more user interactions, such as labeling the regions that the user thinks are important, have to be allowed. However, this adds more user burden and is not considered in this paper.

5 CONCLUSION

In this paper, we propose a novel Internet image search approach which only requires one-click user feedback. An intention-specific weight schema is proposed to combine visual features and to compute visual similarity adaptive to query images. Without additional human feedback, textual and visual expansions are integrated to capture user intention.
Expanded keywords are used to extend positive example images and also to enlarge the image pool to include more relevant images. This framework makes industrial-scale image search by both text and visual content possible. The proposed new image re-ranking framework consists of multiple steps, which can be improved separately or replaced by other equally effective techniques. In future work, this framework can be further improved by making use of query log data, which provides valuable co-occurrence information of keywords, for keyword expansion. One shortcoming of the current system is that sometimes duplicate images show up as similar images to the query. This can be improved by including duplicate detection in future work.

REFERENCES

[1] F. Jing, C. Wang, Y. Yao, K. Deng, L. Zhang, and W. Ma, "Igroup: Web image search results clustering," in Proc. ACM Multimedia.
[2] J. Cui, F. Wen, and X. Tang, "Real time google and live image search re-ranking," in Proc. ACM Multimedia.
[3] J. Cui, F. Wen, and X. Tang, "Intentsearch: Interactive on-line image search re-ranking," in Proc. ACM Multimedia.
[4] "Bing image search," http://
[5] N. Ben-Haim, B. Babenko, and S. Belongie, "Improving web-based image search via content based clustering," in Proc. Int'l Workshop on Semantic Learning Applications in Multimedia.
[6] R. Fergus, P. Perona, and A. Zisserman, "A visual category filter for google images," in Proc. European Conf. Computer Vision.
[7] G. Park, Y. Baek, and H. Lee, "Majority based ranking approach in web image retrieval," in Proc. the 2nd International Conference on Image and Video Retrieval.
[8] Y. Jing and S. Baluja, "Pagerank for product image search," in Proc. Int'l Conf. World Wide Web.
[9] W. H. Hsu, L. S. Kennedy, and S.-F. Chang, "Video search reranking via information bottleneck principle," in Proc. ACM Multimedia.
[10] R. Datta, D. Joshi, and J. Z. Wang, "Image retrieval: Ideas, influences, and trends of the new age," ACM Computing Surveys, vol. 40, pp. 1-60.
[11] A. Torralba, K. Murphy, W. Freeman, and M. Rubin, "Context-based vision system for place and object recognition," in Proc. Int'l Conf. Computer Vision.
[12] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition.
[13] D. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2.
[14] J. Sivic and A. Zisserman, "Video google: a text retrieval approach to object matching in videos," in Proc. Int'l Conf. Computer Vision.
[15] Y. Cao, C. Wang, Z. Li, L. Zhang, and L. Zhang, "Spatial-bag-of-features," in Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition.
[16] J. Philbin, M. Isard, J. Sivic, and A. Zisserman, "Descriptor Learning for Efficient Retrieval," in Proc. European Conf. Computer Vision.
[17] Y. Zhang, Z. Jia, and T. Chen, "Image retrieval with geometry-preserving visual phrases," in Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition.
[18] G. Chechik, V. Sharma, U. Shalit, and S. Bengio, "Large scale online learning of image similarity through ranking," Journal of Machine Learning Research, vol. 11.
[19] J. Deng, A. C. Berg, and L. Fei-Fei, "Hierarchical semantic indexing for large scale image retrieval," in Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition.
[20] K. Tieu and P. Viola, "Boosting image retrieval," International Journal of Computer Vision, vol. 56, no. 1.
[21] S. Tong and E. Chang, "Support vector machine active learning for image retrieval," in Proc. ACM Multimedia.
[22] Y. Chen, X. Zhou, and T. Huang, "One-class SVM for learning in image retrieval," in Proc. IEEE Int'l Conf. Image Processing.
[23] Y. Lu, H. Zhang, L. Wenyin, and C. Hu, "Joint semantics and feature based image retrieval using relevance feedback," IEEE Trans. on Multimedia, vol. 5, no. 3.
[24] D. Tao and X. Tang, "Random sampling based svm for relevance feedback image retrieval," in Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition.
[25] D. Tao, X. Tang, X. Li, and X. Wu, "Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 28.
[26] T. Quack, U. Monich, L. Thiele, and B. Manjunath, "Cortina: a system for large-scale, content-based web image retrieval," in Proc. ACM Multimedia.
[27] Y. Huang, Q. Liu, S. Zhang, and D. N. Metaxas, "Image retrieval via probabilistic hypergraph ranking," in Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, 2011.

[28] R. Yan, E. Hauptmann, and R. Jin, "Multimedia search with pseudo-relevance feedback," in Proc. Int'l Conf. on Image and Video Retrieval.
[29] R. Yan, A. G. Hauptmann, and R. Jin, "Negative pseudo-relevance feedback in content-based video retrieval," in Proc. ACM Multimedia.
[30] O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman, "Total recall: Automatic query expansion with a generative feature model for object retrieval," in Proc. Int'l Conf. Computer Vision, 2007.
[31] J. Ah-Pine, M. Bressan, S. Clinchant, G. Csurka, Y. Hoppenot, and J. Renders, "Crossing textual and visual content in different application scenarios," Multimedia Tools and Applications, vol. 42.
[32] J. Krapac, M. Allan, J. Verbeek, and F. Jurie, "Improving web image search results using query-relative classifiers," in Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, 2010.
[33] B. Luo, X. Wang, and X. Tang, "A world wide web based image search engine using text and image content features," in Proc. IS&T/SPIE Electronic Imaging, Internet Imaging IV.
[34] Z. Zha, L. Yang, T. Mei, M. Wang, and Z. Wang, "Visual query expansion," in Proc. ACM Multimedia, 2009.
[35] A. Natsev, A. Haubold, J. Tešić, L. Xie, and R. Yan, "Semantic concept-based query expansion and re-ranking for multimedia retrieval," in Proc. ACM Multimedia.
[36] J. Smith, M. Naphade, and A. Natsev, "Multimedia semantic indexing using model vectors," in Proc. Int'l Conf. Multimedia and Expo.
[37] A. Frome, Y. Singer, F. Sha, and J. Malik, "Learning globally-consistent local distance functions for shape-based image retrieval and classification," in Proc. Int'l Conf. Computer Vision.
[38] Y. Lin, T. Liu, and C. Fuh, "Local ensemble kernel learning for object category recognition," in Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition.
[39] S. Liu, F. Liu, C. Yu, and W. Meng, "An effective approach to document retrieval via utilizing WordNet and recognizing phrases," in Proc. ACM Special Interest Group on Information Retrieval.
[40] S. Kim, H. Seo, and H. Rim, "Information retrieval using word senses: root sense tagging approach," in Proc. ACM Special Interest Group on Information Retrieval.
[41] K. Sparck Jones, Automatic keyword classification for information retrieval. Archon Books.
[42] S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman, "Indexing by latent semantic analysis," Journal of the American Society for Information Science, vol. 41, no. 6.
[43] L. Wu, L. Yang, N. Yu, and X. Hua, "Learning to tag," in Proc. Int'l Conf. World Wide Web.
[44] C. Wang, F. Jing, L. Zhang, and H. Zhang, "Scalable search-based image annotation of personal images," in Proc. the 8th ACM International Workshop on Multimedia Information Retrieval.
[45] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. Addison-Wesley Longman Publishing Co., Inc.
[46] M. Unser, "Texture classification and segmentation using wavelet frames," IEEE Trans. on Image Processing, vol. 4, no. 11.
[47] Y. Rubner, L. Guibas, and C. Tomasi, "The earth mover's distance, multi-dimensional scaling, and color-based image retrieval," in Proc. the ARPA Image Understanding Workshop.
[48] T. Liu, J. Sun, N. Zheng, X. Tang, and H. Shum, "Learning to detect a salient object," in Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition.
[49] W. Freeman and M. Roth, "Orientation histograms for hand gesture recognition," in Proc. Int'l Workshop on Automatic Face and Gesture Recognition.
[50] R. Xiao, H. Zhu, H. Sun, and X. Tang, "Dynamic cascades for face detection," in Proc. Int'l Conf. Computer Vision.
[51] Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer, "An efficient boosting algorithm for combining features," Journal of Machine Learning Research, vol. 4.
[52] X. S. Zhou and T. S. Huang, "Relevance feedback in image retrieval: A comprehensive review," Multimedia Systems, vol. 8.
[53] J. He, M. Li, Z. Li, H. Zhang, H. Tong, and C. Zhang, "Pseudo relevance feedback based on iterative probabilistic one-class svms in web image retrieval," in Proc. Pacific-Rim Conference on Multimedia.

Xiaoou Tang (S'93-M'96-SM'02-F'09) received the B.S. degree from the University of Science and Technology of China, Hefei, in 1990, and the M.S. degree from the University of Rochester, Rochester, NY. He received the Ph.D. degree from the Massachusetts Institute of Technology, Cambridge. He is a Professor in the Department of Information Engineering and Associate Dean (Research) of the Faculty of Engineering of the Chinese University of Hong Kong. He worked as the group manager of the Visual Computing Group at Microsoft Research Asia from 2005 to … . His research interests include computer vision, pattern recognition, and video processing. Dr. Tang received the Best Paper Award at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). He is a program chair of the IEEE International Conference on Computer Vision (ICCV) 2009 and an Associate Editor of IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) and International Journal of Computer Vision (IJCV). He is a Fellow of IEEE.

Ke Liu received the B.S. degree in Computer Science from Tsinghua University. He is currently an M.Phil. student in the Department of Information Engineering at the Chinese University of Hong Kong. His research interests include image search and computer vision.

Jingyu Cui received his MSc degree in Electrical Engineering in 2010 from Stanford University, where he is currently working toward his PhD degree. He received his MSc and BEng degrees in Automation from Tsinghua University in 2008 and 2005, respectively. His research interests include visual recognition and retrieval, machine learning, and parallel computing.

Fang Wen received the Ph.D. and M.S. degrees in Pattern Recognition and Intelligent System and the B.S. degree in Automation from Tsinghua University in 2003, 1997, respectively. She is now a researcher in the Visual Computing Group at Microsoft Research Asia. Her research interests include computer vision, pattern recognition and multimedia search.


Xiaogang Wang (S'03-M'10) received the B.S. degree in Electrical Engineering and Information Science from the University of Science and Technology of China in 2001, and the M.S. degree in Information Engineering from the Chinese University of Hong Kong. He received the PhD degree in Computer Science from the Massachusetts Institute of Technology. He is currently an assistant professor in the Department of Electronic Engineering at the Chinese University of Hong Kong. His research interests include computer vision and machine learning.
