Improving Web Search Results Using Affinity Graph


Benyu Zhang 1, Hua Li 2, Yi Liu 3, Lei Ji 4, Wensi Xi 5, Weiguo Fan 5, Zheng Chen 1, Wei-Ying Ma 1

1 Microsoft Research Asia, 49 Zhichun Road, Beijing, 100080, P. R. China {byzhang, zhengc, wyma}@microsoft.com
2 LMAM, School of Mathematical Science, Peking University, Beijing, 100871, P. R. China lihua@math.pku.edu.cn
3 Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA liuyi3@cse.msu.edu
4 Department of Computer Science, Beijing Institute of Technology, Beijing, 100871, P. R. China jilei03@bit.edu.cn
5 Virginia Polytechnic Institute and State University, Blacksburg, VA 24060, USA {xwensi, wfan}@vt.edu

ABSTRACT
In this paper, we propose a novel ranking scheme named Affinity Ranking (AR) to re-rank search results by optimizing two metrics: (1) diversity, which indicates the variance of topics in a group of documents; (2) information richness, which measures the coverage of a single document to its topic. Both metrics are calculated from a directed link graph named Affinity Graph (AG). AG models the structure of a group of documents based on the asymmetric content similarities between each pair of documents. Experimental results on Yahoo! Directory, ODP Data, and Newsgroup data demonstrate that our proposed ranking algorithm significantly improves the search performance. Specifically, the algorithm achieves a relative improvement of 31% in diversity and 12% in information richness within the top 10 search results.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - retrieval models, search process; H.2.8 [Database Management]: Database Applications - Data Mining

General Terms: Algorithms, Performance

Keywords: Affinity Ranking, Information Retrieval, Link Analysis, Diversity, Information Richness.

1. INTRODUCTION
Most current web search engines provide a list of search results in response to users' queries, ranked according to the relevance score of each document to the query.
This paradigm is very useful when users' information needs (represented by the queries) are clear and they care more about precision than recall in the returned results. Unfortunately, many of the queries presented to a web search engine nowadays are ambiguous [5] and the user's actual information needs are unknown. Users may suffer from the vast number of redundant and yet not very relevant documents that are related to a few most popular topics listed at the top of the search results. Such a search experience often frustrates users.

Several approaches have been proposed to improve this situation. Carbonell et al. [3] proposed a re-ranking method based on the maximal marginal relevance criterion to reduce redundancy while maintaining query relevance in re-ranked documents. The marginal relevance of a document is defined as its relevance to the query minus its similarity to previously selected documents. Maximizing this marginal relevance helps achieve low redundancy in a group of documents. But there is no direct criterion for diversity evaluation to ensure that a group of documents with low redundancy achieves large topic coverage. The recently proposed subtopic retrieval method [18] is another useful approach to reducing the high redundancy of search results. Unlike Carbonell's work, it applies a statistical language model to calculate document relevance and to measure the novelty of a document. However, as the subtopic retrieval method is mostly concerned with covering as many subtopics of a query topic as possible, it may not achieve the lowest redundancy within a group of documents.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGIR'05, August 15-19, 2005, Salvador, Brazil. Copyright 2005 ACM /05/ $5.00.
As reported in [3], the majority of people in the experiments said they preferred the method that provides search results with the broadest and most interesting topics. However, since the top search results are very often dominated by a set of closely related documents on some specific topic, users often face the following two situations: (1) the top search results can hardly cover a sufficient variety of topics to meet users' diversified information needs; (2) there is no indication of how informative a returned document is on the query topic.

In traditional information retrieval research, precision and recall [1] have been used as metrics to evaluate information retrieval systems. Both metrics concern only the relevance of the documents returned, without considering the number of distinct topics that the returned document list covers, or the range of topics a single returned document covers. In web link analysis research, the popularity of a web page [9, 12] has been widely adopted to measure the quality of a web page. However, this kind of quality is computed from the web page link graph and is independent of the content of a web page.

All these observations motivate us to introduce two novel metrics, diversity and information richness, which measure the quality of

search results by considering the content-based link structure of a group of documents and the content of a single document in the search results. Diversity measures the variety of topics in a group of documents; it is a holistic property of the document set. Information richness measures how many different topics a single document contains.

Based on the two metrics, a novel algorithm named Affinity Ranking (AR) is proposed to re-rank the top search results. In particular, we first model the content-based link structure of a group of documents as a directed graph, which we call an Affinity Graph (AG), based on the asymmetric similarities between document pairs. Similar to web page link analysis, an importance score is computed on the Affinity Graph for each document, indicating its information richness. Secondly, we apply a greedy algorithm to assign a penalty score to each returned document, considering the diversity of query-related topics. Thirdly, the AR score of each document is obtained as a combination of the information richness and diversity penalty scores. AR scores are then used to re-rank the top search results.

Our experimental results on the Yahoo! Directory and ODP data sets demonstrate that our proposed AR algorithm significantly improves the coverage of query-related topics in the top 10 search results over the K-Means clustering algorithm. Meanwhile, experiments on a newsgroup data set show that the AR algorithm achieves about 31% improvement in diversity and 12% improvement in information richness in the top 10 search results without loss in precision and recall.

The rest of the paper is organized as follows. In Section 2, we introduce the background by explaining the state-of-the-art link analysis algorithms. In Section 3, we introduce the Affinity Ranking algorithm, as well as the formal definitions of diversity and information richness. Experiments and evaluations are reported in Section 4.
We conclude and discuss future work in Section 5.

2. BACKGROUND
Recently, there has been growing research interest in mining the relationships between data objects, which are usually referred to as links in the literature. Link structure has proved very useful in various applications such as information retrieval [9, 12], classification [10] and clustering [8]. Two of the most famous works on link analysis are Google's PageRank algorithm [12] and Kleinberg's HITS algorithm [9]. Both make use of the hyperlink structure among web pages to model a group of web pages as a link graph.

Explicit link analysis and implicit link analysis [4, 16, 17] are currently two major sub-areas of link analysis research. Hyperlinks embedded in web pages can be considered explicit links, since they explicitly provide a connection from one page to another. Implicit links refer to linkages inferred from users' behavior, such as a user's access pattern on web pages. The difference between them is that explicit links represent the web editors' view, since hyperlinks are edited by them, while implicit links represent the end-users' view. Two typical examples of implicit link analysis are DirectHit [6] and Small Web Search [17], which assume that two web pages are implicitly linked if they are visited sequentially by the same end-user. DirectHit and Small Web Search can be considered modified versions of the HITS and PageRank algorithms applied to an implicit link structure.

However, the metrics used to evaluate the methods discussed above are intrinsically subjective, and they cannot quantify the information contained in web pages objectively. In this work, we develop objective metrics to measure the amount of information contained in a single document and also the topic variety in a group of documents.

3. AFFINITY RANKING
The framework of Affinity Ranking is illustrated in Figure 1. It includes three steps:
(1) An Affinity Graph (AG) based on the content link structure is constructed for the entire document collection; the information richness of each document is then calculated based on AG.
(2) For a given query, a result set of relevant documents is produced by the full-text search process. Based on AG and the information richness scores, a diversity penalty is imposed on each document in the result set.
(3) The information richness and diversity penalty scores are combined to obtain the Affinity Rank score, which is used to re-rank the top returned document list.

Figure 1: The Affinity Ranking (AR) Framework

We now give the formal definitions of information richness and diversity.

Diversity: Given a set of documents $R = \{d_1, d_2, \ldots, d_m\}$, we use diversity $\mathrm{Div}(R)$ to denote the number of different topics contained in $R$.

Information Richness: Given a document collection $D = \{d_i \mid 1 \le i \le n\}$, we use information richness $\mathrm{InfoRich}(d_i)$ (see Eq. (1)) to denote the informative degree of the document $d_i$, i.e. the richness of information contained in the document $d_i$ with respect to the entire collection $D$. Without loss of generality, we let $\mathrm{InfoRich}(d_i) \in [0, 1]$.

For a set of documents $R_l = \{d_1, d_2, \ldots, d_l\}$ which contains $\mathrm{Div}(R_l)$ topics (i.e. diversity $= \mathrm{Div}(R_l)$), its average information richness can be calculated as:

\[ \mathrm{InfoRich}(R_l) = \frac{1}{\mathrm{Div}(R_l)} \sum_{k=1}^{\mathrm{Div}(R_l)} \frac{1}{N_k} \sum_{i=1}^{N_k} \mathrm{InfoRich}(d_i^k) \tag{1} \]

where $d_i^k$ represents one of the $N_k$ documents associated with the $k$-th topic. In the rest of this paper, we use average information richness to refer to the information richness of a set of documents.

3.1 Affinity Graph Construction
Let $D = \{d_i \mid 1 \le i \le n\}$ denote a document collection. According to the vector space model [15], each document $d_i$ can be represented

as a vector $\vec{d}_i$. The similarity between a document pair $d_i$ and $d_j$ can be calculated as

\[ \mathrm{sim}(d_i, d_j) = \cos(\vec{d}_i, \vec{d}_j) = \frac{\vec{d}_i \cdot \vec{d}_j}{\|\vec{d}_i\| \, \|\vec{d}_j\|} \tag{2} \]

To further measure the significance of the similarity between each document pair, we define the affinity of $d_j$ to $d_i$ as

\[ \mathrm{aff}(d_i, d_j) = \frac{\vec{d}_i \cdot \vec{d}_j}{\|\vec{d}_i\|} \tag{3} \]

It is worth noting that the affinity defined here is asymmetric, because in general $\mathrm{aff}(d_i, d_j) \ne \mathrm{aff}(d_j, d_i)$. If we consider documents as nodes, the document collection can be modeled as a graph by generating the links between documents using the following rule:

Link generation: A directional link from $d_i$ to $d_j$ ($i \ne j$) with weight $\mathrm{aff}(d_i, d_j)$ is constructed if $\mathrm{aff}(d_i, d_j) \ge \mathrm{aff}_t$ ($\mathrm{aff}_t$ is a threshold); otherwise no link is constructed (or the weight of the link is regarded as zero).

Thus, each link in the graph is assigned a weight indicating the similarity relationship between the corresponding document pair. Since all links are constructed according to the affinity value between document pairs, we call the graph an Affinity Graph. Usually, documents of the same topic are similar to each other. Hence, in the Affinity Graph, a group of heavily linked documents naturally represents a topic group, while documents connected by weak or no links belong to different topics.

3.2 Information Richness Computation
After obtaining the Affinity Graph, we apply a link analysis algorithm to compute the information richness of each node in AG. Similar to PageRank [12], we propose the following algorithm. First, an adjacency matrix $M$ is used to describe AG, with each entry corresponding to the weight of a link in the graph. $M = (M_{i,j})_{n \times n}$ is defined as below:

\[ M_{i,j} = \begin{cases} \mathrm{aff}(d_i, d_j), & \text{if } \mathrm{aff}(d_i, d_j) \ge \mathrm{aff}_t \\ 0, & \text{otherwise} \end{cases} \tag{4} \]

Without loss of generality, $M$ is normalized to make the sum of each row equal to 1:

\[ \tilde{M}_{i,j} = \begin{cases} M_{i,j} \Big/ \sum_{j=1}^{n} M_{i,j}, & \text{if } \sum_{j=1}^{n} M_{i,j} \ne 0 \\ 0, & \text{otherwise} \end{cases} \tag{5} \]

The normalized adjacency matrix $\tilde{M} = (\tilde{M}_{i,j})_{n \times n}$ is used to compute the information richness score for each node. Our computation is based on the following two intuitions:
1. The more neighbors a document has, the more informative it is;
2.
The more informative a document's neighbors are, the more informative it is.

Thus, the score of document $d_i$ can be deduced from those of all other documents linked to it, and it can be formulated in a recursive form as follows:

\[ \mathrm{InfoRich}(d_i) = \sum_{\text{all } j \ne i} \mathrm{InfoRich}(d_j) \cdot \tilde{M}_{j,i} \tag{6} \]

And in matrix form:

\[ \lambda = \tilde{M}^{T} \lambda \tag{7} \]

where $\lambda = [\mathrm{InfoRich}(d_i)]_{n \times 1}$ is the eigenvector of $\tilde{M}^{T}$. Since $\tilde{M}$ is normally a sparse matrix, all-zero rows could possibly appear, i.e. some documents have no other documents with significant affinity to them. To compute a meaningful eigenvector, we introduce a damping factor $c$ (similar to the random jumping factor in PageRank):

\[ \mathrm{InfoRich}(d_i) = c \sum_{\text{all } j \ne i} \mathrm{InfoRich}(d_j) \cdot \tilde{M}_{j,i} + \frac{(1 - c)}{n} \tag{8} \]

And in matrix form:

\[ \lambda = c \tilde{M}^{T} \lambda + \frac{(1 - c)}{n} \vec{e} \tag{9} \]

where $\vec{e}$ is a vector with all components equal to 1. The damping factor $c \in (0, 1)$ is set to 0.85 in our experiments.

The computation of information richness can be explained in a way similar to the random surfer model, and we call it the random information flow model. Imagine that information flows among the document nodes at each iteration, and assume it stops at document $d_i$ at the current iteration. Let $A(d_i) = \{d_j \mid \forall j \ne i, \mathrm{aff}(d_i, d_j) \ge \mathrm{aff}_t\}$ be the set of documents to which $d_i$ links. In the next iteration, the information chooses where to flow according to the following two rules:
1. With probability $c$ (i.e. the damping factor), the information flows into one of the document nodes in $A(d_i)$, and the probability of flowing into the document $d_j$ is proportional to $\mathrm{aff}(d_i, d_j)$;
2. With probability $1 - c$, the information flows randomly into any document in the collection.

Figure 2: A simple example of Affinity Graph.

Figure 2 gives an illustration of the random information flow model. On the Affinity Graph, besides the links constructed by the link generation rule, we label an additional link by a dotted line

which indicates the possibility of random information flow as described in Rule 2. A Markov chain can be induced from the above process, where the states are given by the documents and the transition (or flow) matrix is given by $c \tilde{M}^{T} + \frac{(1 - c)}{n} U$, in which $U = [1]_{n \times n}$. The stationary probability distribution of each state is given by the principal eigenvector of the transition matrix, which is equivalent to Equation (9).

3.3 Diversity Penalty
Computing information richness helps us choose more informative documents to be presented in the top search results. However, in some cases the two most informative documents could be very similar (in an extreme case they can be duplicates). To increase the coverage of the top search results, a different penalty is imposed on the information richness score of each document according to its influence on topic diversity. The diversity penalty is calculated by a greedy algorithm. At each iteration of the algorithm, the penalty is imposed on documents topic by topic, and the Affinity Ranking score is updated with it.

The Greedy Algorithm for Diversity Penalty
Step 0. Initialize the two sets $A = \emptyset$, $B = \{d_i \mid i = 1, 2, \ldots, n\}$, and initialize each document's Affinity Rank score to its information richness score, i.e. $AR_i = \mathrm{InfoRich}(d_i)$, $i = 1, 2, \ldots, n$.
Step 1. Sort the documents in $B$ by their current Affinity Rank scores in descending order.
Step 2. Suppose the document ranked highest in $B$ is $d_i$. Move document $d_i$ from $B$ to $A$, and then impose a penalty on the score of each document which has a link to $d_i$ as follows. For each document $d_j$, $j \ne i$:

\[ AR_j = AR_j - \tilde{M}_{j,i} \cdot \mathrm{InfoRich}(d_i) \tag{10} \]

Step 3. Re-sort the documents in $B$ by the updated rank scores in descending order.
Step 4. Go to Step 2 until $B = \emptyset$ or the iteration count reaches a predefined maximum.

The crucial part of the above greedy algorithm is Step 2, which embodies the basic idea of the penalty: decrease the Affinity Ranking scores of less informative documents by the part conveyed from the most informative one. The more similar a document is to the most informative one, the larger the penalty it receives and the more its Affinity Ranking score is decreased.
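As a rough illustration of Sections 3.1 to 3.3, the sketch below builds the thresholded affinity matrix of Eqs. (3)-(5), iterates Eq. (8) to obtain information richness, and then applies the greedy penalty of Eq. (10). It is not the authors' implementation: the function names, the toy term vectors, the threshold value 0.5, the fixed iteration budget, and the tie-breaking by document index are all assumptions made for the example.

```python
import math

def affinity_matrix(vectors, aff_t):
    """Eqs. (3)-(4): M[i][j] = (d_i . d_j) / ||d_i|| when it reaches aff_t, else 0."""
    n = len(vectors)
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or norms[i] == 0.0:
                continue  # no self-links; an empty document links to nothing
            aff = sum(a * b for a, b in zip(vectors[i], vectors[j])) / norms[i]
            if aff >= aff_t:
                M[i][j] = aff
    return M

def row_normalize(M):
    """Eq. (5): every non-zero row is scaled to sum to 1."""
    out = []
    for row in M:
        s = sum(row)
        out.append([x / s for x in row] if s != 0 else [0.0] * len(row))
    return out

def info_richness(M_norm, c=0.85, iters=200, tol=1e-12):
    """Eq. (8) solved by fixed-point iteration from a uniform start (assumed)."""
    n = len(M_norm)
    rich = [1.0 / n] * n
    for _ in range(iters):
        new = [c * sum(rich[j] * M_norm[j][i] for j in range(n)) + (1.0 - c) / n
               for i in range(n)]
        delta = max(abs(a - b) for a, b in zip(new, rich))
        rich = new
        if delta < tol:
            break
    return rich

def affinity_rank(M_norm, rich, max_iter=None):
    """Greedy diversity penalty, Eq. (10). Returns (selection order, AR scores)."""
    n = len(rich)
    ar = list(rich)                      # Step 0: AR_i = InfoRich(d_i)
    remaining = set(range(n))            # set B; `order` plays the role of set A
    order = []
    limit = n if max_iter is None else min(n, max_iter)
    while remaining and len(order) < limit:
        # Steps 1 and 3: take the current best; ties go to the lower index (assumed)
        i = max(remaining, key=lambda k: (ar[k], -k))
        remaining.remove(i)
        order.append(i)
        for j in remaining:              # Step 2: penalize documents linking to d_i
            ar[j] -= M_norm[j][i] * rich[i]
    return order, ar

# Hypothetical toy collection: documents 0-2 share one topic, 3-4 another.
docs = [[1, 1, 1, 0, 0], [1, 1, 0, 0, 0], [0, 1, 1, 0, 0],
        [0, 0, 0, 1, 1], [0, 0, 0, 1, 0]]
M_norm = row_normalize(affinity_matrix(docs, aff_t=0.5))
rich = info_richness(M_norm)
order, ar = affinity_rank(M_norm, rich)
print(order, [round(r, 3) for r in rich])
```

In this toy collection, document 0 overlaps with both other documents of its topic and receives the highest information richness. Once it is selected, the scores of documents 1 and 2 drop sharply, so the second selection comes from the other topic, which is the diversifying effect the penalty is meant to produce.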
The greedy penalty ensures that only the most informative document in each topic becomes distinctive in the ranking process.

3.4 Re-ranking Method
The re-ranking mechanism is a combination of results from full-text search and Affinity Ranking. There are two schemes of combination: score-combination and rank-combination. A user query is denoted by $q$. The set of relevant documents returned by full-text search is denoted by $\Theta$.

The score-combination scheme uses a linear combination of two parts: one comes from the full-text search score, and the other from the Affinity Ranking score. However, the two scores usually differ by orders of magnitude and their raw values vary over different ranges. Therefore, we normalize the two scores differently (average normalization and log average normalization), and then combine the two parts:

\[ \mathrm{Score}(q, d_i) = \alpha \cdot \frac{\mathrm{Sim}(q, d_i)}{\overline{\mathrm{Sim}}_{\Theta}(q)} + \beta \cdot \frac{\log \overline{AR}_{\Theta}}{\log AR_i}, \quad \forall d_i \in \Theta \tag{11} \]

where $\alpha + \beta = 1$ and

\[ \overline{\mathrm{Sim}}_{\Theta}(q) = \max_{d_i \in \Theta} \mathrm{Sim}(q, d_i) \tag{12} \]

\[ \overline{AR}_{\Theta} = \max_{d_i \in \Theta} AR_i \tag{13} \]

The rank-combination scheme of re-ranking uses a linear combination of the ranks based on full-text search and Affinity Ranking, shown as follows:

\[ \mathrm{Score}(q, d_i) = \alpha \cdot \mathrm{Rank}_{\mathrm{Sim}(q, d_i)} + \beta \cdot \mathrm{Rank}_{AR_i}, \quad \forall d_i \in \Theta \tag{14} \]

The $\alpha$ and $\beta$ in both combination schemes are parameters which can be tuned. When $\beta = 0$, no re-ranking is performed, and the search results are equivalent to full-text search; as $\beta$ increases, more weight is put on Affinity Ranking in the re-ranking process; when $\alpha = 0$ (and $\beta = 1$), we rely entirely on the Affinity Ranking score to re-rank the search results.

4. EXPERIMENTS
We conducted experiments on Yahoo! Directory, ODP Data and a Newsgroup data set to demonstrate the effectiveness of our proposed Affinity Ranking scheme.

4.1 Data
Yahoo! Directory is one of the most famous Web directories. We downloaded the directory in June. It contained a total of 292,26 categories (including leaf categories and non-leaf categories). All categories are organized into a 6-level hierarchy. Similar to many previous works [2, 7], we downloaded the index pages of the websites listed in Yahoo! Directory as the labeled documents.
As a result, we have downloaded 792,60 documents in total.

ODP (Open Directory Project) is another famous Web directory. It is probably the largest, most comprehensive human-edited directory on the Web, constructed and maintained by a vast, global community of volunteer editors [11]. We downloaded the directory in August. ODP includes a total of 72,565 categories. Similar to the Yahoo! dataset, we downloaded the index pages of the websites listed in ODP as labeled documents. As a result, we have downloaded 1,547,000 documents in total.

The Newsgroup data is composed of 256,449 posts collected from 7 newsgroups related to commercial applications over a period of 4 months, with a total size of about 400M. A post parser is applied to remove stop words and unrelated words such as from, to, time, signature, and citations. The title and content of each post are given a 3:1 weighting ratio in the indexing process. Porter stemming [13] is also performed over the entire dataset.

For the Newsgroup dataset, there are two specific considerations: (1) no explicit links exist among the posts; (2) a newsgroup is a typical collection composed of documents with repetitive content, because a large number of posts are very likely to be devoted to the same topic. Traditional information retrieval, which relies purely on the full-text content, will result in more redundancy due to similar posts in the top search results. Our


proposed Affinity Ranking scheme can be used to solve this problem.

We used the Okapi system as our baseline retrieval system. For each query, Okapi provides a set of documents ranked by text-based similarity score.

4.2 Affinity Ranking vs. K-Means Clustering
We conducted experiments on the Yahoo! Directory and ODP data sets to compare AR with the traditional clustering method K-Means, to see which method can cover more query-related topics in the top 10 search results. We selected 20 queries from Yahoo! Directory category labels and ODP category labels, respectively. Table 1 and Table 2 give the queries.

Table 1: Queries used in Yahoo! Directory

No.  Query
1    Art History
2    Art Artists
3    Performing Arts Dance
4    Visual Arts Thematic
5    Consulting Medical
6    Science Astronomy
7    Science Physics
8    Science Alternative
9    Science Astronomy
10   Ecology
11   Education
12   Mathematics
13   Ethnic Studies
14   Political Science
15   Social Science Psychology
16   Women's Studies
17   Crime
18   Families
19   Relationships
20   Sexuality

Table 2: Queries used in ODP Data

No.  Query
1    Internet Protocols
2    Home Cooking
3    Agriculture Horticulture
4    Science Chemistry
5    Food Baked Goods
6    Food Meat
7    Food Produce
8    Music Related Merchandise
9    Bagpipe Bands
10   Consumer Goods Eyewear
11   Dairy
12   Insurance Carriers
13   Literature American Early
14   Mystery
15   Poetry Fixed Verse
16   Poetry Forms
17   CGI
18   Diseases Liver
19   Dogs Training
20   E-Books

The top 1000 search results of each query are passed to the AR or K-Means algorithm to re-rank the top 10 results. For the K-Means algorithm, we set K=10 and use the top document of each cluster to construct the top 10 results. The F value is used to measure the performance of Affinity Ranking and K-Means clustering. The recall ($R$), precision ($P$), and $F$ are defined as follows:

\[ R = \frac{N_{label} \cap N_{sys}}{N_{label}}, \qquad P = \frac{N_{label} \cap N_{sys}}{N_{sys}}, \qquad F = \frac{2RP}{R + P} \]

$N_{label}$ denotes the number of different sub-category labels in Yahoo! Directory or ODP. $N_{sys}$ denotes the corresponding sub-category label number in the top 10 search results re-ranked by the AR or K-Means algorithm.
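The F value above can be illustrated with a small sketch. It assumes that the operator lost between $N_{label}$ and $N_{sys}$ in the extracted formula is a set intersection, so that recall and precision count the sub-category labels shared by the ground truth and the top 10 results; the function name and the label strings are hypothetical.

```python
def coverage_f(label_topics, system_topics):
    """Recall, precision and F over sets of sub-category labels.

    label_topics:  labels under the query category (ground truth)
    system_topics: labels of the documents in the re-ranked top 10
    """
    label, system = set(label_topics), set(system_topics)
    hit = len(label & system)            # labels covered by the top results
    if hit == 0:
        return 0.0, 0.0, 0.0
    recall = hit / len(label)
    precision = hit / len(system)
    return recall, precision, 2 * recall * precision / (recall + precision)

# Hypothetical labels, not taken from the Yahoo! Directory or ODP data.
truth = ["Painting", "Sculpture", "Photography", "Design"]
top10 = ["Painting", "Painting", "Sculpture", "Other"]
print(coverage_f(truth, top10))
```

Here two of the four ground-truth labels appear in the top results, and two of the three distinct labels returned are correct, so a ranking that spreads its top 10 over more of the true sub-categories scores a higher F.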
Figures 3 and 4 show that AR significantly improves the coverage of topics compared to the K-Means method on both the Yahoo! and ODP data.

Figure 3: F Values of AR and K-Means in Yahoo! Directory

Figure 4: F Values of AR and K-Means in ODP Data

4.3 Affinity Ranking in Newsgroup dataset

4.3.1 Evaluation Metrics and Ground Truth
We used the 20 queries listed in Table 3 to retrieve documents from the newsgroup dataset, and applied the proposed AR scheme to re-rank the top 50 documents returned from the baseline system (OKAPI) [14]. The queries vary from 1 word to 3 words, covering several commercial software products.

Table 3: Queries used in our experiments

No.  Query              No.  Query
1    Blue screen        11   System requirement
2    Office update      12   Access update
3    activate product   13   Excel crash
4    Excel formula      14   Office
5    Office assistant   15   Office uninstall
6    outer join         16   Outlook print error
7    Pie                17   pop3 server
8    print preview      18   save attachment
9    SMTP               19   virus scan
10   word font          20   Word print

We compare our approach with the Okapi system in three aspects: diversity, information richness and relevance. The diversity of a document set and the information richness of a single document have already been defined in Section 3. Similarly, the average relevance of a set of documents $R_l = \{d_1, d_2, \ldots, d_l\}$ to a given query $q$ is defined as follows:

\[ \mathrm{Rlv}(R_l, q) = \frac{1}{l} \sum_{i=1}^{l} \mathrm{Rlv}(d_i, q) \tag{15} \]

where $\mathrm{Rlv}(d_i, q) \in [0, 1]$ is the relevance of document $d_i$ to query $q$.

Four researchers in the web search and mining area were hired to independently evaluate the experimental results. They labeled the top 50 search results for each of the 20 queries based on the following steps:
1. Make an overview of the 50 search results, and then manually cluster them into an arbitrary number of groups. Each group should have one common topic and there should be no significant overlap between the group topics;
2. In each topic group, give each document a score indicating its information richness for that topic. The score ranges from 0 to 3 (3 - very informative, 2 - informative, 1 - less informative, 0 - not informative);
3. Give each document a score indicating its relevance to the query (2 - relevant, 1 - hard to tell, 0 - irrelevant).

Finally, the scores in step 2 and step 3 are normalized into [0, 1] according to the definitions of information richness and relevance. The labeled data served as the ground truth to evaluate the diversity, information richness and relevance of the top N search results ($N \le 50$). Since the labeled ground truth (e.g.
the number of topics in the top 50 search results) varies from user to user, our improvement measures are presented in the form of the macro relative change, which is defined as:

\[ \Delta_{macro} = \frac{1}{N} \sum_{i=1}^{N} \Delta^{i} \tag{16} \]

\[ \Delta^{i} = \frac{X_A^{i} - X_F^{i}}{X_F^{i}} \tag{17} \]

where $N$ is the number of users, i.e. 4; $X$ could be the diversity, information richness, or relevance of the top search results; the superscript $i$ denotes the $i$-th user's ground truth; and the subscripts $A$ and $F$ represent results from our ranking scheme and full-text search, respectively.

4.3.2 Improvement in Top 10 Search Results
As the top 10 search results always receive the most attention from end-users, we also conduct experiments to show how Affinity Ranking affects the top 10 search results from the newsgroup data set. Table 4 shows the relative improvement of AR re-ranking over the Okapi system.

Table 4: Improvement in top 10 search results

                    Diversity   Information Richness   Relevance
Relative Change     +31%        +12%                   +0.72%
p value at t-test

In this experiment, we use the rank-combination scheme, in which α = and β = 0. From Table 4, we can see that our proposed Affinity Ranking achieves 31% and 12% improvements in diversity and information richness, respectively, over the full-text search system. The t-test result indicates that this improvement is statistically significant. The experimental results confirm that our proposed algorithm can improve the diversity and information richness of the top 10 search results without loss in relevance.

4.3.3 Improvement within Top 50 Search Results
We also measure the improvements of AR within different numbers of search results. Figure 5 illustrates the relative improvement in diversity as the number of search results increases. It shows that our method always improves the diversity of the search results. Initially, the diversity improvement increases sharply with the value of N and reaches a maximum when N = 10, which is usually the number of results fitting into the first search result page and browsed by most end-users. Then the diversity improvement gradually falls to zero * when N reaches 50.
We can conclude from the figure that the relative order of results is changed so that documents from different topics are shifted forward to the top of the returned search list; consequently, the topic diversity of the top returned results is improved.

Figure 5: Diversity improvement by Affinity Rank within top 50 search results

* Since re-ranking the top 50 results only changes their order, the relative change in diversity for all 50 results is zero, cf. the definition of diversity. (The same holds for information richness in Figure 6.)
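The macro relative change of Eqs. (16) and (17), which underlies the improvements reported in this section, can be sketched in a few lines. The function names and the per-assessor values below are hypothetical and only show the arithmetic; they are not the labeled data from the experiments.

```python
def relative_change(x_a, x_f):
    """Eq. (17): change of a metric under Affinity Ranking (x_a)
    relative to full-text search (x_f), for one assessor's ground truth."""
    return (x_a - x_f) / x_f

def macro_relative_change(pairs):
    """Eq. (16): mean of the per-assessor relative changes.

    pairs: one (x_a, x_f) tuple per assessor.
    """
    changes = [relative_change(x_a, x_f) for x_a, x_f in pairs]
    return sum(changes) / len(changes)

# Hypothetical diversity counts (topics in the top N) from four assessors.
diversity_top_n = [(5, 4), (6, 4), (4, 4), (6, 5)]
print(macro_relative_change(diversity_top_n))

# When both rankings cover the same document set, as for all 50 results,
# every per-assessor change is zero and so is the macro value.
print(macro_relative_change([(8, 8), (7, 7), (9, 9), (8, 8)]))
```

Averaging the per-assessor ratios, rather than pooling the raw counts, keeps an assessor who happened to define many topic groups from dominating the result.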


Figure 6: Information Richness improvement by Affinity Rank within top 50 search results

Figure 6 illustrates the relative improvement in information richness as the number of top results increases. We found that an improvement of approximately 10% can be achieved within the top 5 search results after re-ranking. As N increases, the improvement gradually becomes less distinct, since there is more overlap between the full-text search results and the re-ranked results. We conclude from this figure that more informative documents have been promoted towards the top positions.

4.3.4 Improvement in Top 10 Search Results
As mentioned in the previous section, there are two ranking combination schemes to be used and a pair of parameters to be tuned. The ratio between the parameter pair, i.e. α : β, determines the weight of the Affinity Ranking score versus the full-text search score. Taking the top 10 search results as an instance, we give a range of values for α : β and compare the relative improvement in diversity and information richness. We also compare the two ranking combination schemes; the results are shown in Figure 7 and Figure 8, respectively. Regardless of which scheme is used, as long as β : α is big enough (i.e., enough weight is put on Affinity Ranking), the improvement in both diversity and information richness stays around the maximum value without much change. What is more, the range for large values of α : β is quite significant. Although the optimum value of α : β is hard to formulate, the empirical results show that if we simply re-rank totally by Affinity Ranking, i.e. α = 0 and β = 1 (shown as α : β = 0 in the figures), the improvement in both diversity and information richness is very close to the maximum value we can achieve.

Figure 7: Parameter tuning for top 10 search results in the score-combination scheme

Figure 8: Parameter tuning for top 10 search results in the rank-combination scheme

From the above two figures, it is easy to see that the rank-combination scheme is slightly better than score-combination when the ratio β : α is sufficiently large.
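The two combination schemes compared above (Eqs. (11) and (14)) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names and the three-post example are hypothetical, Affinity Rank scores are assumed to lie strictly between 0 and 1 so that their logarithms are negative and non-zero, rank 1 is assumed to be the best rank, and a lower combined value is assumed to be better in the rank-combination scheme.

```python
import math

def score_combination(sim, ar, alpha, beta):
    """Eq. (11): higher combined score is better.

    sim, ar: dicts mapping a document id to its full-text score and its
    Affinity Rank score (assumed to satisfy 0 < AR < 1).
    """
    sim_max = max(sim.values())
    log_ar_max = math.log(max(ar.values()))
    return {d: alpha * sim[d] / sim_max + beta * log_ar_max / math.log(ar[d])
            for d in sim}

def ranks(scores):
    """Rank 1 goes to the highest score; ties broken by document id (assumed)."""
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return {d: r for r, d in enumerate(ordered, start=1)}

def rank_combination(sim, ar, alpha, beta):
    """Eq. (14): combines rank positions, so a lower value is better (assumed)."""
    rank_sim, rank_ar = ranks(sim), ranks(ar)
    return {d: alpha * rank_sim[d] + beta * rank_ar[d] for d in sim}

def ordering(scores, descending):
    """Document ids sorted by combined value."""
    return sorted(scores, key=lambda d: scores[d], reverse=descending)

# Hypothetical posts: full-text scores and Affinity Rank scores.
sim = {"p1": 0.9, "p2": 0.8, "p3": 0.7}
ar = {"p1": 2e-6, "p2": 1e-6, "p3": 8e-6}

print(ordering(score_combination(sim, ar, 0.0, 1.0), descending=True))
print(ordering(rank_combination(sim, ar, 0.0, 1.0), descending=False))
print(ordering(rank_combination(sim, ar, 1.0, 0.0), descending=False))
```

With α = 0 and β = 1 both schemes order the posts purely by Affinity Rank score, and with β = 0 the rank-combination scheme reproduces the full-text order. Because the penalty of Eq. (10) can push an Affinity Rank score to zero or below, a practical version of the score-combination scheme would need to guard the logarithm; the sketch leaves that out.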
4.4 A Case Study
We provide a case study here to illustrate how our ranking method works. This example is extracted from our experiments on the Newsgroup search for the query "Outlook print error". In this scenario, a user has a printing error while using Microsoft Outlook. He comes to the Newsgroup to ask for help. Quite naturally, he starts by searching for "Outlook print error" and hopes to find a solution to the problem. Since there are many possible reasons that can lead to an Outlook print error, it is hard for him to find the right posts answering his specific error problem in a short time.

By using full-text search, we can obtain an initial rank, part of which is shown in Table 5. The Affinity Rank score is given for each listed result, with its topic indicated by an abbreviation. Since those search results are all newsgroup posts, we also label their threads with Roman numerals. For convenience, we name the retrieved post in the $i$-th position in the initial rank as post $p_i$.

Table 5: Search results for "outlook print error"

Initial Rank   Affinity Rank Score   Topic   Thread   New Rank
               e-006                 u.e.    I
               e-006                 u.e.    II
               e-006                 u.e.    III
               e-006                 u.e.    I
               e-006                 u.e.    I
               e-006                 i.a.    IV
               e-006                 u.e.    I
               e-006                 u.e.    I
               e-006                 u.e.    I
               e-006                 u.e.    I
               e-006                 n.i.    I
               e-006                 u.e.    V
               e-006                 p.f.    VI
               e-006                 u.e.    VII      7

In the top 50 retrieved posts, there are roughly 6-8 reasons for the Outlook print error, such as:
1. A prompted error code of "Unspecified Error", abbreviated as u.e. in the table;
2. A prompted error code of "invalid argument", abbreviated as i.a. in the table;
3. An error caused by some function not being implemented, abbreviated as n.i. in the table;
4. A special error that occurs only when printing mails in the public folder in Outlook, abbreviated as p.f. in the table.

Note that the topic of a post cannot be judged simply by its newsgroup thread. For instance, in Table 5, $p_1$ and $p_2$ come from different threads but belong to the same topic, while $p_{13}$ discusses a topic different from that of most other posts in its thread.

As can be seen from Table 5, the initial top 10 retrieved posts contain only two topics, u.e. and i.a., and the top 10 is dominated by posts discussing the u.e. error. After re-ranking, the number of topics in the top 10 results increases to four. Posts $p_{13}$ and $p_{24}$ are promoted to the top 10 and bring two new topics. Also, $p_6$ moves to the first position. Further analysis shows that $p_6$, $p_{13}$ and $p_{24}$ are the most informative posts describing the i.a., n.i. and p.f. problems, respectively. The ranks of the three posts are promoted because they have relatively large Affinity Rank scores (shown in Table 5). This case provides a typical example of how Affinity Ranking helps improve the diversity and information richness of the top search results.

5. CONCLUSIONS
High-quality search results depend on many factors. Well-recognized metrics such as relevance and importance do not necessarily guarantee end-user satisfaction. In this paper, we proposed two new metrics, diversity and information richness, to measure search performance. Further, a novel ranking scheme, Affinity Ranking, is proposed to re-rank the search results to improve the diversity and information richness of the top search results.
Our experiments showed that the proposed metrics and new ranking method can effectively improve search performance by presenting wider topic coverage and more highly informative results for each topic in the top results. The improvement is significant compared with traditional full-text search and brings no loss in relevance. Our future work includes scaling our Affinity Ranking computation, for example, to the Web scale.

6. REFERENCES
[1] Baeza-Yates, R. and Ribeiro-Neto, B. Modern Information Retrieval. Addison Wesley Longman, 1999.
[2] Calvo, R.A., Lee, J.-M. and Li, X. Managing Content with Automatic Document Classification. Journal of Digital Information, 5 (2).
[3] Carbonell, J. and Goldstein, J. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, (Melbourne, Australia, 1998).
[4] Chen, Z., Tao, L., Wang, J., Liu, W. and Ma, W.-Y. A Unified Framework for Web Link Analysis. In Proceedings of the 3rd International Conference on Web Information Systems Engineering, (Singapore, 2002).
[5] Croft, W.B., Cronen-Townsend, S. and Lavrenko, V. Relevance feedback and personalization: A language modeling perspective. In Proceedings of the DELOS Network of Excellence Workshop on "Personalisation and Recommender Systems in Digital Libraries", (Dublin City University, Ireland, 2001).
[6] DirectHit.
[7] Dumais, S. and Chen, H. Hierarchical classification of Web content. In Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, (Athens, Greece, 2000).
[8] Gibson, D., Kleinberg, J.M. and Raghavan, P. Inferring Web communities from link topology. In Proceedings of the 9th ACM Conference on Hypertext and Hypermedia, (Pittsburgh, PA, 1998).
[9] Kleinberg, J.M. Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM), 46 (5).
[10] Lu, Q. and Getoor, L. Link-based Classification.
In Proceedings of the International Conference on Machine Learning, (Washington DC, 2003).
[11] ODP.
[12] Page, L., Brin, S., Motwani, R. and Winograd, T. The PageRank citation ranking: Bringing order to the web. Stanford Digital Library Technologies Project, 1998.
[13] Porter, M.F. An algorithm for suffix stripping. Program, 1980.
[14] Robertson, S.E., Walker, S., Hancock-Beaulieu, M., Gull, A. and Lau, M. Okapi at TREC. In Proceedings of the Text REtrieval Conference, (1992).
[15] Wong, S.K.M. and Raghavan, V.V. Vector space model of information retrieval: a reevaluation. In Proceedings of the 7th annual international ACM SIGIR conference on Research and development in information retrieval, (Cambridge, England, 1984).
[16] Xi, W., Zhang, B., Chen, Z., Lu, Y., Yan, S., Ma, W.-Y. and Fox, E.A. Link fusion: a unified link analysis framework for multi-type interrelated data objects. In Proceedings of the 13th international conference on World Wide Web, (New York, NY, USA, 2004).
[17] Xue, G.-R., Zeng, H.-J., Chen, Z., Ma, W.-Y., Zhang, H.-J. and Lu, C.-J. Implicit link analysis for small web search. In Proceedings of the 26th annual international ACM SIGIR conference on Research and Development in Information Retrieval, (Toronto, Canada, 2003).
[18] Zhai, C.X., Cohen, W.W. and Lafferty, J. Beyond independent relevance: methods and evaluation metrics for subtopic retrieval. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, (Toronto, Canada, 2003), 10-17.
