Improving Web Search Results Using Affinity Graph


Benyu Zhang 1, Hua Li 2, Yi Liu 3, Lei Ji 4, Wensi Xi 5, Weiguo Fan 5, Zheng Chen 1, Wei-Ying Ma 1

1 Microsoft Research Asia, 49 Zhichun Road, Beijing, 100080, P. R. China, {byzhang, zhengc, wyma}@microsoft.com
2 LMAM, School of Mathematical Science, Peking University, Beijing, 100871, P. R. China, lihua@math.pku.edu.cn
3 Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA, liuyi3@cse.msu.edu
4 Department of Computer Science, Beijing Institute of Technology, Beijing, 100871, P. R. China, jilei03@bit.edu.cn
5 Virginia Polytechnic Institute and State University, Blacksburg, VA 24060, USA, {xwensi, wfan}@vt.edu

ABSTRACT
In this paper, we propose a novel ranking scheme named Affinity Ranking (AR) to re-rank search results by optimizing two metrics: (1) diversity, which indicates the variance of topics in a group of documents; (2) information richness, which measures the coverage of a single document to its topic. Both metrics are calculated from a directed link graph named Affinity Graph (AG). AG models the structure of a group of documents based on the asymmetric content similarities between each pair of documents. Experimental results on Yahoo! Directory, ODP Data, and Newsgroup data demonstrate that our proposed ranking algorithm significantly improves the search performance. Specifically, the algorithm achieves a 31% improvement in diversity and a 12% improvement in information richness, in relative terms, within the top 10 search results.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - retrieval models, search process; H.2.8 [Database Management]: Database Applications - Data Mining

General Terms: Algorithms, Performance

Keywords: Affinity Ranking, Information Retrieval, Link Analysis, Diversity, Information Richness.

1. INTRODUCTION
Most current web search engines answer a user's query with a list of search results ordered by the relevance score of each document to the query.
This paradigm is very useful when users' information needs (represented by the queries) are clear and they care more about precision than recall in the returned results. Unfortunately, many of the queries presented to a web search engine nowadays are ambiguous [5] and the user's actual information needs are unknown. Users may suffer from the vast number of redundant and yet not very relevant documents that are related to a few of the most popular topics listed at the top of the search results. Such a search experience often leaves users frustrated.

Several approaches have been proposed to improve this situation. Carbonell et al. [3] proposed a re-ranking method based on the maximal marginal relevance criterion to reduce redundancy while maintaining query relevance in the re-ranked documents. The marginal relevance of a document is defined as its relevance to the query minus its similarity to previously selected documents. Maximizing this marginal relevance helps achieve low redundancy in a group of documents. However, there is no direct criterion for evaluating diversity that ensures a group of documents with low redundancy also achieves large topic coverage. The recently proposed subtopic retrieval method [18] is another useful approach to reducing redundancy in search results. Unlike Carbonell's work, it applies a statistical language model to calculate document relevance and to measure the novelty of a document. However, as the subtopic retrieval method is mostly concerned with covering as many subtopics of a query topic as possible, it may not achieve the lowest redundancy in a group of documents.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGIR'05, August 15-19, 2005, Salvador, Brazil. Copyright 2005 ACM /05/ $5.00.
As reported in [3], the majority of people in the experiments said they preferred the method that gave them search results covering the broadest and most interesting topics. However, since the top search results are very often dominated by a set of closely related documents on some specific topic, users often face the following two situations: (1) the top search results can hardly cover a sufficient variety of topics to meet the users' diversified information needs; (2) there is no indication of how informative a returned document is on the query topic.

In traditional information retrieval research, precision and recall [1] have been used as metrics to evaluate information retrieval systems. Both metrics concern only the relevance of the documents returned, without considering the number of different topics that the returned document list covers, or the range of topics a single returned document covers. In web link analysis research, the popularity of a web page [9, 12] has been widely adopted to measure the quality of a web page. However, this kind of quality is computed from the web page link graph and is independent of the content of the web page.

All these observations motivate us to introduce two novel metrics, diversity and information richness, which measure the quality of search results by considering the content-based link structure of a group of documents and the content of a single document in the search results. Diversity measures the variety of topics in a group of documents. It is a holistic property of the document set. Information richness measures how many different topics a single document contains.

Based on the two metrics, a novel algorithm named Affinity Ranking (AR) is proposed to re-rank the top search results. First, we model the content-based link structure of a group of documents as a directed graph, which we call an Affinity Graph (AG), based on the asymmetric similarities between document pairs. Similar to web page link analysis, an importance score is computed from the Affinity Graph for each document, indicating its information richness. Second, we apply a greedy algorithm to assign a penalty score to each returned document, taking into account the diversity of query-related topics. Third, the AR score of each document is obtained as a combination of the information richness and diversity penalty scores. AR scores are then used to re-rank the top search results.

Our experimental results on Yahoo! Directory and the ODP dataset demonstrate that our proposed AR algorithm significantly improves the coverage of query-related topics in the top 10 search results over the K-Means clustering algorithm. Meanwhile, experiments on a newsgroup data set show that the AR algorithm achieves about a 31% improvement in diversity and a 12% improvement in information richness in the top 10 search results without loss in precision and recall.

The rest of the paper is organized as follows. In Section 2, we introduce the background by explaining the state-of-the-art link analysis algorithms. In Section 3, we introduce the Affinity Ranking algorithm, as well as the formal definitions of diversity and information richness. Experiments and evaluations are reported in Section 4.
We conclude and discuss future work in Section 5.

2. BACKGROUND
Recently, there has been growing research interest in mining the relationships between data objects, which are usually referred to as links in the literature. Link structure has proved to be very useful in various applications such as information retrieval [9, 12], classification [10] and clustering [8]. Two of the most famous works on link analysis are Google's PageRank algorithm [12] and Kleinberg's HITS algorithm [9]. Both of them use the hyperlink structure among web pages to model a group of web pages as a link graph.

Explicit link analysis and implicit link analysis [4, 6, 17] are currently the two major sub-areas of link analysis research. Hyperlinks embedded in web pages can be considered explicit links, since they explicitly provide a connection from one page to another. Implicit links are linkages inferred from users' behavior, such as a user's access pattern on web pages. The difference between them is that explicit links represent the web editors' view, since hyperlinks are edited by them, while implicit links represent the end-users' view. Two typical examples of implicit link analysis are DirectHit [6] and Small Web Search [17], which assume that two web pages are implicitly linked if they are visited sequentially by the same end-user. DirectHit and Small Web Search can be considered modified versions of the HITS and PageRank algorithms applied to an implicit link structure.

However, the metrics used to evaluate the methods discussed above are intrinsically subjective, and they cannot objectively quantify the information contained in web pages. In this work, we develop objective metrics to measure the amount of information contained in a single document and also the topic variety in a group of documents.

3. AFFINITY RANKING
The framework of Affinity Ranking is illustrated in Figure 1. It includes three steps:
(1) An Affinity Graph (AG) based on the content link structure is constructed for the entire document collection; the information richness of each document is then calculated based on AG.
(2) For a given query, a result set of relevant documents is produced by the full-text search process. Based on AG and the information richness scores, a diversity penalty is imposed on each document in the result set.
(3) The information richness and diversity penalty scores are combined to obtain the Affinity Rank score, which is used to re-rank the top returned document list.

Figure 1: The Affinity Ranking (AR) Framework

We now give the formal definitions of information richness and diversity.

Diversity: Given a set of documents $R = \{d_1, d_2, \ldots, d_m\}$, we use diversity $Div(R)$ to denote the number of different topics contained in $R$.

Information Richness: Given a document collection $D = \{d_i \mid 1 \le i \le n\}$, we use information richness $InfoRich(d_i)$ (see Eq. (1)) to denote the informative degree of the document $d_i$, i.e. the richness of information contained in the document $d_i$ with respect to the entire collection $D$. Without loss of generality, we let $InfoRich(d_i) \in [0, 1]$.

For a set of documents $R_l = \{d_1, d_2, \ldots, d_l\}$ which contains $Div(R_l)$ topics (i.e. diversity $= Div(R_l)$), its average information richness can be calculated as:

$$\overline{InfoRich}(R_l) = \frac{1}{Div(R_l)} \sum_{k=1}^{Div(R_l)} \frac{1}{N_k} \sum_{i=1}^{N_k} InfoRich(d_i^k) \qquad (1)$$

where $d_i^k$ represents one of the $N_k$ documents associated with the $k$-th topic. In the rest of this paper, we use average information richness to refer to the information richness of a set of documents.

3.1 Affinity Graph Construction
Let $D = \{d_i \mid 1 \le i \le n\}$ denote a document collection. According to the vector space model [15], each document $d_i$ can be represented as a vector $\vec{d_i}$. The similarity between a document pair $d_i$ and $d_j$ can be calculated as

$$sim(d_i, d_j) = \cos(\vec{d_i}, \vec{d_j}) = \frac{\vec{d_i} \cdot \vec{d_j}}{\|\vec{d_i}\| \, \|\vec{d_j}\|} \qquad (2)$$

To further measure the significance of the similarity between each document pair, we define the affinity of $d_j$ to $d_i$ as

$$aff(d_i, d_j) = \frac{\vec{d_i} \cdot \vec{d_j}}{\|\vec{d_i}\|} \qquad (3)$$

Note that the affinity defined here is asymmetric, because in general $aff(d_i, d_j) \ne aff(d_j, d_i)$. If we consider documents as nodes, the document collection can be modeled as a graph by generating the links between documents using the following rule:

Link generation: A directional link from $d_i$ to $d_j$ ($i \ne j$) with weight $aff(d_i, d_j)$ is constructed if $aff(d_i, d_j) \ge aff_t$ ($aff_t$ is a threshold); otherwise no link is constructed (or the weight of the link is regarded as zero).

Thus, each link in the graph is assigned a weight indicating the similarity relationship between the corresponding document pair. Since all links are constructed according to the affinity value between document pairs, we call the graph an Affinity Graph. Usually, documents on the same topic are similar to each other. Hence, in the Affinity Graph, a group of heavily linked documents naturally represents a topic group, while documents connected by weak or no links belong to different topics.

3.2 Information Richness Computation
After obtaining the Affinity Graph, we apply a link analysis algorithm to compute the information richness of each node in AG. Similar to PageRank [12], we propose the following algorithm. First, an adjacency matrix $M$ is used to describe AG, with each entry corresponding to the weight of a link in the graph. $M = (M_{i,j})_{n \times n}$ is defined as:

$$M_{i,j} = \begin{cases} aff(d_i, d_j), & \text{if } aff(d_i, d_j) \ge aff_t \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$

Without loss of generality, $M$ is normalized so that each row sums to 1:

$$\tilde{M}_{i,j} = \begin{cases} M_{i,j} \Big/ \sum_{j=1}^{n} M_{i,j}, & \text{if } \sum_{j=1}^{n} M_{i,j} \ne 0 \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$

The normalized adjacency matrix $\tilde{M} = (\tilde{M}_{i,j})_{n \times n}$ is used to compute the information richness score for each node. Our computation is based on the following two intuitions:
1. The more neighbors a document has, the more informative it is;
2. The more informative a document's neighbors are, the more informative it is.

Thus, the score of document $d_i$ can be deduced from those of all other documents linked to it, and it can be formulated in a recursive form as follows:

$$InfoRich(d_i) = \sum_{\text{all } j \ne i} InfoRich(d_j) \cdot \tilde{M}_{j,i} \qquad (6)$$

And in matrix form:

$$\lambda = \tilde{M}^T \lambda \qquad (7)$$

where $\lambda = [InfoRich(d_i)]_{n \times 1}$ is the eigenvector of $\tilde{M}^T$. Since $\tilde{M}$ is normally a sparse matrix, all-zero rows may appear, i.e. some documents have no other documents with significant affinity to them. To compute a meaningful eigenvector, we introduce a damping factor $c$ (similar to the random jumping factor in PageRank):

$$InfoRich(d_i) = c \cdot \sum_{\text{all } j \ne i} InfoRich(d_j) \cdot \tilde{M}_{j,i} + \frac{(1 - c)}{n} \qquad (8)$$

And in matrix form:

$$\lambda = c \tilde{M}^T \lambda + \frac{(1 - c)}{n} \vec{e} \qquad (9)$$

where $\vec{e}$ is a vector with all components equal to 1. The damping factor $c \in (0, 1)$ is set to 0.85 in our experiments.

The computation of information richness can be explained in a way similar to the random surfer model, and we call it the random information flow model. Imagine that information flows among the document nodes at each iteration, and assume it stops at document $d_i$ at the current iteration. Let $A(d_i) = \{d_j \mid j \ne i, \; aff(d_i, d_j) > aff_t\}$ be the set of documents to which $d_i$ links. In the next iteration, the information chooses where to flow according to the following two rules:
1. With probability $c$ (i.e. the damping factor), the information flows into one of the document nodes in $A(d_i)$, and the probability of flowing into the document $d_j$ is proportional to $aff(d_i, d_j)$;
2. With probability $1 - c$, the information flows randomly into any document in the collection.

Figure 2: A simple example of Affinity Graph.

Figure 2 illustrates the random information flow model. On the Affinity Graph, besides the links constructed by the link generation rule, we mark an additional link with a dotted line, which indicates the possibility of random information flow as described in Rule 2.

A Markov chain can be induced from the above process, where the states are given by the documents and the transition (or flow) matrix is given by $c \tilde{M}^T + (1 - c) U$, in which $U = [\frac{1}{n}]_{n \times n}$. The stationary probability distribution of each state is given by the principal eigenvector of the transition matrix, which is equivalent to Equation (9).

3.3 Diversity Penalty
Computing information richness helps us choose more informative documents to be presented in the top search results. However, in some cases the two most informative documents could be very similar (in an extreme case they can be duplicates). To increase the coverage of the top search results, a different penalty is imposed on the information richness score of each document according to its influence on topic diversity.

The diversity penalty is calculated by a greedy algorithm. At each iteration of the algorithm, the penalty is imposed on documents topic by topic, and the Affinity Ranking score is updated accordingly.

The Greedy Algorithm for Diversity Penalty
Step 0. Initialize the two sets $A = \emptyset$, $B = \{d_i \mid i = 1, 2, \ldots, n\}$, and initialize each document's Affinity Rank score to its information richness score, i.e. $AR_i = InfoRich(d_i)$, $i = 1, 2, \ldots, n$.
Step 1. Sort the documents in $B$ by their current Affinity Rank scores in descending order.
Step 2. Suppose the document ranked highest in $B$ is $d_i$. Move document $d_i$ from $B$ to $A$, and then impose a penalty on the score of each document that has a link to $d_i$ as follows. For each document $d_j$, $j \ne i$:

$$AR_j = AR_j - \tilde{M}_{j,i} \cdot InfoRich(d_i) \qquad (10)$$

Step 3. Re-sort the documents in $B$ by the updated rank scores in descending order.
Step 4. Go to Step 2 until $B = \emptyset$ or the iteration count reaches a predefined maximum.

The crucial part of the above greedy algorithm is Step 2, which embodies the basic idea of the penalty: decrease the Affinity Ranking scores of less informative documents by the part conveyed from the most informative one. The more similar a document is to the most informative one, the larger the penalty it receives and the more its Affinity Ranking score is decreased.
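The steps of Sections 3.1 to 3.3 can be sketched end to end as follows. This is a minimal illustration rather than the authors' implementation: the function names, the dense NumPy representation, the power-iteration stopping rule (`tol`, `max_iter`), the lowest-index tie-breaking, and the toy term vectors are all assumptions made for the example.

```python
import numpy as np


def affinity_matrix(X, aff_t):
    """Thresholded affinities, Eqs. (3)-(4): aff(d_i, d_j) = (d_i . d_j) / ||d_i||."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0        # guard (assumption): empty documents get zero affinity
    M = (X @ X.T) / norms          # row i is divided by ||d_i||, so M is asymmetric
    np.fill_diagonal(M, 0.0)       # links are only defined for i != j
    M[M < aff_t] = 0.0             # drop links below the threshold aff_t
    return M


def row_normalize(M):
    """Eq. (5): scale each non-zero row to sum to 1; all-zero rows stay zero."""
    sums = M.sum(axis=1, keepdims=True)
    return np.divide(M, sums, out=np.zeros_like(M), where=sums != 0)


def info_richness(M_norm, c=0.85, tol=1e-10, max_iter=1000):
    """Eq. (9) solved by fixed-point (power) iteration; tol and max_iter are assumptions."""
    n = M_norm.shape[0]
    lam = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = c * (M_norm.T @ lam) + (1.0 - c) / n
        if np.abs(new - lam).sum() < tol:
            return new
        lam = new
    return lam


def diversity_penalty(M_norm, rich, max_count=None):
    """Greedy algorithm of Section 3.3; returns penalized AR scores and the selection order."""
    n = len(rich)
    ar = np.array(rich, dtype=float)              # Step 0: AR_i = InfoRich(d_i)
    remaining = list(range(n))                    # set B
    order = []                                    # set A, in the order documents are picked
    limit = n if max_count is None else min(n, max_count)
    while remaining and len(order) < limit:
        i = max(remaining, key=lambda k: ar[k])   # Steps 1 and 3: highest current score
        remaining.remove(i)                       # Step 2: move d_i from B to A
        order.append(i)
        for j in remaining:
            ar[j] -= M_norm[j, i] * rich[i]       # Eq. (10)
    return ar, order


# Toy collection: d0, d1, d2 share one topic; d3 is an unrelated, unlinked document.
X = [[2, 2, 0], [1, 1, 0], [1, 1, 0], [0, 0, 1]]
M_norm = row_normalize(affinity_matrix(X, aff_t=0.5))
rich = info_richness(M_norm)
ar, order = diversity_penalty(M_norm, rich)
print(order[:2])
```

In this toy collection, ranking by information richness alone would place two documents of the same topic on top, whereas after the penalty of Eq. (10) the isolated document d3 moves into second place behind d0.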
The penalty step ensures that only the most informative document in each topic becomes distinctive in the ranking process.

3.4 Re-ranking Method
The re-ranking mechanism combines the results of full-text search and Affinity Ranking. There are two combination schemes: score-combination and rank-combination. A user query is denoted by $q$. The set of relevant documents returned by full-text search is denoted by $\Theta$.

The score-combination scheme uses a linear combination of two parts: one comes from the full-text search score, and the other from the Affinity Ranking score. However, the two scores usually differ by orders of magnitude and their raw values vary over different ranges. Therefore, we normalize the two scores differently (average normalization and log average normalization) and then combine the two parts:

$$Score(q, d_i) = \alpha \cdot \frac{Sim(q, d_i)}{\overline{Sim}_\Theta(q)} + \beta \cdot \frac{\log \overline{AR}_\Theta}{\log AR_i}, \quad \forall d_i \in \Theta \qquad (11)$$

where $\alpha + \beta = 1$ and

$$\overline{Sim}_\Theta(q) = \max_{\forall d_i \in \Theta} Sim(q, d_i) \qquad (12)$$

$$\overline{AR}_\Theta = \max_{\forall d_i \in \Theta} AR_i \qquad (13)$$

The rank-combination scheme uses a linear combination of the ranks based on full-text search and Affinity Ranking:

$$Score(q, d_i) = \alpha \cdot Rank_{Sim(q, d_i)} + \beta \cdot Rank_{AR_i}, \quad \forall d_i \in \Theta \qquad (14)$$

The $\alpha$ and $\beta$ in both combination schemes are parameters that can be tuned. When $\beta = 0$, no re-ranking is performed and the search results are equivalent to full-text search; as $\beta$ increases, more weight is put on Affinity Ranking in the re-ranking process; when $\alpha = 0$ (and $\beta = 1$), we rely entirely on the Affinity Ranking score to re-rank the search results.

4. EXPERIMENTS
We conducted experiments on Yahoo! Directory, ODP Data and a Newsgroup data set to demonstrate the effectiveness of our proposed Affinity Ranking scheme.

4.1 Data
Yahoo! Directory is one of the most famous Web directories. We downloaded the directory in June. It contained a total of 292,26 categories (including leaf categories and non-leaf categories). All categories are organized into a 6-level hierarchy. Similar to many previous works [2, 7], we downloaded the index pages of the websites listed in Yahoo! Directory as the labeled documents.
As a result, we downloaded 792,60 documents in total.

ODP (Open Directory Project) is another famous Web directory. It is probably the largest, most comprehensive human-edited directory on the Web, constructed and maintained by a vast, global community of volunteer editors [11]. We downloaded the directory in August. ODP includes a total of 72,565 categories. Similar to the Yahoo! dataset, we downloaded the index pages of the websites listed in ODP as labeled documents. As a result, we downloaded 1,547,000 documents in total.

The Newsgroup data is composed of 256,449 posts collected from 7 newsgroups related to commercial applications over a period of 4 months, with a total size of about 400M. A post parser is applied to remove stop words and unrelated content such as from, to, time, signature, and citations. The title and content of each post are given a 3:1 weighting ratio in the indexing process. Porter stemming [13] is also performed over the entire dataset.

For the Newsgroup dataset, there are two specific considerations: (1) there are no explicit links among the posts; (2) a newsgroup is a typical collection of documents with repetitive content, because a large number of posts are very likely to be devoted to the same topic. Traditional information retrieval, which relies purely on the full-text content, will produce more redundancy because similar posts appear in the top search results. Our proposed Affinity Ranking scheme can be used to solve this problem.

We used the Okapi system as our baseline retrieval system. For each query, Okapi provides a set of documents ranked by text-based similarity score.

4.2 Affinity Ranking vs. K-Means Clustering
We conducted experiments on the Yahoo! Directory and ODP data sets to compare AR with the traditional clustering method K-Means and see which method covers more query-related topics in the top 10 search results. We selected 20 queries from the Yahoo! Directory category labels and the ODP category labels, respectively. Table 1 and Table 2 list the queries.

Table 1: Queries used in Yahoo! Directory
No.  Query
1    Art History
2    Art Artists
3    Performing Arts Dance
4    Visual Arts Thematic
5    Consulting Medical
6    Science Astronomy
7    Science Physics
8    Science Alternative
9    Science Astronomy
10   Ecology
11   Education
12   Mathematics
13   Ethnic Studies
14   Political Science
15   Social Science Psychology
16   Women's Studies
17   Crime
18   Families
19   Relationships
20   Sexuality

Table 2: Queries used in ODP Data
No.  Query
1    Internet Protocols
2    Home Cooking
3    Agriculture Horticulture
4    Science Chemistry
5    Food Baked Goods
6    Food Meat
7    Food Produce
8    Music Related Merchandise
9    Bagpipe Bands
10   Consumer Goods Eyewear
11   Dairy
12   Insurance Carriers
13   Literature American Early
14   Mystery
15   Poetry Fixed Verse
16   Poetry Forms
17   CGI
18   Diseases Liver
19   Dogs Training
20   E-Books

The top 1000 search results of each query are passed to the AR or K-Means algorithm to re-rank the top 10 results. For the K-Means algorithm, we set K=10 and use the top document of each cluster to construct the top 10 results.

The F1 value is used to measure the performance of Affinity Ranking and K-Means clustering. The recall ($R$), precision ($P$), and $F_1$ are defined as follows:

$$R = \frac{|N_{label} \cap N_{sys}|}{|N_{label}|}, \qquad P = \frac{|N_{label} \cap N_{sys}|}{|N_{sys}|}, \qquad F_1 = \frac{2RP}{R + P}$$

$N_{label}$ denotes the number of different sub-category labels in Yahoo! Directory or ODP. $N_{sys}$ denotes the corresponding sub-category label number in the top 10 search results re-ranked by the AR or K-Means algorithm.
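As a small worked example of this coverage measure, the sketch below treats $N_{label}$ and $N_{sys}$ as sets of sub-category labels, which is an assumption about how the counts are matched; the function name and the sample labels are hypothetical.

```python
def topic_coverage_f1(category_labels, top_result_labels):
    """Recall, precision and F1 of topic coverage.

    category_labels: sub-category labels of the queried category (N_label).
    top_result_labels: labels of the documents in the re-ranked top results (N_sys).
    Both are reduced to sets, so repeated labels count once.
    """
    label_set = set(category_labels)
    sys_set = set(top_result_labels)
    hits = len(label_set & sys_set)
    if hits == 0:
        return 0.0, 0.0, 0.0          # nothing covered: avoid dividing by zero
    recall = hits / len(label_set)
    precision = hits / len(sys_set)
    return recall, precision, 2 * recall * precision / (recall + precision)


# Four sub-categories exist; the top results cover two of them plus one foreign label.
r, p, f1 = topic_coverage_f1(["a", "b", "c", "d"], ["a", "a", "b", "x"])
print(r, p, f1)
```

Here recall is 2/4 and precision is 2/3, so F1 is 4/7; a re-ranking that surfaces more distinct sub-categories raises recall without needing more result slots.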
Figures 3 and 4 show that AR significantly improves the coverage of topics compared to the K-Means method on both the Yahoo! and ODP data.

Figure 3: F1 values of AR and K-Means in Yahoo! Directory

Figure 4: F1 values of AR and K-Means in ODP Data

4.3 Affinity Ranking in Newsgroup dataset
4.3.1 Evaluation Metrics and Ground Truth
We used the 20 queries listed in Table 3 to retrieve documents from the newsgroup dataset, and applied the proposed AR scheme to re-rank the top 50 documents returned by the baseline system (Okapi) [14]. The queries vary from 1 word to 3 words, covering several commercial software products.

Table 3: Queries used in our experiments
No.  Query               No.  Query
1    Blue screen         11   System requirement
2    Office update       12   Access update
3    activate product    13   Excel crash
4    Excel formula       14   Office
5    Office assistant    15   Office uninstall
6    outer join          16   Outlook print error
7    Pie                 17   pop3 server
8    print preview       18   save attachment
9    SMTP                19   virus scan
10   word font           20   Word print

We compare our approach with the Okapi system in three aspects: diversity, information richness and relevance. The diversity of a document set and the information richness of a single document have already been defined in Section 3. Similarly, the average relevance of a set of documents $R_l = \{d_1, d_2, \ldots, d_l\}$ to a given query $q$ is defined as follows:

$$\overline{Rlv}(R_l, q) = \frac{1}{l} \sum_{i=1}^{l} Rlv(d_i, q) \qquad (15)$$

where $Rlv(d_i, q) \in [0, 1]$ is the relevance of document $d_i$ to query $q$.

Four researchers in the web search and mining area were hired to independently evaluate the experimental results. They labeled the top 50 search results for each of the 20 queries using the following steps:
1. Take an overview of the 50 search results, and then manually cluster them into an arbitrary number of groups. Each group should have one common topic and there should be no significant overlap between the group topics;
2. In each topic group, give each document a score indicating its information richness for that topic. The score ranges from 0 to 3 (3 - very informative, 2 - informative, 1 - less informative, 0 - not informative);
3. Give each document a score indicating its relevance to the query (2 - relevant, 1 - hard to tell, 0 - irrelevant).

Finally, the scores from step 2 and step 3 are normalized into [0, 1] according to the definitions of information richness and relevance. The labeled data served as the ground truth to evaluate the diversity, information richness and relevance of the top $N$ search results ($N \le 50$).

Since the labeled ground truth (e.g. the number of topics in the top 50 search results) varies from user to user, our improvement measures are presented in the form of a macro relative change, which is defined as:

$$\Delta_{macro} = \frac{1}{N} \sum_{i=1}^{N} \Delta^i \qquad (16)$$

$$\Delta^i = \frac{X_A^i - X_F^i}{X_F^i} \qquad (17)$$

where $N$ is the number of users, i.e. 4; $X$ can be the diversity, information richness, or relevance of the top search results; the superscript $i$ denotes the $i$-th user's ground truth; and the subscripts $A$ and $F$ represent results from our ranking scheme and full-text search, respectively.

4.3.2 Improvement in Top 10 Search Results
As the top 10 search results always receive the most attention from end-users, we also conducted experiments to show how Affinity Ranking affects the top 10 search results from the newsgroup data set. Table 4 shows the relative improvement of AR re-ranking over the Okapi system.

Table 4: Improvement in top 10 search results
                    Diversity   Information Richness   Relevance
Relative Change     +31%        +12%                   +0.72%
p value at t-test

In this experiment, we use the rank-combination scheme, in which α = and β = 0. From Table 4, we can see that our proposed Affinity Ranking achieves 31% and 12% improvements in diversity and information richness, respectively, over the full-text search system. The t-test result indicates that this improvement is statistically significant. The experimental results confirm that our proposed algorithm can improve the diversity and information richness of the top 10 search results without loss in relevance.

4.3.3 Improvement within Top 50 Search Results
We also measured the improvement of AR for different numbers of search results. Figure 5 illustrates the relative improvement in diversity as the number of search results increases. It shows that our method always improves the diversity of the search results. Initially, the diversity improvement increases sharply with $N$ and reaches a maximum at $N = 10$, which is usually the number of results that fit on the first search result page and are browsed by most end-users. Then the diversity improvement gradually falls to zero* when $N$ reaches 50.
We can conclude from the figure that the relative order of the results is changed so that documents from different topics are shifted forward to the top of the returned search list; consequently, the topic diversity of the top returned results is improved.

Figure 5: Diversity improvement by Affinity Rank within top 50 search results

* Since re-ranking the top 50 results only changes their order, the relative change in diversity over all 50 results is zero, cf. the definition of diversity. (The same holds for information richness in Figure 6.)
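The macro relative change of Equations (16) and (17), together with the footnote's observation that the change vanishes once all re-ranked results are counted, can be illustrated with a small sketch. The topic labels, the two hypothetical assessors, and the re-ranked order below are invented for the example and are not data from the experiments.

```python
def diversity(topics_in_rank_order, n):
    """Div of the top-n results: the number of distinct topic labels among them."""
    return len(set(topics_in_rank_order[:n]))


def macro_relative_change(values_a, values_f):
    """Eqs. (16)-(17): mean over assessors of (X_A - X_F) / X_F."""
    changes = [(a - f) / f for a, f in zip(values_a, values_f)]
    return sum(changes) / len(changes)


# Topic labels per document, as assigned independently by two hypothetical assessors.
assessor_labels = [
    ["ue", "ue", "ue", "ia", "ni"],
    ["x", "x", "y", "y", "z"],
]
full_text_order = [0, 1, 2, 3, 4]   # document ids in baseline rank order
reranked_order = [3, 0, 4, 1, 2]    # the same documents after re-ranking


def change_at(n):
    div_a = [diversity([lab[d] for d in reranked_order], n) for lab in assessor_labels]
    div_f = [diversity([lab[d] for d in full_text_order], n) for lab in assessor_labels]
    return macro_relative_change(div_a, div_f)


print(change_at(3), change_at(5))
```

With these labels the top-3 diversity rises from 1 to 3 for the first assessor and from 2 to 3 for the second, giving a macro relative change of 1.25, while at N = 5 both orders contain the same documents and the change is 0.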

Figure 6: Information Richness improvement by Affinity Rank within top 50 search results

Figure 6 illustrates the relative improvement in information richness as the number of top results increases. We found that an improvement of approximately 10% can be achieved within the top 5 search results after re-ranking. As $N$ increases, the improvement gradually becomes less distinct, since the full-text search results and the re-ranked results overlap more. We conclude from this figure that more informative documents have been promoted towards the top positions.

4.3.4 Improvement in Top 10 Search Results
As mentioned in the previous section, there are two ranking combination schemes and a pair of parameters to be tuned. The ratio between the parameters, i.e. α : β, determines the weight of the Affinity Ranking score versus the full-text search score. Taking the top 10 search results as an instance, we try a range of values for α : β and compare the relative improvement in diversity and information richness. We also compare the two ranking combination schemes; the results are shown in Figure 7 and Figure 8, respectively.

Regardless of which scheme is used, as long as β : α is big enough (i.e., enough weight is put on Affinity Ranking), the improvement in both diversity and information richness stays around the maximum value without much change. Moreover, the range of ratios that yields a large improvement is quite wide. Although the optimum value of α : β is hard to formulate, the empirical results show that if we simply re-rank entirely by Affinity Ranking, i.e. α = 0 and β = 1 (shown as α : β = 0 in the figures), the improvement in both diversity and information richness is very close to the maximum value we can achieve.

Figure 7: Parameter tuning for top 10 search results in the score-combination scheme

Figure 8: Parameter tuning for top 10 search results in the rank-combination scheme

From the above two figures, it is easy to see that the rank-combination scheme is slightly better than score-combination when the ratio β : α is sufficiently large.
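The two combination schemes of Section 3.4 whose parameters are tuned here can be sketched as follows. This is an illustrative reading of Equations (11) to (14) with several assumptions: scores are held in dictionaries keyed by hypothetical post ids, β is taken as 1 - α in both schemes, ranks are 1-based with rank 1 best, and every Affinity Rank score lies strictly between 0 and 1 so that the logarithms in Eq. (11) are negative and non-zero. How non-positive scores left by the diversity penalty should be handled is not specified in the text and is not covered.

```python
import math


def score_combination(sim, ar, alpha):
    """Eq. (11): higher combined score is better. Assumes 0 < AR_i < 1 for all i."""
    beta = 1.0 - alpha
    sim_max = max(sim.values())                 # Eq. (12)
    ar_max = max(ar.values())                   # Eq. (13)
    return {d: alpha * sim[d] / sim_max + beta * math.log(ar_max) / math.log(ar[d])
            for d in sim}


def ranks(scores):
    """1-based rank of each document when sorted by descending score."""
    ordered = sorted(scores, key=lambda d: -scores[d])
    return {d: r for r, d in enumerate(ordered, start=1)}


def rank_combination(sim, ar, alpha):
    """Eq. (14): a weighted sum of rank positions, so a lower value is better."""
    beta = 1.0 - alpha
    rank_sim, rank_ar = ranks(sim), ranks(ar)
    return {d: alpha * rank_sim[d] + beta * rank_ar[d] for d in sim}


def rerank_by_score(sim, ar, alpha):
    combined = score_combination(sim, ar, alpha)
    return sorted(combined, key=lambda d: -combined[d])


def rerank_by_rank(sim, ar, alpha):
    combined = rank_combination(sim, ar, alpha)
    return sorted(combined, key=lambda d: combined[d])


# Hypothetical full-text scores and Affinity Rank scores for three posts.
sim = {"p1": 3.0, "p2": 2.0, "p3": 1.0}
ar = {"p1": 1e-6, "p2": 3e-6, "p3": 2e-6}
print(rerank_by_score(sim, ar, alpha=1.0), rerank_by_score(sim, ar, alpha=0.0))
print(rerank_by_rank(sim, ar, alpha=1.0), rerank_by_rank(sim, ar, alpha=0.0))
```

With α = 1 both schemes reproduce the full-text order, and with α = 0 both order the posts purely by Affinity Rank score, which matches the limiting cases described in Section 3.4.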
4.4 A Case Study
We provide a case study here to illustrate how our ranking method works. This example is extracted from our experiments on the Newsgroup search for the query "Outlook print error". In this scenario, a user has a printing error while using Microsoft Outlook. He comes to the Newsgroup to ask for help. Quite naturally, he starts by searching for "Outlook print error" and hopes to find a solution to the problem. Since there are many possible causes of an Outlook print error, it is hard for him to quickly find the right posts that address his specific error.

Using full-text search, we obtain an initial rank, part of which is shown in Table 5. The Affinity Rank score is given for each listed result, with its topic indicated by an abbreviation. Since the search results are all newsgroup posts, we also label their threads with Roman numerals. For convenience, we refer to the retrieved post at the i-th position in the initial rank as post $p_i$.

Table 5: Search results for "outlook print error"
Initial Rank   Affinity Rank Score   New Rank   Topic   Thread
               e-006                            u. e.   I
               e-006                            u. e.   II
               e-006                            u. e.   III
               e-006                            u. e.   I
               e-006                            u. e.   I
               e-006                            i. a.   IV
               e-006                            u. e.   I
               e-006                            u. e.   I
               e-006                            u. e.   I
               e-006                            u. e.   I
               e-006                            n. i.   I
               e-006                            u. e.   V
               e-006                            p. f.   VI
               e-006                            u. e.   VII

In the top 50 retrieved posts, there are roughly 6-8 causes of the Outlook print error, such as:
1. A prompted error code of "Unspecified Error", abbreviated as u. e. in the table;
2. A prompted error code of "invalid argument", abbreviated as i. a. in the table;
3. An error caused by some function not being implemented, abbreviated as n. i. in the table;
4. A special error that occurs only when printing mails in the public folder in Outlook, abbreviated as p. f. in the table.

Note that the topic of a post cannot be judged simply by its newsgroup thread. For instance, in Table 5, $p_1$ and $p_2$ come from different threads but belong to the same topic, while $p_{13}$ discusses a topic different from that of most other posts in its thread.

As can be seen from Table 5, the initial top 10 retrieved posts contain only two topics, u. e. and i. a., and the top 10 is dominated by posts discussing the u. e. error. After re-ranking, the number of topics in the top 10 results increases to four. Posts $p_{13}$ and $p_{24}$ are promoted into the top 10 and bring two new topics. Also, $p_6$ moves to the first position. Further analysis shows that $p_6$, $p_{13}$ and $p_{24}$ are the most informative posts describing the i. a., n. i. and p. f. problems, respectively. The ranks of the three posts are promoted because they have relatively large Affinity Rank scores (shown in Table 5). This case provides a typical example of how Affinity Ranking helps improve the diversity and information richness of the top search results.

5. CONCLUSIONS
High-quality search results depend on many factors. Well-recognized metrics such as relevance and importance do not necessarily guarantee the satisfaction of end-users. In this paper, we proposed two new metrics, diversity and information richness, to measure search performance. Further, a novel ranking scheme, Affinity Ranking, is proposed to re-rank the search results to improve the diversity and information richness of the top search results.
Our experiments showed that the proposed metrics and the new ranking method can effectively improve search performance by presenting wider topic coverage and more highly informative results for each topic in the top results. The improvement is significant compared with traditional full-text search and brings no loss in relevance. Our future work includes scaling the Affinity Ranking computation, for example, to Web scale.
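
The case study above reports that re-ranking raises the number of topics in the top 10 results from two to four. A minimal sketch of that kind of check follows, assuming each post already carries a manually assigned topic label. The post IDs, topic assignments, and both orderings are invented toy data, not values from Table 5, and `topic_count` is a hypothetical helper name.

```python
def topic_count(ranking, topic_of, k=10):
    """Return the number of distinct topics among the first k posts of a ranking."""
    return len({topic_of[post] for post in ranking[:k]})


# Toy data (hypothetical, NOT taken from Table 5): post ID -> topic label.
topic_of = {
    "a": "u.e.",
    "b": "u.e.",
    "c": "i.a.",
    "d": "u.e.",
    "e": "n.i.",
    "f": "p.f.",
}

# Hypothetical orderings before and after re-ranking.
initial = ["a", "b", "c", "d", "e", "f"]
reranked = ["c", "a", "e", "f", "b", "d"]

# Compare topic coverage in the top 4 of each ordering.
before = topic_count(initial, topic_of, k=4)
after = topic_count(reranked, topic_of, k=4)
print(before, after)
```

On this toy input the top-4 topic count rises from 2 to 4, the same pattern the case study describes for the top 10. The sketch only measures topic coverage; it does not compute Affinity Rank scores or information richness.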

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Combining Multiple Resources, Evidence and Criteria for Genomic Information Retrieval

Combining Multiple Resources, Evidence and Criteria for Genomic Information Retrieval Combnng Multple Resources, Evdence and Crtera for Genomc Informaton Retreval Luo S 1, Je Lu 2 and Jame Callan 2 1 Department of Computer Scence, Purdue Unversty, West Lafayette, IN 47907, USA ls@cs.purdue.edu

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK L-qng Qu, Yong-quan Lang 2, Jng-Chen 3, 2 College of Informaton Scence and Technology, Shandong Unversty of Scence and Technology,

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Impact of Contextual Information for Hypertext Documents Retrieval

Impact of Contextual Information for Hypertext Documents Retrieval Impact of Contextual Informaton for Hypertext ocuments Retreval Idr Chbane and Bch-Lên oan SUPELEC Computer Scence dpt. Plateau de Moulon 3 rue Jolot Cure 9 92 Gf/Yvette France {Idr.Chbane Bch-Len.oan}@supelec.fr

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

An Iterative Implicit Feedback Approach to Personalized Search

An Iterative Implicit Feedback Approach to Personalized Search An Iteratve Implct Feedback Approach to Personalzed Search Yuanhua Lv 1, Le Sun 2, Junln Zhang 2, Jan-Yun Ne 3, Wan Chen 4, and We Zhang 2 1, 2 Insttute of Software, Chnese Academy of Scences, Beng, 100080,

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Intelligent Information Acquisition for Improved Clustering

Intelligent Information Acquisition for Improved Clustering Intellgent Informaton Acquston for Improved Clusterng Duy Vu Unversty of Texas at Austn duyvu@cs.utexas.edu Mkhal Blenko Mcrosoft Research mblenko@mcrosoft.com Prem Melvlle IBM T.J. Watson Research Center

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval Proceedngs of the Thrd NTCIR Workshop Descrpton of NTU Approach to NTCIR3 Multlngual Informaton Retreval Wen-Cheng Ln and Hsn-Hs Chen Department of Computer Scence and Informaton Engneerng Natonal Tawan

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

A Topology-aware Random Walk

A Topology-aware Random Walk A Topology-aware Random Walk Inkwan Yu, Rchard Newman Dept. of CISE, Unversty of Florda, Ganesvlle, Florda, USA Abstract When a graph can be decomposed nto clusters of well connected subgraphs, t s possble

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

Federated Search of Text-Based Digital Libraries in Hierarchical Peer-to-Peer Networks

Federated Search of Text-Based Digital Libraries in Hierarchical Peer-to-Peer Networks Federated Search of Text-Based Dgtal Lbrares n Herarchcal Peer-to-Peer Networks Je Lu School of Computer Scence Carnege Mellon Unversty Pttsburgh, PA 15213 jelu@cs.cmu.edu Jame Callan School of Computer

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Ranking Techniques for Cluster Based Search Results in a Textual Knowledge-base

Ranking Techniques for Cluster Based Search Results in a Textual Knowledge-base Rankng Technques for Cluster Based Search Results n a Textual Knowledge-base Shefal Sharma Fetch Technologes, Inc 841 Apollo St, El Segundo, CA 90254 +1 (310) 414-9849 ssharma@fetch.com Sofus A. Macskassy

More information

Deep Classification in Large-scale Text Hierarchies

Deep Classification in Large-scale Text Hierarchies Deep Classfcaton n Large-scale Text Herarches Gu-Rong Xue Dkan Xng Qang Yang 2 Yong Yu Dept. of Computer Scence and Engneerng Shangha Jao-Tong Unversty {grxue, dkxng, yyu}@apex.sjtu.edu.cn 2 Hong Kong

More information

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search Can We Beat the Prefx Flterng? An Adaptve Framework for Smlarty Jon and Search Jannan Wang Guolang L Janhua Feng Department of Computer Scence and Technology, Tsnghua Natonal Laboratory for Informaton

More information

Improving Web Image Search using Meta Re-rankers

Improving Web Image Search using Meta Re-rankers VOLUME-1, ISSUE-V (Aug-Sep 2013) IS NOW AVAILABLE AT: www.dcst.com Improvng Web Image Search usng Meta Re-rankers B.Kavtha 1, N. Suata 2 1 Department of Computer Scence and Engneerng, Chtanya Bharath Insttute

More information

Virtual Machine Migration based on Trust Measurement of Computer Node

Virtual Machine Migration based on Trust Measurement of Computer Node Appled Mechancs and Materals Onlne: 2014-04-04 ISSN: 1662-7482, Vols. 536-537, pp 678-682 do:10.4028/www.scentfc.net/amm.536-537.678 2014 Trans Tech Publcatons, Swtzerland Vrtual Machne Mgraton based on

More information

The Effect of Similarity Measures on The Quality of Query Clusters

The Effect of Similarity Measures on The Quality of Query Clusters The effect of smlarty measures on the qualty of query clusters. Fu. L., Goh, D.H., Foo, S., & Na, J.C. (2004). Journal of Informaton Scence, 30(5) 396-407 The Effect of Smlarty Measures on The Qualty of

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

An Indian Journal FULL PAPER ABSTRACT KEYWORDS. Trade Science Inc.

An Indian Journal FULL PAPER ABSTRACT KEYWORDS. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 97-735 Volume Issue 9 BoTechnology An Indan Journal FULL PAPER BTAIJ, (9), [333-3] Matlab mult-dmensonal model-based - 3 Chnese football assocaton super league

More information

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts Selectng Query Term Alteratons for Web Search by Explotng Query Contexts Guhong Cao Stephen Robertson Jan-Yun Ne Dept. of Computer Scence and Operatons Research Mcrosoft Research at Cambrdge Dept. of Computer

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

A Web Site Classification Approach Based On Its Topological Structure

A Web Site Classification Approach Based On Its Topological Structure Internatonal Journal on Asan Language Processng 20 (2):75-86 75 A Web Ste Classfcaton Approach Based On Its Topologcal Structure J-bn Zhang,Zh-mng Xu,Kun-l Xu,Q-shu Pan School of Computer scence and Technology,Harbn

More information

Multi-Model Similarity Propagation and its Application for Web Image Retrieval

Multi-Model Similarity Propagation and its Application for Web Image Retrieval Mult-Model Smlarty Propagaton and ts Applcaton for Web Image Retreval Xn-Jng Wang 1,2 wx01@mals.tsnghua.edu.cn We-Yng Ma 1 wyma@mcrosoft.com Gu-Rong Xue 1,3 Xng L 2 grxue@stu.edu.cn xng@cernet.edu.cn Mcrosoft

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines (IJCSIS) Internatonal Journal of Computer Scence and Informaton Securty, Herarchcal Web Page Classfcaton Based on a Topc Model and Neghborng Pages Integraton Wongkot Srura Phayung Meesad Choochart Haruechayasak

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

A Novel Video Retrieval Method Based on Web Community Extraction Using Features of Video Materials

A Novel Video Retrieval Method Based on Web Community Extraction Using Features of Video Materials IEICE TRANS. FUNDAMENTALS, VOL.E92 A, NO.8 AUGUST 2009 1961 PAPER Specal Secton on Sgnal Processng A Novel Vdeo Retreval Method Based on Web Communty Extracton Usng Features of Vdeo Materals Yasutaka HATAKEYAMA

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Effective Page Recommendation Algorithms Based on. Distributed Learning Automata and Weighted Association. Rules

Effective Page Recommendation Algorithms Based on. Distributed Learning Automata and Weighted Association. Rules Effectve Page Recommendaton Algorthms Based on Dstrbuted Learnng Automata and Weghted Assocaton Rules R. Forsat 1*, M. R. Meybod 2 1 Department of Computer Engneerng, Islamc Azad Unversty, Karaj Branch,

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

Personalized Concept-Based Clustering of Search Engine Queries

Personalized Concept-Based Clustering of Search Engine Queries IEEE TRANSACTIONS ON JOURNAL NAME, MANUSCRIPT ID 1 Personalzed Concept-Based Clusterng of Search Engne Queres Kenneth Wa-Tng Leung, Wlfred Ng, and Dk Lun Lee Abstract The exponental growth of nformaton

More information

Semantic Illustration Retrieval for Very Large Data Set

Semantic Illustration Retrieval for Very Large Data Set Semantc Illustraton Retreval for Very Large Data Set Song Ka, Huang Te-Jun, Tan Yong-Hong Dgtal Meda Lab, Insttute of Computng Technology, Chnese Academy of Scences Beng, 00080, R Chna Insttute for Dgtal

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

Fast Computation of Shortest Path for Visiting Segments in the Plane

Fast Computation of Shortest Path for Visiting Segments in the Plane Send Orders for Reprnts to reprnts@benthamscence.ae 4 The Open Cybernetcs & Systemcs Journal, 04, 8, 4-9 Open Access Fast Computaton of Shortest Path for Vstng Segments n the Plane Ljuan Wang,, Bo Jang

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Simulation Based Analysis of FAST TCP using OMNET++

Simulation Based Analysis of FAST TCP using OMNET++ Smulaton Based Analyss of FAST TCP usng OMNET++ Umar ul Hassan 04030038@lums.edu.pk Md Term Report CS678 Topcs n Internet Research Sprng, 2006 Introducton Internet traffc s doublng roughly every 3 months

More information

Available online at Available online at Advanced in Control Engineering and Information Science

Available online at   Available online at   Advanced in Control Engineering and Information Science Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced

More information

CHAPTER 2 DECOMPOSITION OF GRAPHS

CHAPTER 2 DECOMPOSITION OF GRAPHS CHAPTER DECOMPOSITION OF GRAPHS. INTRODUCTION A graph H s called a Supersubdvson of a graph G f H s obtaned from G by replacng every edge uv of G by a bpartte graph,m (m may vary for each edge by dentfyng

More information

A Knowledge Management System for Organizing MEDLINE Database

A Knowledge Management System for Organizing MEDLINE Database A Knowledge Management System for Organzng MEDLINE Database Hyunk Km, Su-Shng Chen Computer and Informaton Scence Engneerng Department, Unversty of Florda, Ganesvlle, Florda 32611, USA Wth the exploson

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Deep Classifier: Automatically Categorizing Search Results into Large-Scale Hierarchies

Deep Classifier: Automatically Categorizing Search Results into Large-Scale Hierarchies Deep Classfer: Automatcally Categorzng Search Results nto Large-Scale Herarches Dkan Xng 1, Gu-Rong Xue 1, Qang Yang 2, Yong Yu 1 1 Shangha Jao Tong Unversty, Shangha, Chna {xaobao,grxue,yyu}@apex.sjtu.edu.cn

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

Optimal Workload-based Weighted Wavelet Synopses

Optimal Workload-based Weighted Wavelet Synopses Optmal Workload-based Weghted Wavelet Synopses Yoss Matas School of Computer Scence Tel Avv Unversty Tel Avv 69978, Israel matas@tau.ac.l Danel Urel School of Computer Scence Tel Avv Unversty Tel Avv 69978,

More information

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation Intellgent Informaton Management, 013, 5, 191-195 Publshed Onlne November 013 (http://www.scrp.org/journal/m) http://dx.do.org/10.36/m.013.5601 Qualty Improvement Algorthm for Tetrahedral Mesh Based on

More information

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like: Self-Organzng Maps (SOM) Turgay İBRİKÇİ, PhD. Outlne Introducton Structures of SOM SOM Archtecture Neghborhoods SOM Algorthm Examples Summary 1 2 Unsupervsed Hebban Learnng US Hebban Learnng, Cntd 3 A

More information

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm Recommended Items Ratng Predcton based on RBF Neural Network Optmzed by PSO Algorthm Chengfang Tan, Cayn Wang, Yuln L and Xx Q Abstract In order to mtgate the data sparsty and cold-start problems of recommendaton

More information

Intrinsic Plagiarism Detection Using Character n-gram Profiles

Intrinsic Plagiarism Detection Using Character n-gram Profiles Intrnsc Plagarsm Detecton Usng Character n-gram Profles Efstathos Stamatatos Unversty of the Aegean 83200 - Karlovass, Samos, Greece stamatatos@aegean.gr Abstract: The task of ntrnsc plagarsm detecton

More information

All-Pairs Shortest Paths. Approximate All-Pairs shortest paths Approximate distance oracles Spanners and Emulators. Uri Zwick Tel Aviv University

All-Pairs Shortest Paths. Approximate All-Pairs shortest paths Approximate distance oracles Spanners and Emulators. Uri Zwick Tel Aviv University Approxmate All-Pars shortest paths Approxmate dstance oracles Spanners and Emulators Ur Zwck Tel Avv Unversty Summer School on Shortest Paths (PATH05 DIKU, Unversty of Copenhagen All-Pars Shortest Paths

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information
