An Iterative Implicit Feedback Approach to Personalized Search


Yuanhua Lv¹, Le Sun², Junlin Zhang², Jian-Yun Nie³, Wan Chen⁴, and Wei Zhang²
¹,² Institute of Software, Chinese Academy of Sciences, Beijing, China
³ University of Montreal, Canada
¹ lvyuanhua@gmail.com   ² {sunle, junlin01, zhangwei04}@iscas.cn   ³ nie@iro.umontreal.ca   ⁴ chenwan@nus.edu.sg

Abstract
General information retrieval systems are designed to serve all users without considering individual needs. In this paper, we propose a novel approach to personalized search. It can exploit implicit feedback information, such as query logs and immediately viewed documents, in a unified way. Moreover, our approach can perform result re-ranking and query expansion simultaneously and collaboratively. Based on this approach, we develop a client-side personalized web search agent, PAIR (Personalized Assistant for Information Retrieval), which supports both English and Chinese. Our experiments on TREC and HTRDP collections clearly show that the new approach is both effective and efficient.

1 Introduction
Analysis suggests that, while current information retrieval systems such as web search engines do a good job of retrieving results that satisfy the range of intents people have, they are less successful at discerning individuals' search goals (J. Teevan et al., 2005). Search engines encounter problems such as query ambiguity and results ordered by popularity rather than by relevance to the user's individual needs.

To overcome these problems, there have been many attempts to improve retrieval accuracy based on personalized information. Relevance feedback (G. Salton and C. Buckley, 1990) is the main post-query method for automatically improving a system's accuracy with respect to a user's individual need. The technique relies on explicit relevance assessments (i.e., indications of which documents contain relevant information). Relevance feedback has been shown to be quite effective for improving retrieval accuracy (G. Salton and C. Buckley, 1990; J. J. Rocchio, 1971).
However, searchers may be unwilling to provide relevance information by explicitly marking relevant documents (M. Beaulieu and S. Jones, 1998). Implicit feedback, in which an IR system unobtrusively monitors search behavior, removes the need for the searcher to explicitly indicate which documents are relevant (M. Morita and Y. Shinoda, 1994). Although implicit relevance indications are not as accurate as explicit feedback, they have been shown to be an effective substitute for it in interactive information-seeking environments (R. White et al., 2002). In this paper, we use the immediately viewed documents, i.e., the results clicked for the same query, as one type of implicit feedback information. Research shows that relative preferences derived from immediately viewed documents are reasonably accurate on average (T. Joachims et al., 2005). The other type of implicit feedback information we exploit is the user's query logs. Anyone who uses search engines accumulates a great deal of click-through data, from which we can learn what the queries were, when they occurred, and which search results were selected for viewing. These query logs provide valuable information for capturing users' interests and preferences.

Both types of implicit feedback information can be used for result re-ranking and query expansion (J. Teevan et al., 2005; Xuehua Shen et al., 2005), which are the two general approaches to personalized search (J. Pitkow et al., 2002). However, to the best of our knowledge, how to exploit these two types of implicit feedback in a unified way, one that not only brings collaboration between query expansion and result re-ranking but also makes the whole system more concise, has not been well studied in previous work. In this paper, we adopt the HITS algorithm (J. Kleinberg, 1998) and propose a HITS-like iterative approach to address this problem.

Our work differs from existing work in several aspects:
(1) We propose a HITS-like iterative approach to personalized search, with which implicit feedback information, including immediately viewed documents and query logs, can be used in a unified way.
(2) We perform result re-ranking and query expansion simultaneously and collaboratively, triggered by every click.
(3) We develop and evaluate a client-side personalized web search agent, PAIR, which supports both English and Chinese.

The remainder of this paper is organized as follows. Section 2 describes our novel approach to personalized search. Section 3 presents the architecture of the PAIR system and some specific techniques. Section 4 presents the details of the experiments. Section 5 discusses previous work related to our approach. Section 6 draws some conclusions.

Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, Sydney, July 2006. © 2006 Association for Computational Linguistics

2 Iterative Implicit Feedback Approach
We propose a HITS-like iterative approach to personalized search. The HITS (Hyperlink-Induced Topic Search) algorithm, first described by J. Kleinberg (1998), was originally used to detect high-scoring hub and authority web pages. Authority pages are the central web pages in the context of particular query topics. The strongest authority pages consciously do not link to one another;¹ they can only be linked by some relatively anonymous hub pages. The mutual reinforcement principle of HITS states that a web page is a good authority page if it is linked by many good hub pages, and that a web page is a good hub page if it links to many good authority pages. A directed graph is constructed in which the nodes represent web pages and the directed edges represent hyperlinks. After iterative computation based on the reinforcement principle, each node gets an authority score and a hub score.

In our approach, we exploit the relationships between documents and terms in a way similar to HITS. Unseen search results, i.e., results that have been retrieved from the search engine but not yet presented to the user, are treated as authority pages.
Representative terms are treated as hub pages. Here, the representative terms are the terms extracted from, and best representing, the implicit feedback information. Representative terms confer a relevance score on the unseen search results. Specifically, unseen search results that contain more good representative terms are more likely to be relevant, and representative terms are more representative if they occur in unseen search results that are more likely to be relevant. Thus, a mutual reinforcement principle also holds between representative terms and unseen search results. By the same token, we construct a directed graph whose nodes are the unseen search results and the representative terms, and whose directed edges represent the occurrence of the representative terms in the unseen search results. Table 1 shows how our approach corresponds to the HITS algorithm.

Table 1. Our approach versus HITS.

Approach     | Nodes (authority)     | Nodes (hub)          | Edges
HITS         | Authority pages       | Hub pages            | Hyperlinks
Our approach | Unseen search results | Representative terms | Occurrence²

Since we already know that the representative terms are hub pages and the unseen search results are authority pages, only hub scores need to be computed for the former, and only authority scores for the latter. Finally, after iterative computation based on the mutual reinforcement principle, we can re-rank the unseen search results according to their authority scores and select the representative terms with the highest hub scores to expand the query. We first describe how to construct the directed graph.

2.1 Constructing a Directed Graph
We view the unseen search results and the representative terms as a directed graph G = (V, E). A sample directed graph is shown in Figure 1.

Figure 1. A sample directed graph.

¹ For instance, there is hardly any other company's Web page linked from …
The nodes V correspond to the unseen search results (the rectangles in Figure 1) and the representative terms (the circles in Figure 1); a directed edge p → q ∈ E is weighted by the frequency of occurrence of a representative term p in an unseen search result q (e.g., the number on the edge t_1 → r_2 indicates that t_1 occurs twice in r_2). Each representative term has only an out-degree, which is the number of unseen search results it occurs in, and each unseen search result has only an in-degree, which is the number of representative terms it contains. Based on this, we assume that the unseen search results and the representative terms correspond to the authority pages and the hub pages, respectively; this assumption is used throughout the proposed algorithm.

² The occurrence of the representative terms in the unseen search results.

2.2 A HITS-like Iterative Algorithm
In this section, we present how to initialize the directed graph and how to iteratively compute the authority scores and the hub scores. Then, according to these scores, we show how to re-rank the unseen search results and expand the initial query.

Initially, all unseen search results of the query are considered equally authoritative, that is,

$$ y_1^{(0)} = y_2^{(0)} = \cdots = y_{|Y|}^{(0)} = \frac{1}{|Y|} \tag{1} $$

where vector Y holds the authority scores of all unseen search results, and |Y| is the size of this vector. Meanwhile, each representative term $t_i$, with term frequency $tf_i$ in the history query logs that have been judged related to the current query, obtains its hub score according to the following formula:

$$ x_i^{(0)} = \frac{tf_i}{\sum_{j=1}^{|X|} tf_j} \tag{2} $$

where vector X holds the hub scores of all representative terms, and |X| is the size of vector X. The nodes of the directed graph are initialized in this way. Next, we associate each edge with a weight:

$$ w(t_i \to r_j) = tf_{i,j} \tag{3} $$

where $tf_{i,j}$ is the term frequency of representative term $t_i$ in unseen search result $r_j$, and $w(t_i \to r_j)$ is the weight of the edge from $t_i$ to $r_j$. For instance, in Figure 1, $w(t_1 \to r_2) = 2$.

After initialization, the iterative computation of hub scores and authority scores starts.
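The graph construction and initialization above (Equations 1, 2 and 3) fit in a few lines of Python. This is a minimal sketch, not PAIR's implementation: results are assumed to be pre-tokenized lists of lowercase words, terms are matched by exact token equality, and the function names and toy data are hypothetical. The toy data mirrors the edge $w(t_1 \to r_2) = 2$ from Figure 1.

```python
from collections import Counter


def build_graph(terms, results):
    """Return W with W[i][j] = w(t_i -> r_j) = tf_{i,j} (Equation 3).

    terms:   representative terms t_1..t_|X|
    results: unseen search results r_1..r_|Y|, each a list of tokens
    A zero entry means there is no edge from t_i to r_j.
    """
    counts = [Counter(tokens) for tokens in results]
    return [[c[t] for c in counts] for t in terms]


def init_scores(history_tf, num_results):
    """Initial hub scores X (Equation 2) and authority scores Y (Equation 1)."""
    total = sum(history_tf)
    x0 = [tf / total for tf in history_tf]   # proportional to tf in related query logs
    y0 = [1.0 / num_results] * num_results   # every unseen result equally authoritative
    return x0, y0


# Hypothetical toy data: three representative terms and three unseen results.
terms = ["mac", "os", "apple"]
results = [
    ["apple", "support", "mac"],   # r1
    ["mac", "os", "x", "mac"],     # r2: "mac" occurs twice, like t1 -> r2 in Figure 1
    ["jaguar", "cars", "uk"],      # r3: contains no representative term
]
W = build_graph(terms, results)
x0, y0 = init_scores(history_tf=[3, 1, 1], num_results=len(results))
print(W)   # [[1, 2, 0], [0, 1, 0], [1, 0, 0]]
print(x0)  # [0.6, 0.2, 0.2]
```

Result r3 has no incoming edge, so under Equation 5 its authority score can only drop to zero.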
The hub score of each representative term is re-computed based on three factors: the authority scores of the unseen search results in which the term occurs; the frequency of the term in each of those results; and the total occurrence of all representative terms in each of those results. The formula for re-computing hub scores is:

$$ x_i^{\prime\,(k+1)} = \sum_{j:\, t_i \to r_j} \frac{w(t_i \to r_j)}{\sum_{n:\, t_n \to r_j} w(t_n \to r_j)} \, y_j^{(k)} \tag{4} $$

where $x_i^{\prime\,(k+1)}$ is the hub score of representative term $t_i$ after the (k+1)-th iteration; $y_j^{(k)}$ is the authority score of unseen search result $r_j$ after the k-th iteration; $j: t_i \to r_j$ denotes the set of all unseen search results in which $t_i$ occurs; and $n: t_n \to r_j$ denotes the set of all representative terms that $r_j$ contains.

The authority score of each unseen search result is also re-computed based on three factors: the hub scores of the representative terms that the search result contains; the frequency of each of those terms in the search result; and the total occurrence of each of those terms across all unseen search results. The formula for re-computing authority scores is:

$$ y_j^{\prime\,(k+1)} = \sum_{i:\, t_i \to r_j} \frac{w(t_i \to r_j)}{\sum_{m:\, t_i \to r_m} w(t_i \to r_m)} \, x_i^{(k)} \tag{5} $$

where $y_j^{\prime\,(k+1)}$ is the authority score of unseen search result $r_j$ after the (k+1)-th iteration; $x_i^{(k)}$ is the hub score of representative term $t_i$ after the k-th iteration; $i: t_i \to r_j$ denotes the set of all representative terms that $r_j$ contains; and $m: t_i \to r_m$ denotes the set of all unseen search results in which $t_i$ occurs.

After re-computation, the hub scores and the authority scores are each normalized to sum to 1:

$$ y_j^{(k+1)} = \frac{y_j^{\prime\,(k+1)}}{\sum_{l=1}^{|Y|} y_l^{\prime\,(k+1)}}, \qquad x_i^{(k+1)} = \frac{x_i^{\prime\,(k+1)}}{\sum_{l=1}^{|X|} x_l^{\prime\,(k+1)}} \tag{6} $$

The iteration, including re-computation and normalization, is repeated until the changes in the hub scores and the authority scores are smaller than a predefined threshold θ. Specifically, after each repetition, the change in the authority and hub scores is computed as:

$$ c = \sum_{j=1}^{|Y|} \left( y_j^{(k+1)} - y_j^{(k)} \right)^2 + \sum_{i=1}^{|X|} \left( x_i^{(k+1)} - x_i^{(k)} \right)^2 \tag{7} $$

The iteration stops if c < θ.
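The update loop of Equations 4 to 7 can be sketched as below, following the order of Figure 2: both new vectors are computed from the previous (X, Y) pair, then normalized, then tested against θ. This is an illustrative sketch under stated assumptions: W is the dense term-by-result weight matrix of Equation 3, θ = 1e-6 is a hypothetical default (the paper's value is not recoverable here), and the cap of 30 iterations uses the example value given in the next paragraph. `recommend` returns the top n results by authority score, and `expansion_terms` is one hypothetical reading of the "biggest gap" rule for choosing m, also described in the next paragraph.

```python
def hits_like(W, x0, y0, theta=1e-6, max_iter=30):
    """Run Equations 4-7 and return (hub scores x, authority scores y, iterations run).

    W[i][j] = w(t_i -> r_j) >= 0, and W must contain at least one non-zero entry.
    """
    n_terms, n_results = len(W), len(W[0])
    # Denominators of Eq. 4 (total term weight inside r_j) and Eq. 5 (total weight of t_i).
    in_weight = [sum(W[i][j] for i in range(n_terms)) for j in range(n_results)]
    out_weight = [sum(row) for row in W]
    x, y = list(x0), list(y0)
    k = 0
    while k < max_iter:
        k += 1
        # Eq. 4: hub score of t_i from the authority of the results it occurs in.
        x_new = [sum(W[i][j] / in_weight[j] * y[j] for j in range(n_results) if W[i][j])
                 for i in range(n_terms)]
        # Eq. 5: authority score of r_j from the hub scores of the terms it contains.
        y_new = [sum(W[i][j] / out_weight[i] * x[i] for i in range(n_terms) if W[i][j])
                 for j in range(n_results)]
        # Eq. 6: normalize both vectors to sum to 1.
        sx, sy = sum(x_new), sum(y_new)
        x_new = [v / sx for v in x_new]
        y_new = [v / sy for v in y_new]
        # Eq. 7: squared change of both vectors since the previous iteration.
        c = (sum((a - b) ** 2 for a, b in zip(y_new, y))
             + sum((a - b) ** 2 for a, b in zip(x_new, x)))
        x, y = x_new, y_new
        if c < theta:
            break
    return x, y, k


def recommend(result_ids, y, n=3):
    """Top-n unseen results by authority score (the text uses n = 3 in PAIR)."""
    order = sorted(range(len(y)), key=lambda j: y[j], reverse=True)
    return [result_ids[j] for j in order[:n]]


def expansion_terms(terms, x):
    """Hypothetical biggest-gap rule: keep the terms ranked above the largest drop
    between neighbouring hub scores within the top half of the ranking."""
    ranked = sorted(zip(terms, x), key=lambda p: p[1], reverse=True)
    top_half = ranked[: max(2, len(ranked) // 2)]
    if len(top_half) < 2:
        return [t for t, _ in top_half]
    gaps = [top_half[i][1] - top_half[i + 1][1] for i in range(len(top_half) - 1)]
    m = gaps.index(max(gaps)) + 1
    return [t for t, _ in ranked[:m]]


W = [[1, 2, 0], [0, 1, 0], [1, 0, 0]]   # terms mac, os, apple x results r1, r2, r3
x, y, k = hits_like(W, x0=[0.6, 0.2, 0.2], y0=[1 / 3] * 3)
print(recommend(["r1", "r2", "r3"], y, n=2))       # ['r2', 'r1']
print(expansion_terms(["mac", "os", "apple"], x))  # ['mac']
```

The nested loops cost O(|X|·|Y|) per iteration, which is negligible for the few dozen unseen results and terms a client-side agent handles at a time.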
Moreover, the iteration also stops if the number of repetitions reaches a predefined maximum k (e.g., 30). The procedure is shown in Figure 2. As soon as the iteration stops, the top n unseen search results with the highest authority scores are selected and recommended to the user, and the top m representative terms with the highest hub scores are selected to expand the original query. Here n is a predefined number (in the PAIR system we set n = 3; n is kept small because using implicit feedback information is sometimes risky). m is determined by the position of the biggest gap: if the gap $t_i - t_{i+1}$ between the hub scores of two neighboring terms is bigger than the gap between any other two neighboring terms among the top half of the representative terms, then m is set to i. Furthermore, some of these representative terms (e.g., the top 50% highest-scoring terms) are used again the next time the iterative algorithm runs, together with some newly incoming terms extracted from the latest click.

    Iterate(T, R, k, θ)
      T: a collection of m terms
      R: a collection of n search results
      k: a natural number
      θ: a predefined threshold
      Apply (1) to initialize Y.
      Apply (2) to initialize X.
      Apply (3) to initialize W.
      For i = 1, 2, …, k
        Apply (4) to (X_{i-1}, Y_{i-1}) and obtain X′.
        Apply (5) to (X_{i-1}, Y_{i-1}) and obtain Y′.
        Apply (6) to normalize X′ and Y′, obtaining X_i and Y_i respectively.
        Apply (7) and obtain c.
        If c < θ, then break.
      End
      Return (X, Y).

Figure 2. The HITS-like iterative algorithm.

3 Implementation
3.1 System Design
In this section, we present our experimental system PAIR, an Internet Explorer (IE) Browser Helper Object (BHO) built on the popular Web search engine Google. PAIR has three main modules: the Result Retrieval module, the User Interactions module, and the Iterative Algorithm module. The architecture is shown in Figure 3.

The Result Retrieval module runs in the background and retrieves results from the search engine. When the query has been expanded, this module uses the new keywords to continue retrieving. The User Interactions module handles three types of basic user actions: (1) submitting a query; (2) clicking to view a search result; (3) clicking the "Next Page" link.
For each of these actions, the system responds by: (a) exploiting the implicit feedback information and extracting representative terms from it; (b) fetching the unseen search results via the Result Retrieval module; (c) sending the representative terms and the unseen search results to the Iterative Algorithm module.

Figure 3. The architecture of PAIR.

The Iterative Algorithm module implements the HITS-like algorithm described in Section 2. When this module receives data from the User Interactions module, it responds by: (a) iteratively computing the hub scores and authority scores; (b) re-ranking the unseen search results and expanding the original query. Some specific techniques for capturing and exploiting implicit feedback information are described in the following sections.

3.2 Extracting Representative Terms from Query Logs
We judge whether a query log is related to the current query based on the similarity between the query log and the current query text. Here a query log is associated with all documents that the user has selected to view. Each query log has the following form:

<query text><query time> [clicked documents]*

The clicked documents consist of the URL, title and snippet of every clicked document. We compute the similarity from the text of the current query rather than from its search results (title, snippet, etc.) for efficiency: if we used the search results, the computation could start only after the search engine had returned them. With our method, query logs can instead be exploited while the search engine is still retrieving. Note that although our system only uses the query logs from the last 24 hours, in practice we could exploit many more, because this computation is cheap compared with the retrieval process performed in parallel.
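The query-log filtering just described can be sketched as follows, together with the term weighting of Equation 8, which this section goes on to define. Everything here is hypothetical: whitespace tokenization, raw term-count vectors compared by cosine similarity (the text states only that the standard vector space model is used), a relatedness threshold of 0.3 (the paper's value is unspecified), and a smoothed idf, ln(1 + N/df), chosen so that a log with a single clicked document does not give every term zero weight. The 24-hour window and the 30% fraction follow the values mentioned in the text.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class QueryLog:
    """<query text><query time> [clicked documents]*; each clicked document is the
    text of its URL, title and snippet."""
    query: str
    time: datetime
    clicked: list = field(default_factory=list)


def bag(text):
    """Hypothetical tokenizer: lowercase, split on whitespace, count terms."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity of two term-count vectors."""
    dot = sum(v * b[t] for t, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def related_logs(query, logs, now, threshold=0.3, window=timedelta(hours=24)):
    """Logs from the last 24 hours whose query text is similar to the current query."""
    q = bag(query)
    return [log for log in logs
            if now - log.time <= window and cosine(q, bag(log.query)) > threshold]


def representative_terms(log, fraction=0.3):
    """Rank the clicked documents' terms by w(t) = tf * idf (Equation 8, with an
    assumed smoothed idf) and keep the top `fraction` of them (at least one)."""
    docs = [bag(d) for d in log.clicked]
    tf = sum(docs, Counter())
    n = len(docs)
    weights = {t: tf[t] * math.log(1 + n / sum(1 for d in docs if t in d)) for t in tf}
    ranked = sorted(weights, key=weights.get, reverse=True)
    return ranked[: max(1, round(fraction * len(ranked)))]


now = datetime(2006, 7, 17, 12, 0)
logs = [
    QueryLog("jaguar car price", now - timedelta(hours=2),
             ["jaguar cars uk official site", "jaguar xj car review"]),
    QueryLog("mac os x", now - timedelta(hours=3), ["apple mac os x"]),
    QueryLog("jaguar car", now - timedelta(days=3), ["old jaguar car"]),  # outside window
]
related = related_logs("jaguar car", logs, now)
print([log.query for log in related])     # ['jaguar car price']
print(representative_terms(related[0]))   # ['jaguar', 'cars']
```

Because only the query text is compared, `related_logs` can run before any search result arrives, which is the efficiency argument made above.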


Table 2. Sample results of re-ranking. The search results in boldface are the ones that our system recommends to the user. "-3" and "-2" to the right of some results indicate how far their ranks drop.

Rank | Google result (query = jaguar) | PAIR result (query = jaguar, after the 4th result is clicked) | PAIR result (query = jaguar, "car" query logs)
1    | www.jaguar.com/                | www.jaguar.com/                                                | www.jaguar.com/
2    | CA - Cars                      | CA - Cars                                                      | UK - Cars
3    | Cars                           | Cars                                                           | UK - R is for
4    | Apple - Mac OS X               | Apple - Mac OS X                                               | CA - Cars (-2)
5    | Apple - Support                | Amazon.com: Mac OS X                                           | Cars (-2)
6    | UK - Cars                      | Mac OS X 10.2 (arstechnica.com/reviews/os)                     | Apple - Mac OS X (-2)
7    | UK - R is for                  | Macworld: News: Macworld (maccentral.macworld.com/news/)       | Apple - Support (-2)
8    | dspace.dial.pipex.com/         | Apple - Support (-3)                                           | dspace.dial.pipex.com/
9    | Schrödinger -> Home            | UK - Cars (-3)                                                 | Schrödinger -> Home
10   | Schrödinger -> Site Map        | UK - R is for (-3)                                             | Schrödinger -> Site Map

We use the standard vector space retrieval model (G. Salton and M. J. McGill, 1983) to compute the similarity. If the similarity between a query log and the current query exceeds a predefined threshold, the query log is considered related to the current query. Our system then extracts a portion (e.g., 30%) of the terms in such related query logs as representative terms, ranked by the weights computed with the following formula:

$$ w(t_i) = tf_i \times idf_i \tag{8} $$

where $tf_i$ and $idf_i$ are, respectively, the term frequency and the inverse document frequency of $t_i$ in the clicked documents of a related query log. This formula means that a term is more representative if it has a higher frequency as well as a broader distribution in the related query log.

3.3 Extracting Representative Terms from Immediately Viewed Documents
The representative terms extracted from immediately viewed documents are determined based on three factors: term frequency in the immediately viewed documents, inverse document frequency in the entire set of seen search results, and a discriminant value.
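How these three factors combine, following Equations 9 and 10 given next, can be sketched as below. This is an assumption-laden illustration rather than PAIR's code: idf is taken as ln(N/n), and 0.5 is added to the counts inside the F2 ratio (a common smoothing choice for Robertson/Sparck Jones weights, not stated in the paper) so that a term occurring only in viewed documents, or a query whose seen results were all viewed, does not cause a division by zero or log(0). The viewed documents are assumed to be a subset of the seen results, and the function name is hypothetical.

```python
import math
from collections import Counter


def viewed_term_weights(viewed, seen):
    """w(x) = tf(x in d_r) * idf(x in d_N) * d(x)  (Equations 9 and 10).

    viewed: token lists of the immediately viewed results d_r (R documents)
    seen:   token lists of all seen results d_N (N documents), including the viewed ones
    """
    N, R = len(seen), len(viewed)
    tf = Counter(t for doc in viewed for t in doc)       # term frequency in d_r
    n = Counter(t for doc in seen for t in set(doc))     # seen results containing x
    r = Counter(t for doc in viewed for t in set(doc))   # viewed results containing x
    weights = {}
    for x in tf:
        idf = math.log(N / n[x])
        # F2 discriminant ln[(r/R) / ((n - r)/(N - R))], with 0.5 smoothing (assumption).
        d = math.log(((r[x] + 0.5) / (R + 1)) / ((n[x] - r[x] + 0.5) / (N - R + 1)))
        weights[x] = tf[x] * idf * d
    return weights


seen = [
    ["apple", "mac", "os", "x"],   # clicked, so also passed as `viewed`
    ["jaguar", "cars", "uk"],
    ["mac", "os", "x", "review"],
    ["jaguar", "ca", "cars"],
]
w = viewed_term_weights(viewed=[seen[0]], seen=seen)
print(max(w, key=w.get))   # apple
```

"apple" wins because it occurs in the viewed result and in no other seen result, which raises both its idf and its discriminant value.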
The three factors are combined as follows:

$$ w(x_i) = tf_{x_i}^{d_r} \times idf_{x_i}^{d_N} \times d(x_i) \tag{9} $$

where $tf_{x_i}^{d_r}$ is the term frequency of term $x_i$ in the set of viewed results $d_r$, and $idf_{x_i}^{d_N}$ is the inverse document frequency of $x_i$ in the entire set of seen results $d_N$. The discriminant value $d(x_i)$ of $x_i$ is computed using the weighting scheme F2 (S. E. Robertson and K. Sparck Jones, 1976) as follows:

$$ d(x_i) = \ln \frac{r / R}{(n - r) / (N - R)} \tag{10} $$

where r is the number of immediately viewed documents containing term $x_i$; n is the number of seen results containing $x_i$; R is the number of immediately viewed documents for the query; and N is the total number of seen results.

3.4 Sample Results
Unlike other systems, which perform result re-ranking and query expansion separately and in different ways, our system performs these two functions simultaneously and collaboratively: query expansion provides diversified search results, which rely on re-ranking to be moved forward and recommended to the user.

Figure 4. A screen shot of query expansion.

After the iterative computation, the system selects the search results with the highest authority scores and recommends them to the user. In Table 2, we show that PAIR successfully re-ranks the unseen search results of "jaguar" using the immediately viewed documents and the query logs, respectively. Simultaneously, some representative terms are selected to expand the original query. For the query "jaguar" (without query logs), we click some results about Mac OS; the term "Mac" is then selected to expand the original query, and some results of the new query "jaguar Mac" are recommended to the user with the help of re-ranking, as shown in Figure 4.

4 Experiment
4.1 Experimental Methodology
It is a challenge to quantitatively evaluate the potential performance improvement of the proposed approach over Google in an unbiased way (D. Hawking et al., 1999; Xuehua Shen et al., 2005). Here, we adopt a quantitative evaluation similar to that of Xuehua Shen et al. (2005) to evaluate our system PAIR, and recruit 9 students with different backgrounds to participate in our experiment. We use query topics from the TREC 2005 and 2004 HARD tracks and the TREC 2004 Terabyte track for English information retrieval³, and query topics from the HTRDP 2005 Evaluation for Chinese information retrieval⁴. We use multiple TREC tasks rather than a single one because more queries are more likely to cover the most interesting topics for each participant.

Initially, each participant freely chooses some topics (typically 5 TREC topics and 5 HTRDP topics). Each query of the TREC topics is submitted to three systems: UCAIR⁵ (Xuehua Shen et al., 2005), PAIR No QE (the PAIR system with its query expansion function blocked), and PAIR. Each query of the HTRDP topics is submitted only to PAIR No QE and PAIR; we do not evaluate UCAIR on HTRDP topics, since it does not support Chinese. For each query topic, the participants use the title of the topic as the initial keywords. They can also form other keywords themselves if the title alone fails to describe some details of the topic. There is no limit on how many queries they must submit. During each query process, the participant may click to view some results, just as in normal web search.
Then, at the end of each query, the search results from these different systems are randomly and anonymously mixed together, so that participants do not know which system a result comes from. The participants judge which of these results are relevant. Finally, we measure precision at the top 5, 10, 20 and 30 documents for each system.

³ Text REtrieval Conference.
⁴ HTRDP Evaluation.
⁵ The latest version, released on November 11, …

4.2 Results and Analysis
Altogether, 45 TREC topics (62 queries in all) are chosen for English information retrieval. 712 documents are judged relevant among the Google search results. The corresponding numbers of relevant documents from UCAIR, PAIR No QE and PAIR are 921, 891 and …, respectively. Figure 5 shows the average precision of these four systems at the top n documents over these 45 TREC topics.

Figure 5. Average precision for TREC topics.

45 HTRDP topics (66 queries in all) are chosen for Chinese information retrieval. 809 documents are judged relevant among the Google search results. The corresponding numbers of relevant documents from PAIR No QE and PAIR are 1198 and …, respectively. Figure 6 shows the average precision of these three systems at the top n documents over these 45 HTRDP topics.

Figure 6. Average precision for HTRDP topics.

PAIR and PAIR No QE versus Google
Figures 5 and 6 clearly show that the precision of PAIR improves considerably over that of Google in all measurements. Moreover, the size of the improvement grows from precision at top 10 to precision at top 30. One explanation is that the more implicit feedback information is generated, the more representative terms can be obtained; the iterative algorithm thus performs better, leading to more precise search results. PAIR No QE also significantly outperforms Google in these measurements; however, with query expansion, PAIR performs even better. Thus, both result re-ranking and query expansion play an important role in PAIR.

Comparing Figure 5 with Figure 6, one can see that the improvement of PAIR over Google in Chinese IR is even larger than in English IR. One explanation is that, before the iterative algorithm runs, each Chinese search result, including title and snippet, is segmented into words (or phrases), and only the nouns, verbs and adjectives among them are used in the subsequent stages, whereas for English search results we only remove stop words. Another explanation is that some Chinese web pages have the same content: if one of these pages is clicked, duplicate pages are occasionally recommended to the user. Since PAIR is based on Google's search results and the information it can obtain about the result pages is limited, it is difficult to avoid such duplicates.

PAIR and PAIR No QE versus UCAIR
In Figure 5, we can see that the precision of PAIR No QE is better than that of UCAIR among the top 5 and top 10 documents, and almost the same as that of UCAIR among the top 20 and top 30 documents. However, PAIR is much better than UCAIR in all measurements. This indicates that result re-ranking cannot perform at its best without query expansion: the relevant documents for the original query are limited, and re-ranking alone cannot overcome this sparseness of relevant documents. Query expansion, which can provide fresh and relevant documents, helps the re-ranking method reach even better performance.
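For reference, precision at the top n documents, the measure behind Figures 5 and 6, and its average over topics can be computed as below. This is a minimal sketch: it assumes each topic contributes one ranked list of document identifiers plus the set judged relevant, and it divides by n even when fewer than n results are returned (a common convention, not stated in the paper). The function names and data are hypothetical.

```python
def precision_at(ranked, relevant, n):
    """P@n: fraction of the top-n ranked documents that were judged relevant."""
    return sum(1 for doc in ranked[:n] if doc in relevant) / n


def average_precision_at(runs, n):
    """Mean P@n over (ranked list, relevant set) pairs, one pair per topic."""
    return sum(precision_at(ranked, relevant, n) for ranked, relevant in runs) / len(runs)


runs = [
    (["d1", "d2", "d3", "d4", "d5"], {"d1", "d3"}),        # P@5 = 2/5
    (["e1", "e2", "e3", "e4", "e5"], {"e1", "e2", "e4"}),  # P@5 = 3/5
]
print(average_precision_at(runs, 5))   # 0.5
```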
Efficiency of PAIR
The iteration statistics from the evaluation indicate that our approach takes 22 iterations on average before convergence when the threshold θ is set to …. The experiment shows that the computation time of the proposed approach is imperceptible to users (less than 1 ms).

5 Related Work
There have been many prior attempts at personalized search. In this paper, we focus on related work that performs personalized search based on implicit feedback information.

Some existing studies capture users' information needs by exploiting query logs. For example, M. Speretta and S. Gauch (2005) build user profiles based on activity at the search site and study the use of these profiles to provide personalized search results. F. Liu et al. (2002) learn a user's favorite categories from his query history; their system maps the input query to a set of interesting categories based on the user profile and confines the search domain to these categories. Some studies improve retrieval performance by exploiting users' browsing history (F. Tanudjaja and L. Mui, 2002; M. Morita and Y. Shinoda, 1994) or Web communities (A. Kritikopoulos and M. Sideri, 2003; K. Sugiyama et al., 2004). Some studies use client-side interactions; for example, K. Bharat (2000) automatically discovers related material on behalf of the user by serving as an intermediary between the user and information retrieval systems. His system observes users interacting with everyday applications and then anticipates their information needs using a model of the task at hand. Some recent studies combine several types of implicit feedback information. J. Teevan et al. (2005) explore rich models of user interests, built from both search-related information, such as previously issued queries and previously visited Web pages, and other information about the user, such as documents and email the user has read and created. This information is used to re-rank Web search results within a relevance feedback framework.

Our work is partly inspired by the study of Xuehua Shen et al. (2005), which is closely related to ours in that they also exploit immediately viewed documents and short-term history queries, perform query expansion and re-ranking, and develop a client-side web search agent that performs eager implicit feedback. However, their work differs from ours in three ways. First, they use cosine similarity to perform query expansion and the Rocchio formula (J. J. Rocchio, 1971) to re-rank the search results; thus their query expansion and re-ranking are computed separately and are less concise and collaborative. Second, their query expansion is based only on past queries and is performed before the query is issued, so it does not benefit from the user's click-through data. Third, they do not compute the relevance of search results and the relatedness of expanded terms iteratively; thus their approach does not exploit the relations among search results, among expanded terms, and between search results and expanded terms.

6 Conclusions
In this paper, we studied how to exploit implicit feedback information to improve retrieval accuracy. Unlike most previous work, we propose a novel HITS-like iterative algorithm that can make use of query logs and immediately viewed documents in a unified way, which not only brings collaboration between query expansion and result re-ranking but also makes the whole system more concise. We further propose some specific techniques to capture and exploit these two types of implicit feedback information. Using these techniques, we develop a client-side web search agent, PAIR. Experiments on English and Chinese collections show that our approach is both effective and efficient. However, there is still room to improve the performance of the proposed approach, such as exploiting other types of personalized information, choosing more effective strategies to extract representative terms, and studying the effects of the parameters used in the approach.

Acknowledgement
We would like to thank the anonymous reviewers for their helpful feedback and corrections, and the nine participants in our evaluation experiments. Additionally, this work is supported by the National Science Fund of China under contract …

References
A. Kritikopoulos and M. Sideri. 2003. The Compass Filter: Search engine result personalization using Web communities. In Proceedings of ITWP.
D. Hawking, N. Craswell, P. B. Thistlewaite, and D. Harman. 1999. Results and challenges in web search evaluation. Computer Networks, 31(11-16).
F. Liu, C. Yu, and W. Meng. 2002. Personalized web search by mapping user queries to categories. In Proceedings of CIKM.
F. Tanudjaja and L. Mui. 2002. Persona: a contextualized and personalized web search. HICSS.
G. Salton and M. J. McGill. 1983. Introduction to Modern Information Retrieval. McGraw-Hill.
G. Salton and C. Buckley. 1990. Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science, 41(4).
J. J. Rocchio. 1971. Relevance feedback in information retrieval. In The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice-Hall Inc.
J. Kleinberg. 1998. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5).
J. Pitkow, H. Schütze, T. Cass, R. Cooley, D. Turnbull, A. Edmonds, E. Adar, and T. Breuel. 2002. Personalized search. Communications of the ACM, 45(9).
J. Teevan, S. T. Dumais, and E. Horvitz. 2005. Personalizing search via automated analysis of interests and activities. In Proceedings of SIGIR.
K. Bharat. 2000. SearchPad: Explicit capture of search context to support Web search. Computer Networks, 33(1-6).
K. Sugiyama, K. Hatano, and M. Yoshikawa. 2004. Adaptive Web search based on user profile constructed without any effort from user. In Proceedings of WWW.
M. Beaulieu and S. Jones. 1998. Interactive searching and interface issues in the Okapi best match retrieval system. Interacting with Computers, 10(3).
M. Morita and Y. Shinoda. 1994. Information filtering based on user behavior analysis and best match text retrieval. In Proceedings of SIGIR.
M. Speretta and S. Gauch. 2005. Personalizing search based on user search history. Web Intelligence.
R. White, I. Ruthven, and J. M. Jose. 2002. The use of implicit evidence for relevance feedback in web retrieval. In Proceedings of ECIR.
S. E. Robertson and K. Sparck Jones. 1976. Relevance weighting of search terms. Journal of the American Society for Information Science, 27(3).
T. Joachims, L. Granka, B. Pang, H. Hembrooke, and G. Gay. 2005. Accurately interpreting clickthrough data as implicit feedback. In Proceedings of SIGIR.
Xuehua Shen, Bin Tan, and ChengXiang Zhai. 2005. Implicit user modeling for personalized search. In Proceedings of CIKM.
