A new query expansion method based on query logs mining


International Journal on Asian Language Processing, 19 (1)

Zhu Kunpeng, Wang Xiaolong, Liu Yuanchao
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Email: {kpzhu, wangxl, ycliu}@insun.hit.edu.cn

This work was supported by the National Natural Science Foundation of China ( , ) and the National High Technology Research and Development Program of China (2006AA01Z197, 2007AA01Z172).

Abstract: Query expansion has long been suggested as an effective way to improve the performance of information retrieval systems by adding additional relevant terms to the original queries. However, most previous research has been limited to extracting new terms from a subset of relevant documents and has not exploited information about user interactions. In this paper, we propose a method for automatic query expansion based on user interactions recorded in query logs. The central idea is to extract correlations among queries by analyzing the common documents the users selected for them, so that the expansion terms come from the associated queries rather than from the relevant documents. In particular, we argue that queries should be dealt with in different ways according to their ambiguity degrees, which can be calculated from the log information. We verify this method on a large-scale query log collection, and the experimental results show that the method makes good use of the knowledge of user interactions and can remarkably improve search performance.

Keywords: Query expansion, log mining, information retrieval, search engine

1. Introduction

With the rapid growth of information on the World Wide Web, more and more users need search engine technology to help them exploit such an extremely valuable resource. Although many search engine systems have been successfully deployed, current search systems are still far from optimal because they rely on simple keywords to search and rank relevant documents.

A well-known limitation of current search engine systems is the difficulty of dealing with synonymy (different words describing the same thing) and polysemy (the same word describing different things). For example, a farmer may use the query 苹果 (apple) to get relevant information about the fruit, while computer lovers may use the same query to find results about the computer brand. When such a query is issued, it is difficult for the search engine to decide which information the user wishes to get. Another problem is that web users typically submit very short queries, and the average length of web queries is less than two words (Wen J. R. 2001). Short queries do not provide sufficient indications for an effective selection of relevant documents and thus negatively affect the performance of web search in terms of both precision and recall.

To overcome the above problems, researchers have focused on using query expansion techniques to help users formulate a better query. Query expansion is a method for improving the effectiveness of information retrieval by reformulating queries with additional contextual information. It has been shown to perform very well over large data sets, especially with short input queries (Kraft R. 2004, Carmel D. 2002).

However, previous query expansion methods have been limited to extracting expansion terms from a subset of documents and have not exploited information about user interactions. Search engines have accumulated large amounts of click-through data from their users, from which we can know which queries have been used to retrieve which documents. These query logs provide valuable information for extracting relationships between queries and documents, which can be used in query expansion. Another problem of current query expansion is that most proposed methods are uniformly applied to all queries. In fact, we think that queries should not all be handled in the same manner, because we find that some queries need no expansion. This has also been found in (Dou Z. C. 2007).
For example, for the query Google, almost all users consistently select results that lead to Google's homepage, and therefore no expansion strategy could provide significant benefits to users.

In this paper, we suggest a new query expansion method based on the analysis of user logs. By considering whether queries should be expanded and by mining correlations among user queries from user logs, our query expansion method can achieve significant improvements in retrieval effectiveness compared with current query expansion techniques.

The remainder of this paper is structured as follows. Section 2 discusses previous work on query expansion. Section 3 introduces the whole procedure of our query expansion method step by step. Section 4 presents empirical evidence of the effectiveness of our method and investigates the experimental results in more detail. Finally, Section 5 summarizes our findings.

2. Query Expansion Based on Relevance feedback

There have been many prior attempts at query expansion. In this paper, we focus on related work that performs query expansion based on relevance feedback information (Rocchio J. 1971, Salton G. 1990). In this approach, the results returned for the initial query are marked as relevant or irrelevant according to the user's information need, and expansion terms are extracted from the relevant documents. The first approaches were explicit (Rocchio J. 1971, Okabe M. 2005) in the sense that the user was the one choosing the relevant results, and then various methods were applied to extract new terms related to the query and the selected documents. Unfortunately, in a real search context, users are usually reluctant to make the extra effort to provide such relevance feedback information (Kelly D. 2003).

To overcome the difficulty caused by the lack of sufficient relevance judgments, an automatic feedback technique called pseudo-relevance feedback (also known as blind feedback) is commonly used. This method conjectures that, in the absence of any other relevance judgment, the top few documents retrieved on the basis of an initial query are relevant (Attar L. 1977, Croft W.B. 1979). Expansion terms are extracted from the top-ranked documents to formulate a new query for a second retrieval cycle (Lam-Adesina M. 2001, Carpineto C. 2001). However, pseudo-relevance feedback is highly dependent on the quality of the documents retrieved in the initial retrieval. In cases where the top-ranked documents have little relevance to the query, this method will not work well; it may even introduce irrelevant terms into the query and degrade performance.

Another group of relevance feedback techniques is implicit feedback, in which an IR system makes inferences about relevance from searcher interaction, removing the need for users to explicitly indicate which documents are relevant (Kelly D. 2003, Morita M. 1994). Several previous studies have shown that implicit information may be helpful for inferring the user's information need and can improve retrieval accuracy through query expansion. Some query expansion methods based on implicit feedback have been proposed in (Cui H. 2003, Lv Y. H. 2006); the implicit information they use is click-through data collected over a long time period in query logs. These query logs provide valuable indications for understanding the kinds of documents users intend to retrieve by formulating a query with a particular set of terms, and expansion terms can be selected from the sets or the results of past queries.

One important assumption behind these methods is that the clicked documents are relevant to the query. This presumption is not always right. However, although click information is not as accurate as explicit relevance judgment, the user's choice does suggest a certain degree of relevance. In fact, users usually do not make the choice randomly.
Even if some of the document clicks are erroneous, we can expect that most users do click on documents that are relevant. Some previous work on using query logs also strongly supports this assumption (Bar-Yossef 2008, Wen 2002, Billerbeck 2003 and Zhang 2006). Therefore, query logs can be taken as a very valuable resource containing abundant relevance feedback data.

In this paper, we present a new query expansion method based on query logs mining. At the same time, in order to avoid the problem of query drift, we utilize the clicked results of the present search process as another type of implicit feedback information to deduce the user's information need. Our work differs from existing work in two important aspects.

First, we introduce a method to evaluate the quality of user queries, which is measured by calculating the Kullback-Leibler Distance (Cover T. 1991) among documents in query logs. Query expansion can strongly improve the performance of short queries and ambiguous queries. But this technique cannot achieve the same goal on an accurate query; some newly added terms will introduce query drift and degrade performance. So we believe that queries should not all be dealt with in the same way, and that a measurement of query quality is essential to judge whether a query needs to be expanded, which has not been studied before.

Second, we propose a new query expansion method based on query logs: relevant expansion terms are selected from past queries by analyzing the relation between queries and documents under the language modeling framework. Compared with existing work, the difference is that we extract the terms from past queries rather than from the relevant documents. The experiments show that our method gets better performance in some aspects.

3. Query Expansion Method Based on Logs mining

The query expansion method based on logs mining presented in this paper is composed of two parts: measurement of query quality with ambiguity analysis, and term expansion with query log mining. In this section, the details of these two parts are described.

3.1 Measurement of Query quality with Ambiguity analysis

A good query should be general enough to cover all relevant documents and specific enough to select only relevant ones. But this rule cannot be used to evaluate the quality of user queries, because the relevant documents are unknown in advance. In fact, many queries need to be expanded because of their ambiguity, such as the query 苹果 (apple) mentioned above. In this study, we propose a new method to measure the quality of a query based on the calculation of its ambiguity degree, and query logs are adopted as the data resource.

In query logs, the original form of each click-through record is:

record := <session_id> <query_text> <rank> <order> <page_url>

Here session_id is a unique value assigned by the search engine to identify a query task, rank is the position of the document among all returned results, and order is its position among the clicked documents.

Our first task is to extract query sessions from the original log data. A query session is formed by the records with the same session id and can be defined as follows:

session := <query_text> [clicked documents]

Each session contains one query and the set of documents the user clicked on. Because most queries are repeated, one query_text can correspond to one or more sessions.

The central idea of our method is that, for the same query, the clicked documents in the same session should be related to each other and similar in content, but those in different sessions are not necessarily related. For example, the clicked documents of the query mouse may be about rodents or about computer devices; these two types of documents are unrelated because of the query's ambiguity. So the content differences of the clicked documents among the query sessions can be used to measure the ambiguity degree of a query.
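The session extraction step can be illustrated with a minimal Python sketch. The `ClickRecord` type, the `build_sessions` helper and the example URLs are hypothetical names introduced here for illustration; the paper only specifies the five record fields and that records sharing a session id form one session.

```python
from collections import defaultdict
from typing import NamedTuple


class ClickRecord(NamedTuple):
    """One click-through record with the five fields named in the text."""
    session_id: str
    query_text: str
    rank: int       # position of the page among all returned results
    order: int      # position of this click among the clicks of the session
    page_url: str


def build_sessions(records):
    """Group records by session_id, then index the sessions by query text.

    Returns {query_text: [clicked URLs of session 1, clicked URLs of session 2, ...]}.
    """
    by_session = defaultdict(list)
    for rec in records:
        by_session[rec.session_id].append(rec)

    sessions_by_query = defaultdict(list)
    for recs in by_session.values():
        recs.sort(key=lambda r: r.order)            # keep clicks in click order
        query = recs[0].query_text                  # one query per session
        sessions_by_query[query].append([r.page_url for r in recs])
    return dict(sessions_by_query)


# Toy log: the query "mouse" issued in two sessions with different intents.
records = [
    ClickRecord("s1", "mouse", 1, 1, "rodents.example/a"),
    ClickRecord("s1", "mouse", 3, 2, "rodents.example/b"),
    ClickRecord("s2", "mouse", 2, 1, "hardware.example/x"),
]
sessions = build_sessions(records)
print(sessions)
```

A repeated query ends up with one list of clicked documents per session, which is the input the ambiguity measure needs.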
In our method, we assume that the clicked documents in the same session are related and can be regarded as generated by one language model. The calculation of the ambiguity degree can then be considered as an evaluation of the Kullback-Leibler Distance (KLD) among these language models. KLD is often used in information theory to measure the divergence of two probability distributions, and it can also be used to evaluate the degree of irrelevance between two language models.

Given a query $q$, we can get a collection of sessions from the log data, denoted by $S(q) = \{s_1, s_2, \ldots, s_n\}$; each session is represented by the sequence of its clicked documents, $s_i = \{d_1, d_2, \ldots, d_m\}$. The inner ambiguity degree of a query, $IA(q)$, is then:

$$IA(q) = \frac{1}{n(n-1)} \sum_{i,j=1}^{n} \frac{KLD(p(s_i) \,\|\, p(s_j)) + KLD(p(s_j) \,\|\, p(s_i))}{2} \qquad (1)$$

That is, the average divergence of the sessions. $p(s_i)$ is the probability distribution of the language model used to generate the document set $s_i$, and $KLD(p(s_i) \,\|\, p(s_j))$ is defined as:

$$KLD(p(s_i) \,\|\, p(s_j)) = \sum_{t} p(t \mid s_i) \log \frac{p(t \mid s_i)}{p(t \mid s_j)} \qquad (2)$$
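Formulas (1) and (2) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: session models are plain dictionaries over a shared vocabulary, the logarithm is natural (formula (2) does not fix a base), the sum in formula (1) is taken over ordered pairs of distinct sessions, and a query with a single session is given an inner ambiguity of zero. The function names and toy distributions are hypothetical.

```python
import math
from itertools import permutations


def kld(p, q):
    """Formula (2): sum over terms t of p(t) * log(p(t) / q(t)).

    p and q are dicts over the same vocabulary; q must be smoothed so that
    q[t] > 0 wherever p[t] > 0.
    """
    return sum(pt * math.log(pt / q[t]) for t, pt in p.items() if pt > 0)


def inner_ambiguity(session_models):
    """Formula (1): mean symmetrised KLD over ordered pairs of distinct sessions."""
    n = len(session_models)
    if n < 2:
        return 0.0  # assumption: one session gives no evidence of ambiguity
    total = sum((kld(p, q) + kld(q, p)) / 2
                for p, q in permutations(session_models, 2))
    return total / (n * (n - 1))


# Toy session models for the query "mouse".
rodent = {"mouse": 0.5, "rodent": 0.4, "usb": 0.1}
device = {"mouse": 0.5, "rodent": 0.1, "usb": 0.4}

print(inner_ambiguity([rodent, rodent]))   # sessions that agree
print(inner_ambiguity([rodent, device]))   # sessions that diverge
```

Sessions with identical term distributions contribute no divergence, while sessions about different topics raise the score, which matches the intuition given for the inner ambiguity degree.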

In order to compute formula (2), we need to estimate $p(t \mid s_i)$, the conditional probability of occurrence of word $t$ in $s_i$. The estimate for $p(t \mid s_i)$ is:

$$p(t \mid s_i) = \alpha \sum_{k=1}^{m} \lambda_k P(t \mid d_k) + (1 - \alpha) P(t \mid S) \qquad (3)$$

where $\alpha$ is the interpolation weight, determined empirically, that smooths the language models so that non-zero probability is assigned to terms that do not appear in a given document. $P(t \mid S)$ is the global background collection model. $\lambda_k$ is a weighting parameter determined by the rank of $d_k$ among the clicked documents, and $P(t \mid d_k)$ is the maximum likelihood estimate of the probability of term $t$ under the term distribution for document $d_k$. The values of $\lambda_k$ and $P(t \mid d_k)$ are calculated by the following formulas:

$$\lambda_k = \frac{0.5}{n} + \frac{[\ldots]}{n(n\,[\ldots])} \qquad (4)$$

$$P(t \mid d_k) = \frac{tf(t, d_k)}{|d_k|} \qquad (5)$$

Here $tf(t, d_k)$ is the raw term frequency of term $t$ in document $d_k$, and $|d_k|$ is the total number of terms in the document.

We also define an outer ambiguity, which comes from the idea of (Cronen-Townsend S., 2002). They use the concept of a clarity score to quantify a query's ambiguity; it is the relative entropy between a query language model and the corresponding collection language model. The outer ambiguity of the query is defined as the reciprocal of the clarity score:

$$OA(q) = \frac{1}{\sum_{t \in V} P(t \mid q) \log_2 \dfrac{P(t \mid q)}{P_{coll}(t)}} \qquad (6)$$

According to the above formulas, we can compute the ambiguity degree for a given query:

$$A(q) = \beta \, OA(q) + (1 - \beta) \, IA(q) \qquad (7)$$

where $\beta$ is an adjustable parameter. The inner ambiguity degree represents the difference among the related documents of the query. Intuitively, if a query is clear, the clicked documents in its sessions will focus on the same topic, and the term distributions of these documents should be approximately similar. The outer ambiguity degree represents the difference between the related documents and the global document collection. Therefore, the ambiguity degree of a clear query is smaller than that of an ambiguous one.
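The smoothed session model of formula (3), the document model of formula (5), the outer ambiguity of formula (6) and the combination in formula (7) can be sketched as below. Several choices are assumptions made only for this illustration: `alpha=0.8` is a placeholder (the paper says the value is set empirically), the rank weights default to uniform because formula (4) cannot be fully read, the background model is assumed to cover every term of the clicked documents, and all function names and toy data are hypothetical.

```python
import math
from collections import Counter


def doc_model(tokens):
    """Formula (5): maximum likelihood P(t|d) = tf(t, d) / |d|."""
    counts = Counter(tokens)
    return {t: c / len(tokens) for t, c in counts.items()}


def session_model(docs, background, alpha=0.8, weights=None):
    """Formula (3): alpha * sum_k lambda_k * P(t|d_k) + (1 - alpha) * P(t|S).

    `weights` stands in for the rank weights lambda_k; uniform weights are an
    assumption of this sketch, not the paper's formula (4).
    """
    if weights is None:
        weights = [1.0 / len(docs)] * len(docs)
    models = [doc_model(d) for d in docs]
    return {
        t: alpha * sum(w * m.get(t, 0.0) for w, m in zip(weights, models))
           + (1 - alpha) * p_bg
        for t, p_bg in background.items()
    }


def outer_ambiguity(query_model, collection_model):
    """Formula (6): reciprocal of the clarity score (relative entropy, base 2)."""
    clarity = sum(p * math.log2(p / collection_model[t])
                  for t, p in query_model.items() if p > 0)
    return 1.0 / clarity  # undefined when the query model equals the collection model


def ambiguity(oa, ia, beta=0.4):
    """Formula (7) with the paper's beta = 0.4."""
    return beta * oa + (1 - beta) * ia


background = {"apple": 0.25, "fruit": 0.25, "mac": 0.25, "tree": 0.25}
model = session_model([["apple", "fruit", "apple"], ["apple", "tree"]], background)
oa = outer_ambiguity({"apple": 0.5, "fruit": 0.5}, background)
print(model, oa, ambiguity(oa, 0.5))
```

Because every term of the background vocabulary receives the `(1 - alpha) * P(t|S)` share, the resulting session model has no zero entries, so the KLD in formula (2) stays finite.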
In our test, we set $\beta = 0.4$, because we think the inner ambiguity degree is more important for the calculation. We normalize the value of the ambiguity degree to the range from 0 to 1, define a maximum query expansion length named $\theta$, and use $A(q) \cdot \theta$ to set the number of query expansion terms. The idea is that the more ambiguous a query is, the more terms should be added for expansion, and the clearer a query is, the fewer terms should be added, in order to avoid importing irrelevant words.

3.2 Query expansion with Logs mining

There are two steps in our approach to expanding an ambiguous query. The first step is to get the candidate terms from the associated queries, and the second step is to determine which candidate words should be added to the new query. In this section, the details of these two steps are described.

In the first step, we use the information about clicked documents to establish correlations among queries. Generally, we assume that a query is relevant to the documents that the user clicked on, and each record of the log data suggests such a relationship. If two queries are related to the same clicked documents, we believe these two queries are associated with each other in some way, and the terms in the associated queries can be used as candidate terms for query expansion. Here, we use the conditional probability $P(q_j \mid q_i)$ to calculate the correlation between $q_i$ and $q_j$:

$$P(q_j \mid q_i) = \frac{P(q_i, q_j)}{P(q_i)} = \frac{\sum_{d_k \in D} P(q_i, q_j, d_k)}{P(q_i)} = \frac{\sum_{d_k \in D} P(q_j \mid q_i, d_k)\, P(q_i, d_k)}{P(q_i)} \qquad (8)$$

Here we suppose that $P(q_j \mid q_i, d_k) = P(q_j \mid d_k)$, because the relation between queries is established through the document, so $d_k$ separates $q_j$ from $q_i$, and we get the following formula:

$$P(q_j \mid q_i) = \frac{\sum_{d_k \in D} P(q_j \mid d_k)\, P(d_k \mid q_i)\, P(q_i)}{P(q_i)} = \sum_{d_k \in D} P(q_j \mid d_k)\, P(d_k \mid q_i) \qquad (9)$$

In formula (9), $P(d_k \mid q_i)$ is the conditional probability of the clicked document $d_k$ given the query $q_i$, and $P(q_j \mid d_k)$ is the conditional probability of the query $q_j$ given the clicked document $d_k$. The two conditional probabilities can be estimated as follows:

$$P(d_k \mid q_i) = \frac{f(q_i, d_k)}{f(q_i)} \qquad (10)$$

$$P(q_j \mid d_k) = \frac{f(q_j, d_k)}{f(d_k)} \qquad (11)$$

$f(q_i, d_k)$ is the co-occurrence frequency of query $q_i$ and document $d_k$ in the log data, and $f(q_i)$ is the frequency of $q_i$ in the log data. Likewise, $f(q_j, d_k)$ is the co-occurrence frequency of query $q_j$ and document $d_k$, and $f(d_k)$ is the frequency of $d_k$ in the log data. By calculating these frequencies, we can get the collection of queries related to $q_i$, and the terms in these queries can be used for query expansion.
The weights of terms can be calculated by the following formula:

$$P(t \mid q) = \sum_{q_j \ \text{s.t.}\ t \in q_j} P(q_j \mid q) \qquad (12)$$
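The estimates in formulas (9) to (12), together with the term budget $A(q) \cdot \theta$ from Section 3.1, can be sketched in Python. This is a hedged illustration, not the authors' code: the log is reduced to `(query, document)` click pairs, all frequencies are counted over those pairs, queries are tokenized on whitespace (Chinese queries would need a word segmenter), terms already in the original query are skipped, and the budget is rounded to the nearest integer. The function names and the toy log are hypothetical.

```python
from collections import Counter, defaultdict


def related_queries(q, clicks):
    """Formula (9): P(q_j|q) = sum over d of P(q_j|d) * P(d|q).

    `clicks` is a list of (query, document) pairs taken from the log.
    """
    f_qd = Counter(clicks)                          # f(q, d)
    f_q = Counter(query for query, _ in clicks)     # f(q), counted over clicks
    f_d = Counter(doc for _, doc in clicks)         # f(d), counted over clicks

    scores = defaultdict(float)
    for (qi, d), n_qd in f_qd.items():
        if qi != q:
            continue
        p_d_given_q = n_qd / f_q[q]                 # formula (10)
        for (qj, d2), n_jd in f_qd.items():
            if d2 == d and qj != q:
                scores[qj] += (n_jd / f_d[d]) * p_d_given_q   # formula (11)
    return dict(scores)


def expansion_terms(q, clicks, ambiguity, theta=40, tokenize=str.split):
    """Formula (12) term weights, cut to the top A(q) * theta terms."""
    original = set(tokenize(q))
    weights = defaultdict(float)
    for qj, p in related_queries(q, clicks).items():
        for t in set(tokenize(qj)):
            if t not in original:                   # assumption: skip known terms
                weights[t] += p
    k = round(ambiguity * theta)                    # assumption: nearest integer
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


clicks = [
    ("apple", "d1"), ("apple", "d1"), ("apple", "d2"),
    ("apple pie recipe", "d1"),
    ("mac laptop", "d2"), ("mac laptop", "d3"),
]
related = related_queries("apple", clicks)
terms = expansion_terms("apple", clicks, ambiguity=0.05)
print(related)
print(terms)
```

In the toy log, "apple pie recipe" shares the more frequently clicked document with "apple", so its terms outrank those of "mac laptop"; with a normalized ambiguity of 0.05 and $\theta = 40$ only two terms are kept.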

In the second step, we sort the expansion terms by their weights, and the number of terms is set to $A(q) \cdot \theta$. We set $\theta = 40$ based on experience. The top $A(q) \cdot \theta$ terms are used for query expansion.

4. Evaluations and Analysis

4.1 Experimental Data and Methodology

Due to the characteristics of our query expansion method, we cannot conduct experiments on standard test collections such as the TREC data, since they do not contain the user logs that we need. We test our method on a dataset collected from the query logs of the Sogou (搜狗) search engine. It covers one month of log data, and about 80% of the queries in it contain Chinese words. Approximately 24 million query records and 3 million distinct queries are identified. We select two hundred test input queries randomly according to the overall frequency distribution and extract about one million query sessions from the log data. For the document set, we collect about ten thousand pages from the Internet according to the records in the query logs to form the test corpus. In this data set, each document has been retrieved and viewed by users with a certain query, so we can get sufficient click-through information to expand a query with our method.

In order to demonstrate the effectiveness of our method, three experiments were carried out. The first investigates the correlation between query lengths and ambiguity degrees. In the second, we extract ten queries from the query set and illustrate the performance of query expansion on these queries. Finally, the experimental results of our query expansion method are compared with other systems.

4.2 Results

[Fig 1. Distribution of query lengths. Axes: query length; proportion of queries.]

Figure 1 illustrates the distribution of query lengths according to the number of words. In our experiment, we notice that 35% of the queries contain only one keyword and 32% of the queries contain two keywords. The average length of all queries is […]. The result shows that most people like to use short queries to retrieve information. We do not select queries containing more than 5 words, because these queries are seldom used and we cannot get enough log data for the calculation.

Figure 2 illustrates the relation between query lengths and the statistics of their ambiguity degrees. Let the ambiguity of query $q_i$ be $a_i = A(q_i)$; then the average $\bar{a}$ and the variance $\sigma^2$ are defined as:

$$\bar{a} = \frac{1}{n} \sum_{i=1}^{n} a_i \qquad (13)$$

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (a_i - \bar{a})^2 \qquad (14)$$

We observe that the average values for short queries are higher than those for long queries. This verifies that short queries are more ambiguous than long queries and that query expansion should be applied to short queries more than to long queries. The results also confirm the effectiveness of our method for measuring the ambiguity of queries. But it should be emphasized that not all short queries are bad queries. The variance analysis shows that query length is not a better criterion for measuring the quality of queries. Variance describes the deviation of the data from its mean. We observe that the variance is larger when the query length is 2 or 3, which means the ambiguity values of queries in these two groups fluctuate more around their mean value.

[Fig 2. Average and variance analysis of ambiguity degree. Axes: query length; average and variance. Series: average, variance.]

In the second experiment, we extract ten queries from the query set, shown in Table 1; each query is given in both a short and a long version in order to see how query expansion affects retrieval results on short and long queries. In our experiment, the long queries come from queries whose length is 4 or 5, and short queries contain only one word. After pre-processing the documents, including phrasing and removing stop words and useless characters, we get a thesaurus which contains about sixty thousand words. The results are the precision-recall performance of these queries, which is counted manually.

ID | Short Queries | Long Queries
1 | 苹果 (apple) | 苹果褐斑病防治 (prevention and treatment of apple brown spot disease)
2 | 成都 (Chengdu) | 成都旅游景点 (Chengdu tourist attractions)
3 | 足球 (football) | 足球过人技术视频 (football dribbling technique videos)
4 | 网易 (NetEase) | 网易邮箱申请 (NetEase mailbox registration)
5 | 比尔盖茨 (Bill Gates) | 比尔盖茨慈善基金 (Bill Gates charitable foundation)
6 | 经济 (economy) | 国际经济形势 (international economic situation)
7 | DNA | DNA 提取侦破技术 (DNA extraction techniques for criminal investigation)
8 | 汽车 (automobile) | 汽车保险计算方法 (car insurance calculation method)
9 | 华为 (Huawei) | 华为招聘信息 (Huawei recruitment information)
10 | 手机 (mobile phone) | 手机生产厂家 (mobile phone manufacturers)

Table 1. List of Queries in Both the Long Query Set and the Short Query Set

The retrieval results are shown in Table 2. According to the calculation of ambiguity degree, we believe the queries in the Short Queries set are more ambiguous than those in the Long Queries set, so the average precision of the Short Queries set should be lower than that of the Long Queries set. Similar to the retrieval process, query expansion is also affected by the ambiguity of the original queries. Compared with an accurate query, the query expansion method can achieve a greater improvement on an ambiguous one.

The results confirm the expectation just described. Without query expansion, the average precision on the Short Queries set is 22.63%, which is lower than the 28.80% of the Long Queries set. The improvement gained with query expansion on the Short Queries set is observably higher than that obtained on the Long Queries set, and the results show that applying query expansion to the Short Queries set is more valuable.

Recall | Short Queries Without QE | Short Queries With QE | Long Queries Without QE | Long Queries With QE
[…] | […] | […] (+54.07) | […] | […] (23.01)
[…] | […] | […] (+62.39) | […] | […] (35.19)
[…] | […] | […] (+74.79) | […] | […] (33.92)
[…] | […] | […] (+76.37) | […] | […] (35.97)
[…] | […] | […] (+84.07) | […] | […] (+39.35)
[…] | […] | […] (+94.73) | […] | […] (+36.83)
[…] | […] | […] (+90.65) | […] | […] (+41.88)
[…] | […] | […] (+85.28) | […] | […] (+39.47)
[…] | […] | […] (+88.75) | […] | […] (+41.99)
[…] | […] | […] (+85.44) | […] | […] (+38.51)
Average | […] | […] (+74.95) | […] | […] (+34.69)

Table 2. Comparison with and without QE on both Long Query Set and Short Query Set

The results in Table 2 also show that query expansion cannot achieve the same performance on accurate queries as on ambiguous ones; some newly added terms introduce query drift and degrade performance.

In order to evaluate our query expansion method, we compare its performance not only with that of the original queries, but also with that of local context analysis (LCA), which extracts the expansion terms from the related documents. The results are shown in Fig 3.

[Fig 3. Comparison of query expansion. Axes: recall; precision. Series: Baseline, QE on LCA, QE on log.]

For local context analysis, we use 30 expansion terms from the 100 top-ranked documents for query expansion. The smoothing factor $\delta$ in local context analysis is set to 0.1. The experiments show that query expansion techniques can greatly improve precision and recall in information retrieval, especially for document collections with a wide range of content. The results also show that query expansion based on query logs performs better than the other systems. The reason for the poorer performance of QE on LCA is that the initial search results are unsatisfactory. This affects the performance of the expansion algorithm, causing irrelevant terms to be added to the original query, so that better results are not achieved. In our method, the expansion algorithm is based on mining large-scale query logs: relevant expansion terms are selected from past queries by analyzing the relation between queries and documents under the language modeling framework. Our method can effectively reduce the expansion of irrelevant terms and lessen the bad impact of unsatisfactory initial search results.

5. Conclusions

In this article, we presented a new method for query expansion based on query logs mining. This method first calculates the ambiguity degree of the query by exploiting the user logs.
The result can be used to measure the quality of the query and decide the expanded length of the query. In the next step, we use the information about clicked documents to establish correlations among queries, and high-quality expansion terms are selected from past queries by analyzing the relation between queries and documents. This is an effective way to avoid the problem of query drift by reducing irrelevant expansion terms.

We tested our method on a data set extracted from the real Web environment. A series of experiments conducted on the data set showed that the query expansion method based on query logs mining can achieve substantial improvements in performance. It also outperforms local context analysis, which has been one of the most effective query expansion methods. Our experiments also show that query expansion is more effective for ambiguous queries than for clear queries. This confirms that queries should not all be dealt with in the same way and that a measurement of query quality is essential to judge whether a query needs to be expanded, because some expansion terms can degrade the performance of high-quality queries.

6. References

Wen, J. R., Nie, J. Y., and Zhang, H. J., 2001, Clustering user queries of a search engine. Proceedings of the 10th International World Wide Web Conference.
Kraft, R., and Zien, J., 2004, Mining anchor text for query refinement. Proceedings of the 13th International Conference on World Wide Web.
Carmel, D., Farchi, E., Petruschka, Y., and Soffer, A., 2002, Automatic query refinement using lexical affinities with maximal information gain. Proceedings of the 25th International ACM SIGIR Conference on Research and Development in Information Retrieval.
Dou, Z. C., Song, R. H., and Wen, J. R., 2007, A large-scale evaluation and analysis of personalized search strategies. Proceedings of the 16th International World Wide Web Conference.
Rocchio, J., 1971, Relevance feedback in information retrieval. In The SMART Retrieval System: Experiments in Automatic Document Processing.
Salton, G., and Buckley, C., 1990, Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science, 41(4).
Okabe, M., Umemura, K., and Yamada, S., 2005, Query expansion with the minimum user feedback by transductive learning. Proceedings of Human Language Technology Conference and Empirical Methods in Natural Language Processing.
Kelly, D., and Teevan, J., 2003, Implicit feedback for inferring user preference: A bibliography. ACM SIGIR Forum, 37(2).
Attar, L., and Fraenkel, A. S., 1977, Local feedback in full-text retrieval systems. Journal of the Association for Computing Machinery, 24(3).
Croft, W. B., and Harper, D. J., 1979, Using probabilistic models of document retrieval without relevance information. Journal of Documentation, 35(4).
Lam-Adesina, M., and Jones, G. J. F., 2001, Applying summarization techniques for term selection in relevance feedback. Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
Carpineto, C., De Mori, R., Romano, G., and Bigi, B., 2001, An information-theoretic approach to automatic query expansion. ACM Transactions on Information Systems, 19(1).
Morita, M., and Shinoda, Y., 1994, Information filtering based on user behavior analysis and best match text retrieval. Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
Cui, H., Wen, J. R., Nie, J. Y., and Ma, W. Y., 2003, Query expansion by mining user logs. IEEE Transactions on Knowledge and Data Engineering, 15(4).
Lv, Y. H., Sun, L., Zhang, J. L., Nie, J. Y., Chen, W., and Zhang, W., 2006, An iterative implicit feedback approach to personalized search. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL.
Bar-Yossef, Z., and Gurevich, M., 2008, Mining search engine query logs via suggestion sampling. Proceedings of the 34th International Conference on Very Large Data Bases.
Wen, J. R., Nie, J. Y., and Zhang, H. J., 2002, Query clustering using user logs. ACM Transactions on Information Systems, 20(1).
Billerbeck, B., Scholer, F., Williams, H. E., and Zobel, J., 2003, Query expansion using associated queries. Proceedings of the 12th International Conference on Information and Knowledge Management.
Zhang, Z., and Nasraoui, O., 2006, Mining search engine query logs for query recommendation. Proceedings of the 15th International World Wide Web Conference.
Cover, T., and Thomas, J., 1991, Elements of Information Theory. New York: John Wiley and Sons.
Cronen-Townsend, S., Zhou, Y., and Croft, W. B., 2002, Quantifying query ambiguity. Proceedings of Human Language Technology, pp. 94-98.
