The Effect of Similarity Measures on The Quality of Query Clusters

The effect of similarity measures on the quality of query clusters. Fu, L., Goh, D.H., Foo, S., & Na, J.C. (2004). Journal of Information Science, 30(5), 396-407.

Lin Fu, Dion Hoe-Lian Goh, Schubert Shou-Boon Foo, Jin-Cheon Na
Division of Information Studies, School of Communication and Information
Nanyang Technological University, Singapore 637718

Abstract

Query clustering is a process that automatically groups similar queries into categories. This task is important for two reasons. It helps discover the common interests of online information seekers. It also lets the experience of previous users be exploited for the benefit of others. Both can be harnessed to facilitate collaborative querying, which helps users of digital libraries and other information systems better meet their information needs. In such cases, the key step is to identify the similarity between queries. In this paper, we examine the effectiveness of different similarity identification methods. A set of experiments has been carried out to study the impact of different similarity measures on the final query clustering performance.

1. Introduction

With the increasing proliferation of the Internet, people have come to depend more on the Web and digital libraries (DLs) to search for information. Yet the performance of existing search engines is far from satisfactory. The problem is exacerbated by the fact that not all results returned by search engines are relevant or of acceptable quality to information seekers. This has led to a situation where users are swamped with too much information and find it difficult to sift through the material in search of relevant content. Studies of information seeking behavior have revealed that interaction and collaboration with other people is an important part of the process of information seeking and use [7][8][17]. Given this idea, collaborative search aims to support collaboration among people when they search for information on the Web or in DLs [5].
Work in collaborative search falls into several major categories, including collaborative browsing, collaborative filtering and collaborative querying [14]. In particular, collaborative querying seeks to help users express their information needs properly, either as a question to information professionals or as an accurate query to a search engine. It does so by letting users share expert knowledge or search experiences with each other [14]. Query mining is one of the common techniques used to support collaborative querying. It allows users to make use of other users' search experiences or domain knowledge through three activities:

- analyzing the information stored in query logs (query analysis);
- grouping queries (query clustering);
- extracting useful information related to a given query.

The extracted information can then be used as recommendation items (in query recommending systems) or as sources for automatic query expansion.

An example is given below. Consider a user A who is interested in XML parsers for Java programming and wants to find articles and useful Web resources in this field. Due to her limited domain knowledge, she enters "XML parser" as the query to her preferred search engine and gets lists of results. However, nothing in the top 50 results contains the desired information, and she does not know how to modify her query. At the same time, another user B may know that good search results can be obtained by using "JDOM" as the query. Note that B's search history is usually stored in the query logs. Different search engines keep query logs in different formats, although most contain similar information such as a session ID, the address of the user and the submitted query. Thus, by mining the query logs, clustering similar queries and then recommending them to users, the first user has an opportunity to take advantage of previous queries that someone else entered and to use the appropriate ones to meet her information need.

From this example, we can see that query clustering is one of the crucial steps in query mining. The challenge here is to identify the similarities between different queries stored in the query logs. The classical method in information retrieval calculates the similarity between queries according to their query terms (content-based approach) [13]. Under this method, queries are grouped into one cluster if they contain one or more common terms. An alternative approach uses the results returned for the queries (e.g. result URLs in Web search engines) as the criteria for identifying similar queries (results-based approach) [5][10]. In this case, query clusters are constructed by calculating the overlap between the result URLs returned in response to different queries.
Although much work has been done in query clustering research, there is little rigorous analysis of how the different query similarity calculation approaches perform. As a result, the effect of these approaches on the quality of query clusters has not been studied to date. In this paper, a comprehensive evaluation of different query similarity calculation methods is reported. This work will help information retrieval systems and DLs better meet the information needs of users through collaborative querying. Specifically, it reveals the drawbacks and advantages of different query similarity calculation approaches. It also sheds light on how to improve the algorithms that query recommending systems use to identify high-quality query clusters for a submitted query.

The remainder of this paper is organized as follows. In Section 2, we review the literature related to this work. Next, we describe the query similarity identification approaches adopted in the research and the algorithm used to cluster queries. Then, we describe the design of the evaluation experiments. Further, we report experimental results that assess the effectiveness of the different approaches. Finally, we discuss the implications of our findings for collaborative querying systems and outline areas for further improvement.

2. Related Work

Several strands of literature bear some relevance to this work, and this section reviews them. First, a survey of information seeking behavior is provided as background for this research. Next, various approaches to supporting collaborative search are described to establish the requirements for, and importance of, this work. Finally, a review of different query clustering approaches, the focus of this work, is presented.

2.1. Information Seeking Behavior

Information seeking is a broad term encompassing the ways individuals articulate their information needs and then seek, evaluate, select and use information (Lokman & Stephanie, 2001). In other words, information seeking behavior is a purposive seeking of information as a consequence of a need to satisfy some goal. In the course of seeking, the individual may interact with people, with manual information systems (such as newspapers or libraries), or with computer-based information systems such as the World Wide Web (Wilson, 2000). Many researchers have worked in this area during the past several decades. Despite the differences between the various models, they share one similarity: interaction and collaboration with others is a key component in the process of information seeking and use.

For example, Taylor (1968) developed a model of information seeking in libraries (question-negotiation). The model begins with how people articulate a question to a librarian and follows the ensuing negotiation with the librarian to find the needed information. Taylor's research demonstrates that interaction and collaboration with librarians and colleagues is a very important step during the information seeking process. Stated differently, how one harnesses other people's knowledge is an essential factor in determining the outcome of the information seeking process. Similarly, Dervin and Dewdney's (1986) Sense-Making Model reinforces Taylor's work. It focuses on how individuals use the observations of others to construct pictures of reality and then use these pictures to guide their search behavior.
The term sense-making is a label for a coherent set of concepts and methods used to describe how people make sense of their world. Thus sense-making behavior is communicating behavior, and information seeking and use are central to sense-making. People communicate and collaborate with others within a certain context to meet their own information needs, and then use the retrieved information for different purposes. Further, Ellis's (1993) research produced a pattern of information-seeking behavior with eight generic features or research activities: starting, chaining, browsing, differentiating, monitoring, extracting, verifying and ending. Typically, the starting stage includes activities characteristic of the initial search for information, for example, identifying references. This stage is often accomplished by asking colleagues or consulting literature reviews, indexes and abstracts. Ellis argues that communication with other people is a key component in the initial search for information.

2.2 Collaborative Search

As described previously, collaborative search is an emerging research area that seeks to support cooperation among people when they search for information online. It can be divided into three types according to the ways that users search for information: collaborative browsing, collaborative querying and collaborative filtering [14].

Collaborative browsing can be seen as an extension of Web browsing. Traditional Web browsing is characterized by distributed, isolated users with little interaction between them. Collaborative browsing, by contrast, is performed by groups of users who are mutually aware of the group's presence and interact with each other during the browsing process [6]. In other words, collaborative browsing aims to offer document access to a group of users who can communicate through synchronous communication tools [12]. Examples of collaborative browsing applications include Let's Browse [6], a system for co-located collaborative browsing based on user interests, and WebEx [19], a meeting system that allows distributed users to browse Web pages.

Collaborative filtering is a technique for recommending items to a user based on similarities between the past behavior of that user and that of like-minded people [1]. It assumes that human preferences are correlated. Thus, if a group of like-minded users prefer an item, the present user may also prefer it. Collaborative filtering is a beneficial tool because it harnesses the community for knowledge sharing and can select high-quality, relevant items from a large information stream [4]. Examples of collaborative filtering applications include:

- Tapestry [4], a system that can filter information according to other users' annotations;
- GroupLens [12], a recommender system that uses user ratings of documents read;
- PHOAKS [18], a system that recommends items by using newsgroup messages.
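The user-based form of collaborative filtering described above can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm of any system cited here. The ratings, the cosine similarity over co-rated items and the similarity-weighted average used for prediction are all assumptions chosen for illustration.

```python
import math

# Hypothetical user -> {item: rating} data, for illustration only.
RATINGS = {
    "alice": {"doc1": 5, "doc2": 3, "doc3": 4},
    "bob": {"doc1": 4, "doc2": 3, "doc3": 5, "doc4": 4},
    "carol": {"doc1": 1, "doc2": 5, "doc4": 2},
}

def user_similarity(u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = u.keys() & v.keys()
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(user, item, ratings=RATINGS):
    """Predict a rating as the similarity-weighted average of other users' ratings."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        s = user_similarity(ratings[user], theirs)
        num += s * theirs[item]
        den += abs(s)
    return num / den if den else None  # None: nobody else has rated the item

print(round(predict("alice", "doc4"), 2))
```

Bob's ratings agree more closely with Alice's than Carol's do, so Bob's rating of doc4 dominates the prediction.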
Collaborative querying, on the other hand, assists users in formulating queries to meet their information needs by drawing on other people's expert knowledge or search experience. Two approaches are generally used. The first is online live reference services: a network of expertise, intermediation and resources placed at the disposal of someone seeking answers in an online environment [9]. An example is the Interactive Reference Service at the University of California at Irvine. It offers a video reference service that links librarians at the reference desk in the University's Science Library with students working one-half mile away in a College of Medicine computer lab [16]. Online live reference services attempt to build a virtual environment that facilitates communication and collaboration. However, the typical usage scenario involves many users depending on only a few librarians. This approach is therefore inherently prone to overloading, especially if too many users ask questions at the same time. In such cases, users may experience poor service, such as long waiting times or inadequate answers. Further, the common techniques adopted by online live reference services (phone, e-mail and chat) usually limit the librarian and patron to one-on-one communication, which makes reference interviews more difficult to share [21].

The second approach is to mine the query logs of search engines and use these queries as resources for meeting a user's information needs. Historical query logs provide a wealth of information about past search experiences. This method tries to detect a user's interests through his/her submitted queries. It then locates similar queries (the query clusters) based on the similarities of the queries in the query logs [5]. The system can then either recommend the similar queries to users (query recommending systems) [5] or use them as candidate expansion terms for the original query to improve the quality of the search results (automatic query expansion systems) [10][24]. Such an approach overcomes the limitations of human involvement and network overloading inherent in online live reference services. Further, the required steps can be performed automatically. Here, calculating the similarity between different queries and clustering them automatically are the crucial steps. A clustering algorithm can provide a list of suggestions by offering, in response to a query q, the other members of the cluster containing q. Some commercial search engines (e.g. Lycos) give users the opportunity to rephrase their queries by suggesting alternative queries.

2.3. Query Clustering

2.3.1 Content-based approaches

Traditional information retrieval research suggests an approach to query clustering based on comparing query term vectors (content-based approach). In other words, common terms can be used to characterize a cluster of queries. This can be done by simply calculating the overlap of identical terms between queries. Further, various similarity functions that take term weights into account are available, including cosine similarity, Jaccard similarity and Dice similarity [13]. These functions have given good results in document clustering because documents contain a large number of terms. Such methods are simple and straightforward to apply to query clustering.
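The unweighted (set-based) forms of two of these measures, Jaccard and Dice, can be sketched on query term sets. This is a minimal illustration. It assumes whitespace tokenisation and ignores term weights. The last call reuses the "XML parser" / "JDOM" example from the Introduction. It shows the weakness discussed next: related queries that share no term score zero.

```python
def terms(query):
    """Lower-cased set of whitespace-separated terms (tokenisation is an assumption)."""
    return set(query.lower().split())

def jaccard(q1, q2):
    """|A ∩ B| / |A ∪ B| over the two queries' term sets."""
    a, b = terms(q1), terms(q2)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def dice(q1, q2):
    """2|A ∩ B| / (|A| + |B|) over the two queries' term sets."""
    a, b = terms(q1), terms(q2)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

print(jaccard("XML parser", "Java XML parser"))  # 2 shared of 3 distinct terms
print(dice("XML parser", "Java XML parser"))
print(jaccard("XML parser", "JDOM"))  # no shared term, so 0.0
```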
However, the content-based method might not be appropriate for query clustering, since most queries submitted to search engines are quite short [20]. A recent study of a billion-entry set of queries submitted to AltaVista has shown two things: more than 85% of queries contain fewer than three terms, and the average query length is 2.35 [15]. Thus query terms can neither convey much information nor reveal the semantics behind the queries. The same term might carry different meanings, while different terms might refer to the same meaning [10].

2.3.2 Feedback-based approaches

Another approach to clustering queries uses users' selections from the search result listings as the similarity measure [20]. This method analyzes query session logs, which contain the query terms and the corresponding documents that users clicked on. It assumes that two queries are similar if they lead to the selection of similar documents. Users' feedback is employed as contextual information for the queries and has been demonstrated to be quite useful in clustering queries. However, the drawback is that it may be unreliable if users select too many irrelevant documents [20]. Further, the performance of such methods is greatly affected by a lack of common documents clicked by users [22]. In other words, if users click different documents for identical or similar queries, such methods will not generate effective query clusters.

2.3.3 Results-based approaches

Raghavan and Sever [10] determine the similarity between queries by calculating the overlap in the documents returned by the queries. This is done by converting the result documents into term frequency vectors. The similarity between two queries is then determined by comparing the query result vectors rather than by treating the queries as term vectors. Fitzpatrick and Dent [3] further develop this method by weighting the query results according to their position in the result list. They argue that the beginning of a result list is more likely to include a document relevant to the original query. The weights used in their experiment are empirically derived probabilities that different ranges of the result list contain relevant documents. Using the corresponding query results to cluster queries helps boost the precision and recall of query clustering [3][10]. However, this method is time-consuming to perform and is not suitable for online search systems [3]. Glance [5] therefore uses the overlap of result URLs as the similarity measure instead of the document content. Queries were posted to a reference search engine. The similarity between two queries was measured by the number of common URLs in the top 50 results returned by that engine.

3. Query Similarity Calculations

This section defines the different query similarity identification approaches used in our evaluation experiments. It also defines how query clusters are constructed from the different query similarity measures.

3.1 Content-based Similarity Approach

We borrow concepts from information retrieval [13] and define a set of queries as $D = \{Q_1, Q_2, \ldots, Q_i, \ldots, Q_j, \ldots, Q_n\}$. A single query $Q_j$ is converted to a term and weight vector, shown in (1), where $q_i$ is an index term of $Q_j$ and $w_{iQ_j}$ represents the weight of the $i$th term in query $Q_j$.
$$Q_j = \{\langle q_1, w_{1Q_j}\rangle;\ \langle q_2, w_{2Q_j}\rangle;\ \ldots;\ \langle q_i, w_{iQ_j}\rangle\} \qquad (1)$$

To compute the term weight, we define two frequencies:

- the term frequency, $tf_{iQ_j}$, is the number of occurrences of term $i$ in query $Q_j$;
- the query frequency, $qf_i$, is the number of queries in a collection of $n$ queries that contain term $i$.

A high term frequency indicates that a term is highly related to a query. Stated alternatively, such terms are important for expressing the information need of a query and are valuable for clustering queries. A high query frequency, on the other hand, indicates that a term is too general to be useful as a descriptor; in other words, it will not convey useful information for query clustering. Next, the inverse query frequency, $iqf_i$, is expressed as (3), in which $n$ represents the total number of queries in the query collection. We then compute $w_{iQ_j}$ using (2):

$$w_{iQ_j} = tf_{iQ_j} \times iqf_i \qquad (2)$$

$$iqf_i = \log\left(\frac{n}{qf_i}\right) \qquad (3)$$

Given $D$, we define $C_{ij}$ as (4), which represents the common term vector of two queries $Q_i$ and $Q_j$. Here, $q$ refers to the terms that belong to both $Q_i$ and $Q_j$.

$$C_{ij} = \{q : q \in Q_i \cap Q_j\} \qquad (4)$$

Given these concepts, we can now provide one definition of query similarity:

Definition I: A query $Q_i$ is similar to query $Q_j$ if $|C_{ij}| > 0$, where $|C_{ij}|$ is the number of common terms in both queries.

A basic similarity measure based on query terms can be computed as follows:

$$\mathrm{Sim\_basic}(Q_i, Q_j) = \frac{|C_{ij}|}{\max\big(N(Q_i), N(Q_j)\big)} \qquad (5)$$

where $N(Q_i)$ is the number of keywords in query $Q_i$.

Taking the term weights into account, we can use any one of the standard similarity measures [13]. Here, we present only the cosine similarity measure, since it is the most frequently used in information retrieval:

$$\mathrm{Sim\_cosine}(Q_i, Q_j) = \frac{\sum_{l=1}^{k} cw_{lQ_i}\, cw_{lQ_j}}{\sqrt{\sum_{l=1}^{k} cw_{lQ_i}^{2}} \times \sqrt{\sum_{l=1}^{k} cw_{lQ_j}^{2}}} \qquad (6)$$

where $k = |C_{ij}|$ and $cw_{lQ_i}$ refers to the weight of the $l$th common term of $C_{ij}$ in query $Q_i$.

As discussed, the content-based approach is the simplest way to construct query clusters, and its cost is relatively low. However, its effectiveness is questionable because most queries are short. For example, the term "light" can be used in four different ways (noun, verb, adjective and adverb). In such cases, content-based query clustering lacks the contextual information needed to distinguish the semantic differences behind the terms, and so cannot provide reasonable clustering results. An alternative approach based on query results is therefore considered.

3.2 Result URLs-based Similarity Approach

The results returned by search engines usually contain a variety of information, such as the title, the abstract and the category. This information can be used to compare the similarity between queries. In our work, taking processing time into account, we use the unique identifiers of the query results (e.g. URLs) to determine the similarity between queries [5][23].

Let $U(Q_j)$ be the set of result URLs for query $Q_j$:

$$U(Q_j) = \{u_1, u_2, \ldots, u_i\} \qquad (7)$$

where $u_i$ represents the $i$th result URL for query $Q_j$. We then define $R_{ij}$ as (8), which represents the common result URL vector of $Q_i$ and $Q_j$. Here, $u$ refers to the URLs that belong to both $U(Q_i)$ and $U(Q_j)$.

$$R_{ij} = \{u : u \in U(Q_i) \cap U(Q_j)\} \qquad (8)$$

Next, the similarity definition based on query result URLs can be stated as:

Definition II: A query $Q_i$ is similar to query $Q_j$ if $|R_{ij}| > 0$, where $|R_{ij}|$ is the number of common result URLs of both queries.

The similarity measure can then be expressed as (9):

$$\mathrm{Sim\_result}(Q_i, Q_j) = \frac{|R_{ij}|}{\max\big(|U(Q_i)|, |U(Q_j)|\big)} \qquad (9)$$

where $|U(Q_i)|$ is the number of result URLs in $U(Q_i)$. Note that this is only one possible formula for calculating similarity from result URLs. Other measures can be used, for example, the overlap of result titles or of the domain names in the result URLs.

3.3 Determining Query Clusters

Given a set of queries $D = \{Q_1, Q_2, \ldots, Q_n\}$ and a similarity measure between queries, we next construct query clusters. Two queries are in one cluster whenever their similarity reaches a certain threshold. We construct a query cluster $G$ for each query in the query set using the definition in (10). Here, $Sim(Q_i, Q_j)$ refers to the similarity between $Q_i$ and $Q_j$, which can be computed with any of the similarity functions discussed previously.

$$G(Q_i) = \{Q_j : Sim(Q_i, Q_j) \geq threshold\} \qquad (10)$$

where $1 \leq j \leq n$ and $n$ is the total number of queries. There are alternative query clustering approaches besides the one used in our experiments, for example, Hierarchical Agglomerative Clustering (HAC) algorithms [25]. Compared with other approaches, our method is relatively less time-consuming, so query clusters can be constructed easily.

4. Query Clustering Experiments

In our experiments, we want to examine the following questions.

To what extent do term weights boost the performance of clustering algorithms?
Despite the success of term weights in document clustering, their value in query clustering remains uncertain because queries are short. To date, however, few studies have focused on this question. Hence, in our experiments, we compare the performance of the basic similarity measure and the cosine similarity measure, since they are representative approaches in the literature [20][13].

Are there differences in cluster quality between the content-based approach and the results-based approach? Previous studies have focused on comparing the feedback-based approach with the content-based approach [20]. Yet the literature presented in the previous section shows that the results-based approach plays an important role in query clustering. To the best of our knowledge, little work has compared or quantified the differences between the content-based and results-based approaches. Such an experiment can reveal the strengths and weaknesses of the two approaches.

4.1 Data Set & Data Preprocessing

We collected six months of user logs (around two million query sessions) from the Digital Library of Nanyang Technological University (Singapore). The query logs are in text format and contain the following information:

- the time when the user issued the query;
- the query terms submitted to the search engine;
- the number of results returned by the search engine in response to the query terms.

We preprocessed the raw query logs according to the following steps:

1. To reduce the size of the raw data, only the relevant data was extracted: the query terms and the corresponding number of returned records.
2. Because the query log contains a large number of queries, sampling was carried out. Previous studies indicate that the query sample size affects the final experimental results [5], and sample sizes in previous studies vary from several hundred [3][10] to tens of thousands of queries [5][20]. We therefore selected 35000 queries from the query log for our evaluation.
3. All identical queries were removed so that the queries were distinct from each other. This reduced the number of queries to 17000.
Note that more than 50% of the queries in the original query set had been repeated over time. To a certain extent, this indicates that users' interests tend to overlap. It also reinforces the usefulness of previously issued queries in facilitating successful information seeking.

4. The search engine offers advanced search options by which users can restrict a search to a specific field. Such options are embedded in the query terms, so some queries carry a prefix that indicates the field to search; for example, one such prefix restricts the search to the title field. These prefixes were removed so that only the actual query terms remained.
5. Queries containing misspelled terms were removed, since they are meaningless and retrieved no documents. After this step, around 16000 distinct queries were left for our experiment.
6. Stop words (such as "the", "a" and "an") were removed from the queries, since these terms do not convey useful meaning. This was done to obtain better clustering results with the content-based similarity measures.
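The preprocessing steps above can be sketched as one pass over the log. This is a hypothetical illustration, not the pipeline actually used. The following are all assumptions: the (query, number of results) record layout, the "title:"-style prefix pattern (the source does not give the real prefix syntax), the small stop list, and the use of zero-hit queries as a proxy for misspellings. Unlike the order listed above, prefixes are stripped before duplicates are removed, so that "title: XML parser" and "xml parser" count as the same query.

```python
import random
import re

# Hypothetical field-prefix syntax and stop list: the source names neither.
FIELD_PREFIX = re.compile(r"^(title|author|subject):\s*")
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "for"}

def preprocess(log_records, sample_size, seed=0):
    """Sample queries, strip field prefixes, drop zero-hit and repeated queries,
    then remove stop words (the stripped form feeds the content-based measures).

    `log_records` is assumed to be a list of (query_text, num_results) pairs,
    i.e. the fields kept in the extraction step.
    """
    rng = random.Random(seed)
    sample = rng.sample(log_records, min(sample_size, len(log_records)))
    seen, cleaned = set(), []
    for query, num_results in sample:
        query = FIELD_PREFIX.sub("", query.strip().lower())
        if num_results == 0 or not query or query in seen:
            continue  # zero hits (e.g. misspelt) or a repeat of an earlier query
        seen.add(query)
        kept = [t for t in query.split() if t not in STOP_WORDS]
        if kept:
            cleaned.append(" ".join(kept))
    return cleaned

log = [("title: XML parser", 120), ("xml parser", 95), ("the JDOM", 40), ("xmll parsre", 0)]
print(sorted(preprocess(log, sample_size=35000)))  # the study sampled 35000 queries
```

The unstripped query, not the stop-word-free form, is what would be posted to a search engine for the results-based measure.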

Within the 16000 query samples, the distribution of query lengths was as follows:

- one keyword: 23% of queries;
- two keywords: 36%;
- three keywords: 18%.

In total, approximately 77% of the queries contained no more than three keywords. The average length of all the query samples was 2.73, which is similar to the findings of previous studies [15]. The 16000 query samples contained 37752 individual terms, of which only 9503 were distinct. Each distinct term therefore appears 3.97 times on average, which shows that people tend to use similar keywords to express their information needs. Table 1 shows some examples from the final query sample.

Table 1. Examples of Queries
cards game fabrication of CMOS communications handbook between people chemical engineering desalination plant intelligence and costs device material characterization and mobile phone works NT matrix composites packaging gene machinery Julius Lester process of water treatment

4.2. Methodology

We calculated the similarity between queries using the following similarity measures:

- Basic similarity (sim_basic): function (5)
- Content-based similarity (sim_cosine): function (6)
- Results-based similarity (sim_result): function (9)

First, all queries were split into separate terms. The measures were then computed as follows:

- For sim_basic, the length of each query was computed. Next, the number of common terms between two queries was obtained by taking the intersection of their term sets. Finally, the sim_basic function was calculated.
- For sim_cosine, the weight of every term within a single query was computed using function (2). Using the intersection of the two queries from the previous step and the weight of each term, sim_cosine was then computed with function (6).
- For sim_result, we posted each query to a reference search engine (Google) and retrieved the corresponding result URLs. Since search engines are designed to rank highly relevant results first, we considered only the top 10 result URLs returned for each query. This method is similar to those used in [5][23].
The result URLs were then used to compute the similarity between queries according to function (9). Recall that two queries are in one cluster whenever their similarity reaches a certain threshold. The threshold is the baseline for deciding whether two queries should be clustered into the same group, so different thresholds lead to different query clusters. For all approaches, the similarity threshold in (10) was set to 0.25, 0.5, 0.7 and 0.9 in turn. This allowed us to study the impact of the threshold on the final performance of the clustering algorithms.
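The computation steps and threshold sweep above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code:

- queries are tokenised on whitespace;
- (3) uses the natural log;
- N(Q) counts distinct keywords;
- a hard-coded dictionary of made-up URLs stands in for live Google results;
- sim_cosine takes its norms over each query's full tf × iqf vector (the standard cosine form), not only over the common terms.

That last choice is deliberate. If the norms covered only the common terms, any two queries whose shared terms each occur once in both would always score 1.

```python
import math
from collections import Counter

def tokenize(query):
    """Lower-cased whitespace tokens (tokenisation is an assumption)."""
    return query.lower().split()

def iqf_table(queries):
    """Eq. (3): iqf_t = log(n / qf_t); natural log assumed."""
    n = len(queries)
    qf = Counter(t for q in queries for t in set(tokenize(q)))
    return {t: math.log(n / c) for t, c in qf.items()}

def sim_basic(qi, qj):
    """Eq. (5): |C_ij| / max(N(Q_i), N(Q_j)), N counted as distinct keywords."""
    a, b = set(tokenize(qi)), set(tokenize(qj))
    longest = max(len(a), len(b))
    return len(a & b) / longest if longest else 0.0

def sim_cosine(qi, qj, iqf):
    """Cosine of tf * iqf vectors (eq. 2); norms over each full vector (assumption)."""
    wi = {t: c * iqf[t] for t, c in Counter(tokenize(qi)).items()}
    wj = {t: c * iqf[t] for t, c in Counter(tokenize(qj)).items()}
    dot = sum(wi[t] * wj[t] for t in wi.keys() & wj.keys())  # only common terms add to the dot product
    norm = math.sqrt(sum(w * w for w in wi.values())) * math.sqrt(sum(w * w for w in wj.values()))
    return dot / norm if norm else 0.0

def sim_result(urls_i, urls_j):
    """Eq. (9): |R_ij| / max(|U(Q_i)|, |U(Q_j)|)."""
    a, b = set(urls_i), set(urls_j)
    longest = max(len(a), len(b))
    return len(a & b) / longest if longest else 0.0

def cluster(queries, sim, threshold):
    """Eq. (10): G(Q_i) = {Q_j : sim(Q_i, Q_j) >= threshold}; Q_i always kept in G(Q_i)."""
    return {qi: [qj for qj in queries if qj == qi or sim(qi, qj) >= threshold] for qi in queries}

queries = ["xml parser", "java xml parser", "jdom parser", "chemical engineering"]
iqf = iqf_table(queries)
# Made-up result URLs standing in for the top 10 Google results used in the study.
top_urls = {
    "xml parser": ["a.org/sax", "b.org/dom", "c.org/jdom", "d.org/xerces"],
    "java xml parser": ["a.org/sax", "b.org/dom", "c.org/jdom", "e.org/java"],
    "jdom parser": ["c.org/jdom", "f.org/faq", "b.org/dom", "g.org/api"],
    "chemical engineering": ["h.org/chem", "i.org/eng"],
}
measures = {
    "sim_basic": sim_basic,
    "sim_cosine": lambda a, b: sim_cosine(a, b, iqf),
    "sim_result": lambda a, b: sim_result(top_urls[a], top_urls[b]),
}
for threshold in (0.25, 0.5, 0.7, 0.9):  # the four thresholds used in the experiments
    for name, sim in measures.items():
        print(threshold, name, cluster(queries, sim, threshold)["xml parser"])
```

Passing each measure as a two-argument callable lets the same cluster function serve both the term-based and the URL-based measures.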

4.3. Performance Measures

In our experiments, we examined the quality of the query clusters produced by the different similarity calculation approaches (see the Introduction). After obtaining the clusters for each similarity measure, we first observed the average cluster size and the range of cluster sizes. This information sheds light on the ability of each measure to provide recommended queries for a given query. In other words, it reflects the variety of the queries recommended to a user. Next, coverage, precision and recall were calculated.

Coverage is the ability of a similarity measure to find similar queries for a given query. It is the percentage of queries for which the similarity function is able to provide a cluster. This value indicates the probability that users can obtain recommended queries for the queries they issue.

Precision and recall were used to assess the accuracy of the query clusters generated by the different similarity functions. Precision is the ratio of the number of similar queries to the total number of queries in a cluster. To measure precision, we randomly selected 100 clusters and checked each query in each cluster manually [20]. Since the actual information needs represented by the queries are not known, a human evaluator judged the similarity between queries within a cluster by considering both the query terms and the result URLs. The average precision was then computed over the 100 selected clusters.

Recall is the ratio of the number of similar queries in a cluster to the total number of similar queries across the query set (those in the current cluster and in others). However, recall was difficult to calculate directly because no standard clusters were available for the query set. Therefore, an alternative measure was used. Recall was defined as the ratio of two quantities: the number of correctly clustered queries within the 100 selected clusters, and the maximum number of correctly clustered queries across the test collection [20].
The number of correctly clustered queries within the 100 selected clusters equals the number of queries in those clusters multiplied by the average precision. The number of queries in the 100 selected clusters is computed as the average cluster size multiplied by 100. In our work, the maximum number of correctly clustered queries was 1948, obtained by sim_basic with a threshold of 0.25.

Analysis of variance (ANOVA) procedures were also conducted to test whether the thresholds and the similarity calculation approaches affected the query clusters in terms of average cluster size, precision and recall. Since the values for coverage are categorical, the chi-square test was used to measure the effect of thresholds and similarity calculation approaches on coverage.
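The measures defined above can be sketched as small functions over a mapping from each query to its cluster members. This is a hypothetical illustration. The source does not say whether a cluster lists the query itself or how coverage treats singleton clusters. Here we assume each cluster includes its own query, that a query counts as covered when its cluster contains at least one other query, and that average size is taken over covered queries only. The last lines plug in the reported sim_basic figures at threshold 0.25 (average size 50.27, precision 38.74%). They reproduce the stated maximum of 1948 correctly clustered queries up to rounding.

```python
def cluster_stats(clusters):
    """Average size, size range and coverage for a {query: member list} mapping.

    Assumption: each cluster lists the query itself, and a query is 'covered'
    when its cluster contains at least one other query. Sizes are averaged
    over covered queries only.
    """
    sizes = [len(members) for members in clusters.values() if len(members) > 1]
    coverage = len(sizes) / len(clusters) if clusters else 0.0
    if not sizes:
        return 0.0, (0, 0), coverage
    return sum(sizes) / len(sizes), (min(sizes), max(sizes)), coverage

def correct_in_sample(avg_cluster_size, avg_precision, n_clusters=100):
    """Correctly clustered queries in the sampled clusters:
    (average cluster size * number of sampled clusters) * average precision."""
    return avg_cluster_size * n_clusters * avg_precision

def recall_proxy(avg_cluster_size, avg_precision, max_correct, n_clusters=100):
    """Modified recall: correct_in_sample divided by its maximum over all settings."""
    return correct_in_sample(avg_cluster_size, avg_precision, n_clusters) / max_correct

print(round(correct_in_sample(50.27, 0.3874)))  # 1947, close to the reported 1948
print(round(recall_proxy(50.27, 0.3874, 1948), 3))
```

Because average cluster size enters the numerator directly, this recall proxy rises and falls with cluster size.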

5. Experimental Findings

5.1 Results

Varying the similarity threshold produced different average cluster sizes (Figure 1). As the threshold increases from 0.25 to 0.9, the average cluster sizes change as follows:

- sim_basic: from 50.27 to 2.11;
- sim_cosine: from 43.65 to 8.06;
- sim_result: from 2.63 to 2.21.

The results show that sim_basic and sim_cosine produce larger average clusters than sim_result. This indicates that the content-based approach (both sim_basic and sim_cosine) can find more related queries for a given query than the results-based approach. Stated differently, the content-based approach can offer a user a greater variety of queries for his/her submitted query. Interestingly, sim_basic outperforms sim_cosine when the threshold is below 0.6, while sim_cosine performs better when the threshold is above 0.6. A possible reason is that the more common terms two queries share, the more important term weights become in finding related queries.

A 4 × 3 (4 thresholds × 3 similarity approaches) ANOVA yielded a statistically significant interaction effect on average cluster size, F(6,11) = 306.98, p < .001. This indicates that the variance of the average cluster sizes is significant across the cells defined by the combination of factor levels (thresholds and similarity approaches). There were also significant effects for thresholds, F(3,11) = 4205.66, p < .001, and for similarity approaches, F(2,11) = 1353.50, p < .001.

[Figure 1. Average cluster sizes: average cluster size versus threshold (0.25, 0.5, 0.7, 0.9) for basic, cosine and result.]

For coverage, as the threshold increases from 0.25 to 0.9 (see Figure 2):

- sim_basic falls from 80.45% to 3.71%;
- sim_cosine falls from 82.74% to 18.02%;
- sim_result falls from 22.03% to 6.99%.

The results show that sim_cosine and sim_basic rank higher in coverage. This demonstrates that the content-based approach is better than the results-based approach at finding similar queries for a given query. In other words, users are more likely to obtain a recommendation for a given query with the content-based approach. As discussed previously, users tend to use similar terms to express their information needs, which might account for the high coverage of the content-based approach. On the other hand, the number of distinct URLs is often huge. This might explain the low coverage of sim_result, since many similar queries cannot be grouped together for lack of common result URLs [23]. Further, sim_cosine performs better than sim_basic at all thresholds. This indicates that term weights can improve the ability to find similar queries for a given query, despite the short length of queries.

The chi-square test indicates that for each individual threshold, the differences across the approaches are significant:

- threshold 0.25: χ²(2, N = 48000) = 15886.61, p < .001;
- threshold 0.5: χ²(2, N = 48000) = 27056.64, p < .001;
- threshold 0.7: χ²(2, N = 48000) = 12124.40, p < .001;
- threshold 0.9: χ²(2, N = 48000) = 2069.60, p < .001.

This means that the thresholds and the similarity identification approaches significantly affect coverage.

[Figure 2. Coverage: coverage (%) versus threshold (0.25, 0.5, 0.7, 0.9) for basic, cosine and result.]

Figure 3 indicates that the results-based approach clusters similar queries correctly more often than the other approaches. In terms of precision, sim_result performs best, increasing from 93.33% to 100% as the similarity threshold increases from 0.25 to 0.9. This indicates that almost all of the queries in its clusters were judged similar. At a threshold of 0.9, the precision of sim_result peaks at 100%, meaning that there are no irrelevant queries in its clusters.
The content-based approach, by contrast, has poorer precision. The precision values of sim_basic (increasing from 38.74% to 99.98%) and sim_cosine (increasing from 35.46% to 96.56%) are almost the same, and both are below that of sim_result. The precision of the content-based approach is lower because queries are short and lack information about the context in which they are used. On the other hand, Google tends to return the same URLs for semantically related queries [5][23], which might account for the high precision of the results-based method.

A 4 × 3 (4 thresholds × 3 similarity approaches) ANOVA yielded a statistically significant interaction effect on precision, F(6, 11) = 42.41, p < .001, indicating that the effect of the threshold on precision depends on the similarity approach. There were also significant main effects of threshold, F(3, 11) = 211.45, p < .001, and of similarity approach, F(2, 11) = 192.52, p < .001. This means that each of the two factors significantly affects precision.

Figure 3. Precision (x-axis: threshold, 0.25, 0.5, 0.7, 0.9; y-axis: precision, 0% to 100%; series: basic, cosine, result)

For recall, sim_basic performs best, reaching 100% at a threshold of 0.25, which indicates that all similar queries were contained in the query clusters. As with average cluster size, sim_basic outperforms sim_cosine when the threshold is below 0.6, while sim_cosine performs better when the threshold is above 0.6. This is because the recall calculation includes the average cluster size (see the definition of recall in Section 4.3), so recall changes in step with average cluster size (see Figure 1). Further, both sim_basic and sim_cosine outperform sim_result in terms of recall, which the small average cluster size of sim_result might account for. Note that although the recall used in this experiment differs from the traditional definition used in information retrieval research, it does provide useful information about the accuracy of the clusters generated by the different similarity functions [20]. That is, the modified recall measure reflects how well each similarity function uncovers clusters of similar queries in the sample set of queries used in the experiments.
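A small sketch can show how coverage and average cluster size, the quantity recall tracks, depend on the threshold. It assumes a simple threshold scheme. Each query seeds a cluster containing itself and every other query whose similarity to it is at least the threshold. Singleton clusters are not counted. Average cluster size is the mean size of the remaining clusters, and coverage is the fraction of queries that receive such a cluster. These definitions, the toy queries, and the Jaccard term-overlap similarity (a stand-in for a content-based measure) are illustrative assumptions. They are not necessarily the exact definitions given in Section 4.3.

```python
def jaccard(p, q):
    """Stand-in content-based similarity: shared terms / all distinct terms."""
    tp, tq = set(p.split()), set(q.split())
    return len(tp & tq) / len(tp | tq)


def threshold_clusters(queries, sim, threshold):
    """Each query seeds a cluster of itself plus all queries at or above threshold."""
    clusters = {}
    for q in queries:
        neighbours = {p for p in queries if p != q and sim(q, p) >= threshold}
        if neighbours:  # a query with no similar query forms no cluster
            clusters[q] = {q} | neighbours
    return clusters


def average_cluster_size(clusters):
    """Mean number of queries per (non-singleton) cluster."""
    if not clusters:
        return 0.0
    return sum(len(c) for c in clusters.values()) / len(clusters)


def coverage(clusters, queries):
    """Fraction of queries for which at least one similar query was found."""
    return len(clusters) / len(queries)


QUERIES = ["digital library", "digital library search", "library catalogue", "java tutorial"]

for t in (0.25, 0.5, 0.9):
    cl = threshold_clusters(QUERIES, jaccard, t)
    print(t, average_cluster_size(cl), coverage(cl, QUERIES))
```

On these toy queries, raising the threshold from 0.25 to 0.5 shrinks the average cluster size from 3 to 2 and the coverage from 75% to 50%. At 0.9 no clusters remain. This follows the same downward direction as the curves in Figures 1 and 2.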
A 4 × 3 (4 thresholds × 3 similarity approaches) ANOVA yielded a statistically significant interaction effect on recall, F(6, 11) = 206.89, p < .001, indicating that the effect of the threshold on recall depends on the similarity approach. There were also significant main effects of threshold, F(3, 11) = 1057.12, p < .001, and of similarity approach, F(2, 11) = 423.14, p < .001. This means that each of the two factors significantly affects recall.
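The numerator degrees of freedom in the three ANOVAs above follow directly from the 4 × 3 design. Take a = 4 threshold levels and b = 3 similarity approaches. The standard two-way ANOVA partition then gives the values below, and each F statistic is the ratio of an effect's mean square to the error mean square. The error degrees of freedom (11) are taken as reported and are not derived here.

```latex
\begin{aligned}
df_{\text{threshold}} &= a - 1 = 4 - 1 = 3,\\
df_{\text{approach}} &= b - 1 = 3 - 1 = 2,\\
df_{\text{threshold} \times \text{approach}} &= (a - 1)(b - 1) = 3 \times 2 = 6,\\
F_{\text{effect}} &= \frac{MS_{\text{effect}}}{MS_{\text{error}}}
  = \frac{SS_{\text{effect}} / df_{\text{effect}}}{SS_{\text{error}} / df_{\text{error}}}.
\end{aligned}
```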

Figure 4. Recall (x-axis: threshold, 0.25, 0.5, 0.7, 0.9; y-axis: recall, 0% to 100%; series: basic, cosine, result)

5.2 Discussion

In summary, our experiments show that no single similarity approach is best on its own, since different approaches outperform the others on different metrics. Table 2 summarizes the comparison of the content-based and results-based approaches. It was generated from the average value of each performance measure across all thresholds, and the approach with the larger average is regarded as better. For example, users will have a higher chance of obtaining a recommendation with the content-based approach, but the accuracy of the recommended queries will be poor. The results-based approach improves average precision but suffers from poor coverage and recall. This result offers opportunities to enhance query clustering algorithms by using both query terms and result URLs, since the strengths of each approach might balance the drawbacks of the other.

Table 2. Summary of the comparison between the content-based and results-based approaches
Measure | Better | Worse
Average cluster size | Content-based approach | Results-based approach
Coverage | Content-based approach | Results-based approach
Precision | Results-based approach | Content-based approach
Recall | Content-based approach | Results-based approach

Further, our experiments show that although the short length of queries might cast doubt on the usefulness of term weights, term weights do help boost coverage without damaging the other metrics. Table 3 summarizes the comparison of the individual approaches in more detail, taking the impact of different thresholds into consideration. The thresholds were categorized into two groups: low thresholds (0.25 and 0.5) and high thresholds (0.7 and 0.9). Note that for precision, sim_basic and sim_cosine generate similar results at both low and high thresholds. Table 3 was constructed from observation of the previous figures and further indicates the strengths and weaknesses of the different approaches at different thresholds.

Table 3. Summary of the comparison across all approaches (approaches listed from best to worst; B = sim_basic, C = sim_cosine, R = sim_result)
Threshold | Average cluster size | Coverage | Precision | Recall
Low (0.25, 0.5) | B, C, R | C, B, R | R, B & C | B, C, R
High (0.7, 0.9) | C, B, R | C, R, B | R, B & C | C, B, R

6. Conclusions and Future Work

In this paper, we compared different query similarity measures. Our experiments show that when the content-based or results-based approach is used alone, each has drawbacks that affect the quality of query clusters. The results show that the precision of the content-based approach is low, owing to the short length of queries and the lack of information about the context in which they are used, whereas the results-based approach performs well in terms of precision. On the other hand, the results-based approach performs poorly in terms of coverage, where the content-based approach offers better results. Taken together, this indicates that the strengths of each approach can compensate for the weaknesses of the other, so the two can complement each other to enhance the overall quality of the final query clusters. Further, sim_cosine and sim_basic generate similar results on almost all metrics except coverage, on which sim_cosine performs much better than sim_basic. This suggests that term weights can contribute to the quality of query clusters.
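The complementary behaviour described above can be illustrated with a small sketch of the three kinds of similarity and one naive way of combining them. The exact formulas for sim_basic, sim_cosine and sim_result are given earlier in the paper and are not reproduced in this section, so the definitions below are assumptions:

- sim_basic is shared terms divided by the size of the larger term set.
- sim_cosine is the cosine of tf-idf weighted term vectors, with idf computed over the query log.
- sim_result is shared result URLs divided by the size of the larger URL set.

The hybrid score is a hypothetical weighted linear blend with weight alpha, similar in spirit to the combined measures of Wen et al. [20]. It is not the hybrid approach the authors plan to evaluate. The queries and example.org URLs are made up.

```python
import math
from collections import Counter


def sim_basic(p, q):
    """Assumed keyword overlap: shared terms / size of the larger term set."""
    tp, tq = set(p.split()), set(q.split())
    return len(tp & tq) / max(len(tp), len(tq))


def idf_table(log):
    """Assumed idf weights, log(N / df), computed over the queries in the log."""
    n = len(log)
    doc_freq = Counter(t for query in log for t in set(query.split()))
    return {t: math.log(n / d) for t, d in doc_freq.items()}


def sim_cosine(p, q, idf):
    """Cosine similarity of tf-idf weighted term vectors."""
    vp = {t: c * idf.get(t, 0.0) for t, c in Counter(p.split()).items()}
    vq = {t: c * idf.get(t, 0.0) for t, c in Counter(q.split()).items()}
    dot = sum(w * vq.get(t, 0.0) for t, w in vp.items())
    norm = math.sqrt(sum(w * w for w in vp.values())) * math.sqrt(
        sum(w * w for w in vq.values()))
    return dot / norm if norm else 0.0


def sim_result(urls_p, urls_q):
    """Assumed URL overlap: shared result URLs / size of the larger URL set."""
    up, uq = set(urls_p), set(urls_q)
    if not up or not uq:
        return 0.0
    return len(up & uq) / max(len(up), len(uq))


def sim_hybrid(content_score, result_score, alpha=0.5):
    """Hypothetical linear blend; alpha weights the content-based score."""
    return alpha * content_score + (1 - alpha) * result_score


# Made-up query log and result URLs.
LOG = ["jaguar speed", "jaguar car price", "nyc hotels", "new york accommodation"]
URLS = {
    "jaguar speed": ["http://example.org/big-cats/jaguar"],
    "jaguar car price": ["http://example.org/cars/jaguar", "http://example.org/cars/prices"],
    "nyc hotels": ["http://example.org/ny/hotels", "http://example.org/ny/stay"],
    "new york accommodation": ["http://example.org/ny/hotels", "http://example.org/ny/stay"],
}
IDF = idf_table(LOG)

for p, q in [("jaguar speed", "jaguar car price"), ("nyc hotels", "new york accommodation")]:
    content = sim_basic(p, q)
    result = sim_result(URLS[p], URLS[q])
    print(p, "|", q, round(content, 3), round(sim_cosine(p, q, IDF), 3),
          round(result, 3), round(sim_hybrid(content, result), 3))
```

In this toy log, "jaguar speed" and "jaguar car price" share a term but no result URL. Their content-based scores are therefore positive while sim_result is 0, mirroring the precision problem. "nyc hotels" and "new york accommodation" share no term but have identical URLs, so only sim_result links them, mirroring the coverage problem. The blend gives both pairs a nonzero score. sim_cosine scores the jaguar pair lower than sim_basic (about 0.149 versus 0.333) because the shared term appears in half of the logged queries and so receives a lower idf weight than the unshared terms.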
Our work can contribute to research on collaborative querying systems that mine query logs to harness the domain knowledge and search experiences of the information seekers recorded in them. The experimental results reported here can be used to develop new systems, or further refine existing ones, that identify and cluster similar queries in query logs and augment the information seeking process by recommending related queries to users. As discussed previously, such systems can help information seekers, especially novices, express their information needs accurately.

In addition to the initial experiments performed in this research, experiments involving hybrid approaches, which exploit both query terms and result URLs, are also planned. Building on the content-based and result URL-based approaches, a hybrid approach might produce more balanced results than either approach used individually. Further, alternative approaches to identifying the similarity between queries will also be attempted. For example, the result URLs can be replaced by the domain names of the URLs to improve the coverage of the results-based query clustering approach. In addition, word relationships such as hypernyms can be used to replace query terms before computing the similarity between queries, to increase coverage as well as average cluster size. Finally, experiments using other clustering algorithms such as DBSCAN [2] might also be conducted to assess clustering quality. Since DBSCAN is a density-based clustering algorithm, it allows the system to find indirectly related queries, in addition to directly related ones, for a given query. Hence, the average cluster size and coverage might be improved.

References
[1] Chun, I. G., & Hong, I. S. (2001). The implementation of knowledge-based recommender system for electronic commerce using Java expert system library. Proceedings of the IEEE International Symposium on Industrial Electronics, 1766-1770.
[2] Ester, M., Kriegel, H., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 226-231.
[3] Fitzpatrick, L., & Dent, M. (1997). Automatic feedback using past queries: Social searching? Proceedings of SIGIR 97, 306-313.
[4] Goldberg, D., Nichols, D., Oki, B. M., & Terry, D. (1992). Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12), 61-70.
[5] Glance, N. S. (2001). Community search assistant. Proceedings of the Sixth ACM International Conference on Intelligent User Interfaces, 91-96.
[6] Lieberman, H. (1995). An agent for web browsing. Proceedings of the International Joint Conference on Artificial Intelligence, 924-929.
[7] Lokman, I. M., & Stephanie, W. H. (2001). Information seeking behavior and use of social science faculty studying stateless nations: A case study. Journal of Library and Information Science Research, 23(1), 5-25.
[8] Marchionini, G. N. (1995). Information seeking in electronic environments. Cambridge, England: Cambridge University Press.
[9] Pomerantz, J., & Lankes, R. D. (2002). Integrating expertise into the NSDL: Putting a human face on the digital library. Proceedings of the Second Joint Conference on Digital Libraries, 405.
[10] Raghavan, V. V., & Sever, H. (1995). On the reuse of past optimal queries. Proceedings of the Eighteenth International ACM SIGIR Conference on Research and Development in Information Retrieval, 344-350.

[11] Resnick, P., Iacovou, N., Mitesh, S., Bergstrom, P., & Riedl, J. (1994). GroupLens: An open architecture for collaborative filtering of Netnews. Proceedings of the 1994 ACM Conference on CSCW, 175-186.
[12] Revera, G. D. J. H., Courtiat, J., & Villemur, T. (2001). A design framework for collaborative browsing. Proceedings of the Tenth IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, 362-374.
[13] Salton, G., & McGill, M. J. (1983). Introduction to modern information retrieval. New York, NY: McGraw-Hill.
[14] Churchill, E. F., Sullivan, J. W., & Snowdon, D. (1999). Collaborative and cooperative information seeking. Workshop report, CSCW 98.
[15] Silverstein, C., Henzinger, M., Marais, H., & Moricz, M. (1998). Analysis of a very large AltaVista query log. DEC SRC Technical Note 1998-14.
[16] Sloan, B. (1997, December 16). Service perspectives for the digital library remote reference services. Available at: http://www.lis.uiuc.edu/~b-sloan/e-ref.html
[17] Taylor, R. (1968). Question-negotiation and information seeking in libraries. College and Research Libraries, 29(3), 178-194.
[18] Terveen, L., Hill, W., Amento, B., David, M., & Creter, J. (1997). PHOAKS: A system for sharing recommendations. Communications of the ACM, 40(3), 59-62.
[19] WebEx home page. http://www.webex.com
[20] Wen, J. R., Nie, J. Y., & Zhang, H. J. (2002). Query clustering using user logs. ACM Transactions on Information Systems, 20(1), 59-81.
[21] Anderson, E., Boyer, J., & Ciccone, K. (2000). Remote reference services at the North Carolina State University Libraries. Proceedings of the Second Digital Reference Conference, 20-28.
[22] Chuang, S. L., & Chien, L. F. (2002). Towards automatic generation of query taxonomy: A hierarchical query clustering approach. Proceedings of the IEEE 2002 International Conference on Data Mining, 75-82.
[23] Osmar, R. Z., & Alexander, S. (2002). Finding similar queries to satisfy searches based on query traces. Workshops of OOIS 2002, 207-216.
[24] Crouch, C. J., Crouch, D. B., & Kareddy, K. R. The automatic generation of extended queries. Proceedings of the 13th Annual International ACM SIGIR Conference, 269-283.
[25] Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: A review. ACM Computing Surveys, 31(3), 264-323.