Personalized Concept-Based Clustering of Search Engine Queries


IEEE TRANSACTIONS ON JOURNAL NAME, MANUSCRIPT ID

Kenneth Wai-Ting Leung, Wilfred Ng, and Dik Lun Lee

Abstract: The exponential growth of information on the Web has introduced new challenges for building effective search engines. A major problem of web search is that search queries are usually short and ambiguous, and thus are insufficient for specifying the precise user needs. To alleviate this problem, some search engines suggest terms that are semantically related to the submitted queries so that users can choose from the suggestions the ones that reflect their information needs. In this paper, we introduce an effective approach that captures the user's conceptual preferences in order to provide personalized query suggestions. We achieve this goal with two new strategies. First, we develop online techniques that extract concepts from the web-snippets of the search result returned from a query and use the concepts to identify related queries for that query. Second, we propose a new two-phase personalized agglomerative clustering algorithm that is able to generate personalized query clusters. To the best of the authors' knowledge, no previous work has addressed personalization for query suggestions. To evaluate the effectiveness of our technique, a Google middleware was developed for collecting clickthrough data to conduct experimental evaluation. Experimental results show that our approach has better precision and recall than the existing query clustering methods.

Index Terms: Clickthrough, concept-based clustering, personalization, query clustering, search engine.

1 INTRODUCTION

The amount of information available on the web is growing rapidly. Google [4] reported that its index size was over 8 billion pages in 2004, and it was estimated to have since grown to 20 billion pages. As the web keeps expanding, the number of pages indexed in a search engine increases correspondingly.
With such a large volume of data, finding relevant information that satisfies user needs based on simple search queries becomes an increasingly difficult task. Queries submitted by search engine users tend to be short and ambiguous. A study by M. Jansen [20] found that the average query length on a popular search engine was only 2.35 terms. Such short queries are unlikely to express precisely what the user really needs. As a result, many of the retrieved pages may be irrelevant to the user's needs. On the other hand, users may not want to reformulate their queries using more search terms, since doing so imposes an additional burden on them during searching.

To improve the user's search experience, most major commercial search engines provide query suggestions to help users formulate more effective queries. When a user submits a query, a list of terms that are semantically related to the submitted query is provided to help the user identify the terms that he/she really wants, hence improving retrieval effectiveness. Yahoo's "Also Try" [6] and Google's "Searches related to" features provide related queries for narrowing a search, while Ask Jeeves [2] suggests both more specific and more general queries to the user, as shown in Fig. 2. Unfortunately, these systems provide the same suggestions for the same query without considering users' specific interests.

In this paper, we propose a method that provides personalized query suggestions based on a personalized concept-based clustering technique. In contrast to existing methods, which provide the same suggestions to all users, our approach uses clickthrough data to estimate the user's conceptual preferences and then provides personalized query suggestions for each individual user according to his/her conceptual needs.

K.W. Leung, W. Ng, and D.L. Lee are with the Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong. E-mail: {kwtleung, wilfred, dlee}@cse.ust.hk.
The motivation of our research is that queries submitted to a search engine may have multiple meanings. For example, depending on the user, the query "apple" may refer to a fruit, the company Apple Computer, or the name of a person. Thus, providing personalized query suggestions (e.g. users interested in "apple" as a fruit get suggestions about fruit, while users interested in "apple" as a company get suggestions about the company's products) helps users formulate more effective queries according to their needs.

The underlying idea of our proposed technique is based on concepts and their relations extracted from the submitted user queries, the web-snippets (see footnote 1) and the clickthrough data. Clickthrough data is exploited in the personalized clustering process to identify user preferences: a user clicks on a search result mainly because the web-snippet contains a relevant topic that the user is interested in. Moreover, clickthrough data can be collected easily without imposing an extra burden on users, and thus provides a low-cost means to capture the user's interest.

Footnote 1: "web-snippet" denotes the title, summary and URL of a Web page returned by search engines.

TABLE 1
THE CLICKTHROUGH DATA FOR THE QUERY "APPLE"

Links | Clicked | Web-Snippets for the Search Results
l1 | | Apple Hong Kong
l2 | | Apple Hong Kong - iPod + iTunes
l3 | | apple daily
l4 | | Apple
l5 | | Apple - iPod + iTunes
l6 | | Apple Inc. - Wikipedia, the free encyclopedia
l7 | | Apple II series - Wikipedia, the free encyclopedia
l8 | | Apple.Mac
l9 | | The Apple Store (US)
l10 | | Apple - Support

Our approach consists of the following four major steps. First, when a user submits a query, concepts (i.e. important terms or phrases in web-snippets) and their relations are mined online from web-snippets to build a concept relationship graph. Second, clickthroughs are collected to predict the user's conceptual preferences. Third, the concept relationship graph together with the user's conceptual preferences is used as input to a concept-based clustering algorithm that finds conceptually close queries. Finally, the most similar queries are suggested to the user for search refinement. Fig. 1 shows the general process of our approach.

To evaluate the performance of our approach, we developed a Google middleware for clickthrough data collection (see footnote 2). Users were invited to use our middleware to search 200 test queries selected from a spectrum of topical categories. When a user submits a query, concepts related to the query are mined and stored in our databases. If the user clicks on one of the search results, the user's clickthroughs together with his/her concept preference profile for the query are updated. The clustering results on the 200 test queries are compared against predefined clusters prepared by human editors. We evaluate the performance of our approach using the standard recall-precision measures. Beeferman and Berger's agglomerative clustering algorithm [11] (simply called BB's algorithm in this paper) is used as the baseline for comparison with our concept-based approach. Our experimental results show that the average precision of our approach at any recall level is better than that of the baseline method.
The main contributions of this paper are summarized below:

1. Most of the previous approaches to query clustering consider two different queries to be semantically similar if they lead to common clicks on the same pages. However, different queries rarely lead to common clicks on the same URLs in web search engines (see Section 2 for more discussion). Based on this important observation, we propose to use concepts, not pages, as the common ground for relating semantically similar queries. That is, two queries are considered related if they lead to clicks on pages that share some common concepts, which are mined from the web-snippets in the search results.

2. To our knowledge, there is no previous study on the personalization of query suggestions. We propose a two-phase clustering method that clusters queries first within the scope of each user and then for the community.

3. We conduct experiments to evaluate different methods and show that our concept-based, two-phase clustering method yields the best precision and recall.

Fig. 1. The general process of concept-based clustering.

Footnote 2: The middleware approach is for facilitating experimentation. The techniques developed in this paper can be directly integrated into any search engine to provide personalized query suggestions.

The rest of this paper is organized as follows. In Section 2, we compare our method with other similar approaches. We also discuss some works related to concept mining. In Section 3, we review BB's algorithm, which is also an effective technique in personalized query clustering. In Section 4, our concept mining method for extracting concepts from web-snippets is presented. In Section 5, we adapt BB's algorithm to our concept-based approach. We further extend the concept-based BB's algorithm to a personalized clustering algorithm by utilizing the user concept preference profiles. Experimental results comparing BB's algorithm with our methods are presented in Section 6. Section 7 concludes the paper.

2 RELATED WORK

Query clustering techniques have been developed in diversified ways.
The very first query clustering technique comes from information retrieval studies [26]. Similarity between queries was measured based on overlapping keywords or phrases in the queries. Each query is represented as a keyword vector. Similarity functions such as cosine similarity or Jaccard similarity [26] were used to measure the distance between two queries. One major limitation of the approach is that common keywords also exist in unrelated queries. For example, the queries "apple iPod" (an mp3 player) and "apple pie" (a dessert) are

very similar since they both contain the keyword "apple". However, the queries are actually expressing two different search needs.

Fig. 2. Part of the search result page generated by Ask.com in response to the query "apple". A list of query suggestions is provided, showing many possible choices for query refinement.

Chuang and Chien [14] proposed to cluster and organize users' queries into a hierarchical structure of topic classes. A Hierarchical Agglomerative Clustering (HAC) [25] algorithm is first employed to construct a binary-tree cluster hierarchy. The binary-tree hierarchy is then partitioned in order to create sub-hierarchies forming a multiway-tree cluster hierarchy like the hierarchical organization of Yahoo [6] and DMOZ [3]. Baeza-Yates et al. [10] proposed a query clustering method that groups similar queries according to their semantics. The method creates a vector representation Q for a query q, where Q is composed of terms from the clicked documents of q. Cosine similarity is applied to the query vectors to discover similar queries. More recently, Zhang and Nasraoui [33] presented a method that discovers similar queries by analyzing users' sequential search behavior. The method assumes that consecutive queries submitted by a user are related to each other. The sequential search behavior is combined with a traditional content-based similarity method to compensate for the high sparsity of real query log data. Recently, Beitzel et al. [12] proposed a query classification method that combines multiple classifiers, drawing on techniques from machine learning and computational linguistics. Their results were compared to those from the 2005 KDD Cup [5], showing that their combined approach produced higher recall and smoother tradeoffs between recall and precision than any of the component approaches.

On web search engines, clickthrough data is a kind of implicit feedback from users.
Table 1 shows example clickthrough data for the query "apple": the URLs returned by the search engine for the query and the URLs clicked on by the user. Clearly, such data is a valuable resource for capturing the user's interest when building personalized web search systems [7], [8], [17], [18], [21], [22], [24], [27], [28], [29]. Joachims [21] proposed a method that employs preference mining and machine learning to rerank search results according to the user's personal preferences. Later on, Smyth et al. [27] suggested that user search behavior is repetitive and regular. They proposed to rerank search results such that the results which have previously been selected for a given query are promoted ahead of other search results. More recently, Deng et al. [17] proposed an algorithm that combines a spying technique with a novel voting procedure to determine user preferences from the clickthrough data. Dou et al. [18] also performed a large-scale evaluation of different personalized search strategies, including clickthrough-based and profile-based personalization. They suggested that click-based personalization strategies perform consistently and considerably well when compared to profile-based methods.

To overcome the disadvantage of keyword-based clustering methods, clickthrough data has been used to cluster queries based on common clicks on URLs. Beeferman and Berger [11] proposed an agglomerative clustering algorithm (i.e. BB's algorithm) to exploit query-document relationships from clickthrough data. Given a search engine log, BB's algorithm first constructs a bipartite graph with one set of vertices corresponding to queries, and another corresponding to documents. If a user clicks on a document, a link between the corresponding query and document is created on the bipartite graph. After the bipartite graph is obtained, an agglomerative clustering algorithm is used to obtain the clusters.
The algorithm is content-independent in the sense that it exploits only the query-document links on the bipartite graph to discover similar queries and similar documents, without examining the keywords in the queries or the documents. The details of the algorithm are described in Section 3. Wen et al. [31] proposed a clustering algorithm combining both query contents and URL clicks. They suggested that two queries should be clustered together if they contain the same or similar terms and lead to the selection of the same documents. However, since web search queries are usually short and common clicks on documents are rare (see discussion below), Wen et al.'s method may not be effective for disambiguating web queries. In contrast, our approach relates the queries to a set of extracted concepts in order to identify the precise semantics of the search queries.

One major problem with clickthrough-based methods is that the number of common clicks on URLs for different queries is limited. This is because different queries will likely retrieve very different result sets in very different ranking orders. Thus, the chance for users to see the same results is small, let alone to click on them. It was reported that, in a large clickthrough dataset from a commercial search engine, the chance for two random queries to have a common click is merely $6.38 \times 10^{-5}$ [11]. The small number of common clicks leads to low recall. To alleviate this problem, we introduce the notion of concept-based graphs by considering concepts extracted from web-snippets and adapt BB's method to this new

context. In contrast to the existing methods, our approach achieves effective personalization by using concept preference profiles that are built upon the extracted concepts and clickthroughs. The use of concepts helps to reduce the size of the resulting profiles, while retaining the accuracy and the capability to capture the user's interests.

TABLE 2
FREQUENTLY USED SYMBOLS

Symbol | Description
$G$ | A bipartite graph
$m$ | The number of iterations (i.e. merges) required for agglomerative clustering
$n_b$ | The number of black vertices in $G$
$n_w$ | The number of white vertices in $G$
$N_{max}$ | The maximum number of neighbors of any vertex in $G$
$\mathrm{sim}(x, y)$ | Similarity between vertices $x$ and $y$ in $G$
$\mathrm{sim}_R(t_i, t_j)$ | Similarity between concepts $t_i$ and $t_j$
$sf(t_i)$ | Snippet frequency of the keyword/phrase $t_i$
$\mathrm{support}(t_i)$ | Interestingness of a particular keyword/phrase $t_i$ with respect to the returned web-snippets arising from a query
$|t_i|$ | The number of terms in the keyword/phrase $t_i$
upper bound | The upper bound for the number of operations required for agglomerative clustering

Along the line of concept extraction from web-snippets, Koester [23] combined web mining techniques and formal concept analysis to extract concepts from web-snippets and build a concept lattice capturing the user's conceptual needs. However, that work was not concerned with personalization. Xu et al. [32] proposed a method to extract concepts from users' browsed documents to create hierarchical concept profiles for personalized search in a privacy-enhanced environment. Their method assumes that the system knows the documents that the user is interested in, instead of using clickthrough. Thus, their method is quite different from ours.

Another technique to discover related queries is query expansion. The aim of query expansion is to improve retrieval effectiveness by expanding the query with words or phrases to match additional documents. Cui et al. [15] proposed a query expansion method based on user interactions recorded in the clickthrough data.
The method focuses on mining correlations between query terms and document terms by analyzing the user's clickthroughs. Document terms that are strongly related to the input query are used together to narrow down the search.

3 BB'S GRAPH-BASED CLUSTERING ALGORITHM

In BB's graph-based clustering [11], a query-page bipartite graph is first constructed, with one set of nodes corresponding to the submitted queries and the other corresponding to the clicked pages. If a user clicks on a page, a link between the query and the page is created on the bipartite graph. After obtaining the bipartite graph, an agglomerative clustering algorithm is used to discover similar queries and similar pages. During the clustering process, the algorithm iteratively combines the two most similar queries into one query node, then the two most similar pages into one page node, and this alternating combination of queries and pages is repeated until a termination condition is satisfied. The main reason for not clustering all the queries first and then all the pages is that two queries may seem unrelated prior to page clustering because they link to two different pages, but they may become similar to each other if the two pages are similar enough to be merged later. The example in Fig. 3 illustrates this scenario.

Fig. 3. (a) Queries $q_1$ and $q_3$ seem unrelated before document clustering. (b) After document clustering, queries $q_1$ and $q_3$ are related to each other because they are both linked to the document cluster $\{d_1, d_2\}$.

Fig. 4. (a) A bipartite graph without noise. (b) A bipartite graph with a noise link, where the solid edges represent real links and the dashed edge represents a noise edge.

To compute the similarity between queries or documents on a bipartite graph, the algorithm considers the overlap of their neighboring vertices as defined in the following equation:

$$\mathrm{sim}(x, y) = \begin{cases} \dfrac{|N(x) \cap N(y)|}{|N(x) \cup N(y)|} & \text{if } |N(x) \cup N(y)| > 0 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

where $N(x)$ is the set of neighboring vertices of $x$, and $N(y)$ is the set of neighboring vertices of $y$.
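As a minimal sketch of Equation (1), assuming each vertex's neighborhood is held as a Python set (the names `bb_similarity` and `clicked_pages` and the toy click log are hypothetical, not taken from the paper), the overlap measure can be computed as follows:

```python
def bb_similarity(neighbors_x, neighbors_y):
    """Equation (1): size of the shared neighbourhood over the size of the
    combined neighbourhood, or 0 when both neighbourhoods are empty."""
    union = neighbors_x | neighbors_y
    if not union:
        return 0.0
    return len(neighbors_x & neighbors_y) / len(union)


# Hypothetical click log: each query maps to the set of pages clicked for it.
clicked_pages = {
    "q1": {"d1", "d2"},
    "q2": {"d2", "d3"},
    "q3": {"d4"},
}

# q1 and q2 share one page out of three distinct pages.
sim_q1_q2 = bb_similarity(clicked_pages["q1"], clicked_pages["q2"])
# q1 and q3 share no page.
sim_q1_q3 = bb_similarity(clicked_pages["q1"], clicked_pages["q3"])
```

Because only set membership is used, the measure ignores how many clicks each link received.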
Intuitively, the similarity function formalizes the idea that $x$ and $y$ are similar if their respective neighboring vertices largely overlap, and vice versa.

As discussed in Section 2, one problem of BB's method is its low recall, since the number of common clicks on URLs is rather small. Another problem of the similarity function proposed by BB is that it cannot identify noise links in the clustering process. Consider the example shown in Fig. 4, where the number attached to a link is the total number of clicks on the document. In Fig. 4(a), $q_2$ is a hot query which generates 1000 clicks for each of the documents $d_2$ and $d_3$, while $q_1$ is a cold query which generates only 10 clicks for each of the documents $d_1$ and $d_2$. Even though the click distributions for $q_1$ and $q_2$ are different, we can see that $d_1$ and $d_2$ are both relevant to $q_1$ because the number of clicks on $d_1$ and the number of clicks on $d_2$ are roughly the same for $q_1$ (i.e. 10 clicks).

Similarly, we can see that $d_2$ and $d_3$ are both relevant to $q_2$ because the number of clicks on $d_2$ and the number of clicks on $d_3$ are roughly the same for $q_2$ (i.e. 1000 clicks). Thus, we conclude that $q_1$ and $q_2$ are similar queries because they share the common relevant document $d_2$. However, in Fig. 4(b), $d_2$ cannot be considered relevant to $q_1$ because only a small fraction of the clicks (10 out of 1010) supports that conclusion. Consequently, we cannot conclude that $q_1$ and $q_2$ are similar queries. BB's similarity function does not detect the noise link shown in Fig. 4(b): it gives the same similarity score of 1/3 in both cases. To solve the problem, the following similarity function was proposed in our earlier work [13]:

$$\mathrm{sim}(x, y) = \begin{cases} \dfrac{|L(x, y)|}{|L(x) \cup L(y)|} & \text{if } |L(x) \cup L(y)| > 0 \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

where $L(x, y)$ is the set of links connecting $x$ and $y$ to the same vertices, $L(x)$ and $L(y)$ are all the links connecting to $x$ and $y$, respectively, and $|L(\cdot)|$ is the cardinality of $L(\cdot)$. Applying the similarity function, we get a similarity score of 1010/2020 = 1/2 for $\mathrm{sim}(q_1, q_2)$ in Fig. 4(a), and a similarity score of 1010/3010 ≈ 1/3 for $\mathrm{sim}(q_1, q_2)$ in Fig. 4(b). Note that the score for $\mathrm{sim}(q_1, q_2)$ in Fig. 4(a) is higher than that in Fig. 4(b), because most people select document $d_1$ in Fig. 4(b), and the links between $q_1$ and $d_2$ can be considered noise. Therefore, it is reasonable to assign a lower score to $\mathrm{sim}(q_1, q_2)$ in Fig. 4(b).

Using the noise-tolerant similarity function, the similarity between two vertices always lies in [0,1]. The similarity of two vertices is 0 if they share no common neighbor, and 1 if they have exactly the same neighbor vertices. Noise elimination by itself is a difficult problem, since it requires complex inference rules to distinguish the informative clicks from the erroneous ones. Since the noise-tolerant version has been shown to be superior to the original version [13] and we are not aware of any better methods, in the rest of this paper BB's algorithm refers to this improved version of the similarity function.
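The arithmetic of the Fig. 4 example can be checked with a short sketch of Equation (2). It assumes, as the worked scores suggest, that each link is counted once per click, so a vertex is represented as a mapping from neighbor to click count. The function name and the dictionaries are hypothetical, and the click counts for Fig. 4(b) are inferred from the totals quoted in the text (10 out of 1010 clicks for $q_1$, 3010 links overall).

```python
def noise_tolerant_similarity(links_x, links_y):
    """Equation (2) with click-weighted links.

    links_x, links_y: dict mapping a neighbouring vertex to the number of
    clicks recorded on that link.
    """
    total = sum(links_x.values()) + sum(links_y.values())
    if total == 0:
        return 0.0
    shared = links_x.keys() & links_y.keys()
    # Links from either vertex that end at a commonly linked neighbour.
    common = sum(links_x[v] + links_y[v] for v in shared)
    return common / total


# Fig. 4(a): both documents of each query receive the same number of clicks.
q1_a = {"d1": 10, "d2": 10}
q2 = {"d2": 1000, "d3": 1000}
score_a = noise_tolerant_similarity(q1_a, q2)   # 1010 / 2020

# Fig. 4(b): the q1-d2 link carries only 10 of q1's 1010 clicks (noise link).
q1_b = {"d1": 1000, "d2": 10}
score_b = noise_tolerant_similarity(q1_b, q2)   # 1010 / 3010
```

Unlike the set-based measure of Equation (1), which scores both graphs identically, the click-weighted form gives the graph with the noise link a lower score.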
4 CONCEPT EXTRACTION

Before explaining our concept-based clustering method, we first describe our concept extraction method, which is composed of the following three basic steps: 1) extracting concepts using the web-snippets returned from the search engine, 2) mining concept relations, and 3) creating a user concept preference profile using the extracted concepts, concept relations and the user's clickthroughs.

4.1 Concept Extraction Using Web-Snippets

Our concept extraction method is inspired by the well-known problem of finding frequent item sets in data mining [9], [19]. When a user submits a query to the search engine, a set of web-snippets is returned to the user for identifying the relevant items. We assume that if a keyword or a phrase appears frequently in the web-snippets of a particular query, it represents an important concept related to the query because it co-exists in close proximity with the query in the top documents. We use the following support formula for measuring the interestingness of a particular keyword/phrase $t_i$ with respect to the returned web-snippets arising from a query $q$:

$$\mathrm{support}(t_i) = \frac{sf(t_i)}{n} \cdot |t_i| \tag{4}$$

where $n$ is the total number of web-snippets returned, $sf(t_i)$ is the snippet frequency of the keyword/phrase $t_i$ (i.e., the number of web-snippets containing $t_i$) and $|t_i|$ is the number of terms in the keyword/phrase $t_i$. For simplicity, we omit $q$ in the above expression if no ambiguity arises.

TABLE 3
EXTRACTED CONCEPTS FOR THE QUERY "APPLE"

Concept $t_i$ | $\mathrm{support}(t_i)$ | Concept $t_i$ | $\mathrm{support}(t_i)$
mac | 0.1 | macintosh | 0.05
ipod | 0.1 | tour | 0.05
iphone | 0.1 | slashdot apple | 0.04
hardware | 0.09 | picture | 0.04
software | 0.09 | apple | 0.04
big apple | 0.08 | apple variety | 0.04
apple store | 0.06 | music | 0.04
mac os | 0.06 | farm market | 0.04
apple orchard | 0.06 | apple grower | 0.04
apple valley | 0.06 | gift shop | 0.04
apple and macintosh | 0.06 | apple farm | 0.04
apple blossom festival | 0.06 | |

To extract concepts for a query $q$, we first extract all the keywords and phrases from the web-snippets returned by the query.
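The support formula in Equation (4) can be sketched as below. The sketch assumes whitespace tokenization, lowercase matching, and that a phrase counts as contained in a snippet when its terms appear consecutively; the paper does not specify these details, and the helper names and sample snippets are hypothetical.

```python
def contains_phrase(snippet_tokens, phrase_tokens):
    """True if the phrase's terms appear consecutively in the snippet."""
    k = len(phrase_tokens)
    return any(snippet_tokens[i:i + k] == phrase_tokens
               for i in range(len(snippet_tokens) - k + 1))


def support(phrase, snippets):
    """Equation (4): (snippet frequency / number of snippets) * phrase length."""
    if not snippets:
        return 0.0
    phrase_tokens = phrase.lower().split()
    # Snippet frequency sf(t): number of snippets containing the phrase.
    sf = sum(contains_phrase(s.lower().split(), phrase_tokens) for s in snippets)
    return sf / len(snippets) * len(phrase_tokens)


# Hypothetical web-snippets returned for one query.
snippets = [
    "apple store opens",
    "big apple tour",
    "mac os update",
    "apple store hours",
]

support_apple_store = support("apple store", snippets)  # 2 of 4 snippets, 2 terms
support_apple = support("apple", snippets)              # 3 of 4 snippets, 1 term
support_mac_os = support("mac os", snippets)            # 1 of 4 snippets, 2 terms
```

The factor $|t_i|$ lets a longer phrase outrank its more frequent single-word parts: here "apple store" scores higher than "apple" even though it occurs in fewer snippets.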
After obtaining a set of keywords/phrases $t_i$, we compute the support $\mathrm{support}(t_i)$ for every $t_i$. If the support of a keyword/phrase $t_i$ is greater than the threshold $s$ (i.e. $\mathrm{support}(t_i) > s$), we treat $t_i$ as a concept for the query $q$. Table 3 lists the extracted concepts for the query $q$ = "apple".

4.2 Mining Concept Relations

To find relations between concepts, we apply a well-known signal-to-noise ratio formula from data mining [16] to establish the similarity between terms $t_1$ and $t_2$. The similarity value of Church and Hanks' formula always lies in [0,1], and thus can be used directly in Step 3.

$$\mathrm{sim}(t_1, t_2) = \frac{\log \dfrac{n \cdot df(t_1 \cup t_2)}{df(t_1) \cdot df(t_2)}}{\log n} \tag{5}$$

where $n$ is the number of documents in the corpus, $df(t_1 \cup t_2)$ is the joint document frequency of $t_1$ and $t_2$, and $df(t)$ is the document frequency of the term $t$.

In our context, two concepts $t_i$ and $t_j$ could co-exist in a web-snippet in the following situations: 1) $t_i$ and $t_j$ co-exist in the title, 2) $t_i$ and $t_j$ co-exist in the summary, or 3) $t_i$ exists in the title, while $t_j$ exists in the summary (or vice

versa). Therefore, we modify Church and Hanks' formula for the three different cases in our context as follows.

$$\mathrm{sim}_R(t_i, t_j) = \mathrm{sim}_{R,title}(t_i, t_j) + \mathrm{sim}_{R,summary}(t_i, t_j) + \mathrm{sim}_{R,other}(t_i, t_j) \tag{6}$$

where $\mathrm{sim}_R(t_i, t_j)$ is the similarity between concepts $t_i$ and $t_j$, which is composed of $\mathrm{sim}_{R,title}(t_i, t_j)$, $\mathrm{sim}_{R,summary}(t_i, t_j)$ and $\mathrm{sim}_{R,other}(t_i, t_j)$ as follows.

$$\mathrm{sim}_{R,title}(t_i, t_j) = \alpha \cdot \frac{\log \dfrac{n \cdot sf_{title}(t_i \cup t_j)}{sf_{title}(t_i) \cdot sf_{title}(t_j)}}{\log n} \tag{7}$$

$$\mathrm{sim}_{R,summary}(t_i, t_j) = \alpha \cdot \frac{\log \dfrac{n \cdot sf_{summary}(t_i \cup t_j)}{sf_{summary}(t_i) \cdot sf_{summary}(t_j)}}{\log n} \tag{8}$$

$$\mathrm{sim}_{R,other}(t_i, t_j) = \alpha \cdot \frac{\log \dfrac{n \cdot sf_{other}(t_i \cup t_j)}{sf_{other}(t_i) \cdot sf_{other}(t_j)}}{\log n} \tag{9}$$

where $n$ is the total number of web-snippets returned, $sf_{title}(t_i \cup t_j)$ is the joint snippet frequency of concepts $t_i$ and $t_j$ in document titles, $sf_{title}(t)$ is the snippet frequency of concept $t$ in document titles, $sf_{summary}(t_i \cup t_j)$ is the joint snippet frequency of $t_i$ and $t_j$ in document summaries, $sf_{summary}(t)$ is the snippet frequency of concept $t$ in document summaries, $sf_{other}(t_i \cup t_j)$ is the joint snippet frequency of concept $t_i$ in a document title and $t_j$ in the document's summary (or vice versa), and $sf_{other}(t)$ is the snippet frequency of concept $t$ in either document summaries or document titles.

Using the extracted concepts and concept relations, we can create a concept relationship graph with the extracted concepts as nodes and the mined concept relations as links. Fig. 5(a) shows a concept relationship graph for the query $q$ = "apple". A link is created between concepts $t_i$ and $t_j$ if their similarity, $\mathrm{sim}_R(t_i, t_j)$, is greater than zero. The strength of each link is given by $\mathrm{sim}_R(t_i, t_j)$.

Fig. 5. (a) A concept relationship graph for the query "apple" derived without incorporating user clickthroughs. (b) A concept preference profile constructed using the user clickthroughs and the concept relationship graph in (a). $w_{t_i}$ is the interestingness of the concept $t_i$ to the user. More clicks on a concept gradually increase the interestingness $w_{t_i}$ of the concept.
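The log-ratio at the core of Equation (5), which the title, summary and other variants reuse with their own snippet counts, can be sketched as follows. The function name and the counts are hypothetical, and returning 0 when a count is zero is an assumption made here to avoid taking the logarithm of zero.

```python
import math


def normalized_cooccurrence(n, sf_joint, sf_a, sf_b):
    """log(n * sf_joint / (sf_a * sf_b)) / log(n).

    n: number of snippets; sf_joint: snippets containing both concepts;
    sf_a, sf_b: snippets containing each concept on its own.
    Returns 0.0 when any count is zero (assumption, not from the paper).
    """
    if n <= 1 or sf_joint == 0 or sf_a == 0 or sf_b == 0:
        return 0.0
    # The value can be negative when the concepts co-occur less often than
    # chance; the text only links concepts whose similarity exceeds zero.
    return math.log(n * sf_joint / (sf_a * sf_b)) / math.log(n)


# Hypothetical counts over n = 100 snippets.
always_together = normalized_cooccurrence(100, 1, 1, 1)    # log(100) / log(100)
often_together = normalized_cooccurrence(100, 10, 10, 10)  # log(10) / log(100)
independent = normalized_cooccurrence(100, 1, 10, 10)      # log(1) / log(100)
```

Dividing by $\log n$ caps the score at 1, which is reached when two concepts each occur in a single snippet and it is the same snippet.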
4.3 Creating User Concept Preference Profile

The concept relationship graph is first derived without taking user clickthroughs into account. Intuitively, the graph shows the possible concept space arising from the user's queries. The concept space, in general, covers more than what the user actually wants. For example, when the user searches for the query "apple", the concept space derived from the web-snippets contains concepts such as "ipod", "iphone" and "recipe". If the user is indeed interested in the concept "recipe" and clicks on pages containing the concept "recipe", the clickthroughs should gradually favor the concept "recipe" and its neighborhood (by assigning higher weights to the nodes), but the weights of the unrelated concepts such as "iphone", "ipod" and their neighborhood should remain zero. Therefore, we propose the following formulas to capture the user's interestingness $w_{t_i}$ on the extracted concepts $t_i$ when a clicked web-snippet $s_j$, denoted by $\mathrm{click}(s_j)$, is found:

$$\mathrm{click}(s_j) \Rightarrow \forall t_i \in s_j,\; w_{t_i} = w_{t_i} + 1 \tag{10}$$

$$\mathrm{click}(s_j) \Rightarrow \forall t_i \in s_j,\; w_{t_j} = w_{t_j} + \mathrm{sim}_R(t_i, t_j) \;\text{ if } \mathrm{sim}_R(t_i, t_j) > 0 \tag{11}$$

where $s_j$ is a web-snippet, $w_{t_i}$ is the interestingness weight of the concept $t_i$, and $t_j$ is a neighboring concept of $t_i$. When a user clicks on $s_j$, the weight of each concept $t_i$ appearing in $s_j$ is incremented by 1 to reflect the user's interest in the concepts embedded in the clicked page. Other concepts that are related to the clicked concepts on the concept relationship graph are incremented according to the similarity score given in Equation (5), which is normalized to the range [0,1]. Therefore, if a concept is closely related to the clicked concept, it is incremented by a higher value (which could be as close to 1 as for the clicked concepts). Otherwise, it is incremented only by a small fraction (close to 0).

By imposing the user's interestingness on the concepts, a concept preference profile with respect to the input query is created. Fig. 5(b) shows an example of a concept preference profile in which the user is interested in information about "apple macintosh". $w_{t_i}$ in Fig. 5(b) represents the interestingness of the concepts to the user.
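A minimal sketch of the two update rules, Equations (10) and (11), is given below. It assumes the concept relationship graph is stored as a nested dictionary of similarity scores; the function name, the data layout and the sample scores are hypothetical. Whether a neighbor that also appears in the clicked snippet receives both increments is not stated in the text, and this sketch applies both.

```python
def update_profile(weights, clicked_concepts, relations):
    """Apply Equations (10) and (11) for one clicked web-snippet.

    weights: dict concept -> interestingness weight (updated in place).
    clicked_concepts: concepts appearing in the clicked web-snippet.
    relations: dict concept -> {neighbouring concept: similarity score}.
    """
    for concept in clicked_concepts:
        # Equation (10): each concept in the clicked snippet gains 1.
        weights[concept] = weights.get(concept, 0.0) + 1.0
        # Equation (11): related concepts gain their similarity score.
        for neighbour, sim in relations.get(concept, {}).items():
            if sim > 0:
                weights[neighbour] = weights.get(neighbour, 0.0) + sim
    return weights


# Hypothetical concept relationship graph for the query "apple".
relations = {
    "macintosh": {"mac os": 0.6, "software": 0.3},
    "recipe": {"fruit": 0.5},
}

profile = {}
update_profile(profile, ["macintosh"], relations)  # first click
update_profile(profile, ["macintosh"], relations)  # second click on the same concept
```

After the two clicks the weights of "macintosh" and its neighbors have grown, while "recipe" and "fruit" are absent from the profile, which corresponds to a weight of zero.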
The values of $w_{t_i}$ for "macintosh" and "mac" are the highest because the user is interested in them (their values of $w_{t_i}$ are incremented using Equation (10)). Indirectly, the values of $w_{t_i}$ for "mac os", "software", "apple store", "iPod", "iPhone" and "hardware" are increased because they are related to "apple macintosh" and are thus incremented using Equation (11). Finally, the weights of the concepts about "apple" as a fruit are not

changed. As a result, the concepts form two clusters representing the user concept preference profile.

5 CONCEPT-BASED CLUSTERING

Using the concepts extracted from web-snippets, we propose two concept-based clustering methods. We first extend BB's algorithm to a concept-based algorithm in Section 5.1. In Section 5.2, the concept-based algorithm is further enhanced to achieve effective personalized clustering.

5.1 Clustering on Query-Concept Bipartite Graph

We now describe our concept-based algorithm (i.e. BB's algorithm using a query-concept bipartite graph) for clustering similar queries. Similar to BB's algorithm, our technique is composed of two steps: 1) bipartite graph construction using the extracted concepts, and 2) agglomerative clustering using the bipartite graph constructed in Step 1.

Using the extracted concepts and clickthrough data, the first step of our method is to construct a query-concept bipartite graph, in which one side of the vertices corresponds to unique queries, and the other corresponds to unique concepts. If a user clicks on a search result, concepts appearing in the web-snippet of the search result are linked to the corresponding query on the bipartite graph. Algorithm 1 shows the first step of our method.

After the bipartite graph is constructed, the agglomerative clustering algorithm is applied to obtain clusters of similar queries and similar concepts. The noise-tolerant similarity function (recall Equation (2)) is used for finding similar vertices on the bipartite graph G. The agglomerative clustering algorithm iteratively merges the most similar pair of white vertices, then merges the most similar pair of black vertices, and so on. We present the details in Algorithm 2.
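The two steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: clusters are represented as tuples of member names, links carry click counts, the click-weighted form of the noise-tolerant similarity is used, and merging stops once no pair on either side scores above its threshold. All function names, the thresholds passed in the example and the sample clickthrough data are hypothetical.

```python
from itertools import combinations


def build_graph(clicks):
    """Step 1: map each query vertex to {concept vertex: click count}."""
    graph = {}
    for query, concepts in clicks:
        if not concepts:
            continue  # a click whose snippet yields no concept adds no edge
        links = graph.setdefault((query,), {})
        for concept in concepts:
            links[(concept,)] = links.get((concept,), 0) + 1
    return graph


def transpose(graph):
    """View the same weighted edges from the other side of the graph."""
    flipped = {}
    for u, links in graph.items():
        for v, weight in links.items():
            flipped.setdefault(v, {})[u] = weight
    return flipped


def similarity(links_a, links_b):
    """Click-weighted noise-tolerant similarity between two vertices."""
    total = sum(links_a.values()) + sum(links_b.values())
    shared = links_a.keys() & links_b.keys()
    return sum(links_a[v] + links_b[v] for v in shared) / total if total else 0.0


def merge_best(side, threshold):
    """Merge the most similar pair on one side; False if none beats threshold."""
    best = max(combinations(sorted(side), 2),
               key=lambda pair: similarity(side[pair[0]], side[pair[1]]),
               default=None)
    if best is None or similarity(side[best[0]], side[best[1]]) <= threshold:
        return False
    x, y = best
    merged = dict(side.pop(x))
    for v, weight in side.pop(y).items():
        merged[v] = merged.get(v, 0) + weight
    side[x + y] = merged  # the merged vertex is named by all of its members
    return True


def cluster(clicks, query_threshold, concept_threshold):
    """Step 2: alternate query merges and concept merges until neither applies."""
    queries = build_graph(clicks)
    while True:
        merged_queries = merge_best(queries, query_threshold)
        concepts = transpose(queries)
        merged_concepts = merge_best(concepts, concept_threshold)
        queries = transpose(concepts)
        if not (merged_queries or merged_concepts):
            return sorted(queries), sorted(concepts)


# Hypothetical clickthroughs: (query, concepts in the clicked web-snippet).
clicks = [
    ("apple ipod", ["ipod", "mac"]),
    ("apple iphone", ["iphone", "mac"]),
    ("apple pie", ["recipe", "fruit"]),
    ("apple tart", ["recipe", "fruit"]),
]
query_clusters, concept_clusters = cluster(clicks, 0.18, 0.18)
```

On this toy data the product queries and the dessert queries end up in separate query clusters, with matching concept clusters on the other side. Recomputing all pairwise similarities at every iteration keeps the sketch short but is quadratic in the number of vertices per merge.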
Algorithm 1: Bipartite Graph Construction
Input: Clickthrough data CT, extracted concepts E
Output: A query-concept bipartite graph G
1: Obtain the set of unique queries $Q = \{q_1, q_2, q_3, \ldots\}$ from CT.
2: Obtain the set of unique concepts $C = \{c_1, c_2, c_3, \ldots\}$ from E.
3: $\mathrm{Nodes}(G) = Q \cup C$, where Q and C are the two sides of G.
4: If the web-snippet s retrieved using $q_i \in Q$ is clicked by a user, create an edge $e = (q_i, c_j)$ in G, where $c_j$ is a concept appearing in s.

Algorithm 2: Agglomerative Clustering
Input: A query-concept bipartite graph G
Output: A clustered query-concept bipartite graph $G_c$
1: Obtain the similarity scores for all possible pairs of queries in G using the noise-tolerant similarity function given in Equation (2).
2: Merge the pair of queries $(q_i, q_j)$ that has the highest similarity score.
3: Obtain the similarity scores for all possible pairs of concepts in G using the noise-tolerant similarity function given in Equation (2).
4: Merge the pair of concepts $(c_i, c_j)$ that has the highest similarity score.
5: Unless termination is reached, repeat Steps 1-4.

The terminating condition for BB's algorithm is reached when all connected components in $G_c$ satisfy the following conditions:

$$\max_{q_i, q_j \in Q} \mathrm{sim}(q_i, q_j) = 0 \quad \text{and} \quad \max_{c_i, c_j \in C} \mathrm{sim}(c_i, c_j) = 0.$$

However, this terminating condition may generate a single big cluster of queries and a single big cluster of concepts, because setting the similarity threshold to zero means that two queries (concepts) are assigned to the same cluster even if they have only a tiny fraction of overlapping concepts (queries). To resolve this problem, we apply higher similarity thresholds, which have been observed in our experiments to yield high precision and recall: $\max_{q_i, q_j \in Q} \mathrm{sim}(q_i, q_j) = 0.18$ and $\max_{c_i, c_j \in C} \mathrm{sim}(c_i, c_j) =$ [value missing].

5.2 Personalized Concept-Based Clustering

We now explain the essential idea of our personalized concept-based clustering algorithm, with which ambiguous queries can be clustered into different query clusters. The personalization effect is achieved by using the user concept preference profiles in the clustering process.
In contrast to BB's agglomerative clustering algorithm, which represents the same query submitted by different users as one query node, we need to consider the same query submitted by different users separately to achieve the personalization effect. In other words, if two given queries, whether they are identical or not, mean different things to two different users, they should not be merged together because they refer to two different sets of concepts for the two users. Therefore, we treat each individual query submitted by each user as an individual vertex in the bipartite graph by labeling each query with a user identifier. Moreover, concepts appearing in the web-snippet of the search result with interestingness weights greater than zero in the concept preference profile are linked to the corresponding query on the bipartite graph.

An example is shown in Fig. 6(a). We can see that the query "apple" submitted by users User1 and User3 becomes two vertices, "apple (User1)" and "apple (User3)". If User1 is interested in the concept "apple store", as recorded in the concept preference profile, a link between the concept "apple store" and the query "apple (User1)" would be created. On the other hand, if User3 is interested in the concept "fruit", a link between the concept "fruit" and "apple (User3)" would be created.

After the personalized bipartite graph is created, our initial experiments revealed that if we apply BB's algorithm directly on the bipartite graph, the query clusters generated will quickly merge queries from different users together and thus lose the personalization effect. We

found that identical queries, though issued by different users and having different meanings, tend to have some generic concept nodes such as "information" in common; e.g., "apple (User1)" and "apple (User3)" both connect to the "information" concept node in Fig. 6(a). Thus, these query nodes will likely be merged in the first few iterations, causing more queries from different users to be merged together in subsequent iterations. Considering Fig. 6(a) again, if "apple (User1)" and "apple (User3)" are merged, the next iteration will merge the concept nodes "apple store", "fruit" and "information". As the clustering algorithm proceeds, queries across users will be further clustered together. In the end, the resulting query clusters have no personalization effect at all.

Fig. 6. Performing the personalized concept-based clustering algorithm on a small set of clickthrough data. Starting from top left: (a) The original bipartite graph. (b), (c) Initial clustering. (d), (e) Community merging.

To resolve the problem, we divide clustering into two steps. In the initial clustering step, an algorithm similar to BB's algorithm is employed to cluster all the queries, but it would not merge identical queries from different users. After obtaining all the clusters from the initial clustering step, the community merging step is employed to merge query clusters containing identical queries from different users. We can see from Fig. 6(d) that "apple (User1)" and "apple (User3)" belong, correctly, to different clusters. We will see further in Section 6.3 that the initial clustering step is able to generate a high precision rate because it preserves the preference of each user, while the community merging step is able to improve the recall rate because of the collaborative filtering effect.

Algorithm 3 shows the details of the personalized clustering algorithm. Similar to BB's algorithm, a query-concept bipartite graph is created as input for the clustering algorithm.
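The personalized graph used as input can be sketched as follows: each query vertex is labeled with its user, and a concept is linked only when its interestingness weight in that user's concept preference profile is greater than zero. The record layouts, the profile weights, and the name `build_personalized_graph` are hypothetical and serve only to illustrate the idea.

```python
def build_personalized_graph(clicks, snippet_concepts, profiles):
    """Link (query, user) vertices to concepts the user is interested in.

    clicks: iterable of (user, query, snippet_id) records (hypothetical layout).
    snippet_concepts: dict snippet_id -> concepts found in that web-snippet.
    profiles: dict user -> {concept: interestingness weight}.
    """
    links = {}
    for user, query, snippet_id in clicks:
        weights = profiles.get(user, {})
        node = (query, user)  # the same query from two users gives two vertices
        for concept in snippet_concepts.get(snippet_id, ()):
            if weights.get(concept, 0.0) > 0:
                links.setdefault(node, set()).add(concept)
    return links


clicks = [("User1", "apple", "s1"), ("User3", "apple", "s2")]
snippet_concepts = {
    "s1": ["apple store", "information", "fruit"],
    "s2": ["fruit", "information"],
}
profiles = {  # illustrative weights, not measured values
    "User1": {"apple store": 0.8, "information": 0.1, "fruit": 0.0},
    "User3": {"fruit": 0.6, "information": 0.2},
}
links = build_personalized_graph(clicks, snippet_concepts, profiles)
```

Note that both "apple" vertices still share the generic concept "information", which is exactly the overlap that makes a direct application of BB's algorithm merge them too early.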
The bipartite graph construction algorithm is similar to Algorithm 1, except that each individual query submitted by each user is treated as an individual vertex in the bipartite graph. Initial clustering (i.e., Steps 1-5 of Algorithm 3) is similar to BB's agglomerative algorithm as already discussed in Section 5.1. However, queries from different users are not allowed to be merged in initial clustering. Fig. 6(b) and 6(c) show examples of query and concept merging, respectively. Fig. 6(d) illustrates the result of initial clustering.

In community merging (i.e., Steps 6-8 of Algorithm 3), query clusters containing identical queries from different users are compared for merging. Fig. 6(d) and 6(e) show an example of query cluster merging. The query clusters {"apple computer" (User2), "apple" (User1)} and {"apple" (User2), "apple mac" (User1)} both contain the query "apple", and both lead to the same concept "apple store". Therefore, they are merged in community merging into one big cluster.

Choosing when to start community merging is important for the success of the algorithm. If we stop initial clustering too early (i.e., not all clusters are well formed), community merging merges all the identical queries from different users first, and thus generates a single big cluster without much personalization effect. However, if we stop initial clustering too late (i.e., clusters are overly merged), the low precision rate generated by initial clustering would not be improved by community merging. To obtain the optimal results in our experiments, we use the following terminating conditions for initial clustering (i-clustering) and community merging (c-merging) in Algorithm 3. These parameters are empirically investigated in our experiments. We will further justify our choice using Table 10 in Section 6.3.

$$\max_{\substack{i\text{-clustering} \\ q_i, q_j \in Q}} sim(q_i, q_j) = 0.29 \quad \text{and} \quad \max_{\substack{i\text{-clustering} \\ c_i, c_j \in C}} sim(c_i, c_j) = $$

$$\max_{\substack{c\text{-merging} \\ q_i, q_j \in Q}} sim(q_i, q_j) = 0.39 \quad \text{and} \quad \max_{\substack{c\text{-merging} \\ c_i, c_j \in C}} sim(c_i, c_j) = 0.39.$$
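The two-phase merge rule can be sketched on the query side as follows. This is a simplified illustration, not the paper's implementation: concept-side merges (Steps 3-4 of Algorithm 3) are left out to keep the sketch short, a Jaccard overlap of concept sets stands in for the noise-tolerant similarity function of Equation (2), and all function names and the toy data are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Stand-in similarity (assumption): overlap of two concept sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def same_query_other_user(x, y):
    """True if clusters x and y hold the same query text from different users."""
    return any(qx == qy and ux != uy for qx, ux in x for qy, uy in y)


def merge_phase(clusters, threshold, allowed):
    """Repeatedly merge the most similar allowed pair scoring above threshold."""
    while True:
        best, best_sim = None, threshold
        for x, y in combinations(sorted(clusters, key=sorted), 2):
            s = jaccard(clusters[x], clusters[y])
            if s > best_sim and allowed(x, y):
                best, best_sim = (x, y), s
        if best is None:
            return clusters
        x, y = best
        clusters[x | y] = clusters.pop(x) | clusters.pop(y)


def personalized_clusters(links, t_initial, t_community):
    # links: {(query, user): set of concepts linked to that vertex}
    clusters = {frozenset([node]): set(cs) for node, cs in links.items()}
    # Initial clustering: never merge identical queries from different users.
    merge_phase(clusters, t_initial, lambda x, y: not same_query_other_user(x, y))
    # Community merging: only merge clusters that share such a query.
    merge_phase(clusters, t_community, same_query_other_user)
    return set(clusters)


links = {
    ("apple", "User1"): {"apple store", "mac", "information"},
    ("apple mac", "User1"): {"apple store", "mac"},
    ("apple", "User2"): {"apple store", "mac"},
    ("apple", "User3"): {"fruit", "juice", "information"},
    ("apple juice", "User3"): {"fruit", "juice"},
}
result = personalized_clusters(links, t_initial=0.29, t_community=0.39)
```

With these toy links, "apple (User1)" and "apple (User2)" end up together while "apple (User3)" stays with "apple juice (User3)", since the only concept shared across the two groups is the generic "information".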

TABLE 4
CATEGORIES OF THE TEST QUERIES

| Category | Description | Category | Description |
|---|---|---|---|
| 1 | Cooking | 6 | Computer Programming |
| 2 | Dining | 7 | Computer Gaming |
| 3 | Internet Shopping | 8 | Music |
| 4 | Traveling | 9 | Computer Science Research |
| 5 | Automobile Repairing | 10 | Computer Hardware |

TABLE 5
STATISTICS OF THE CLICKTHROUGH DATA COLLECTED IN THE 1ST EXPERIMENT

| Statistic | Value |
|---|---|
| Number of users | 30 |
| Number of queries assigned to each user | 5 |
| Number of test queries | 150 |
| Number of unique queries | 150 |
| Maximum number of retrieved URLs for a query | 100 |
| Maximum number of extracted concepts for a query | 217 |
| Maximum number of extracted words for a query | 1,093 |
| Number of URLs retrieved | 14,880 |
| Number of unique URLs retrieved | 12,430 |
| Number of concepts retrieved | 13,321 |
| Number of unique concepts retrieved | 6,008 |
| Number of words retrieved | 117,924 |
| Number of unique words retrieved | 21,920 |

The query clusters output by the algorithm are shown in Fig. 6(e). We assume in this example that the links between the generic concept node "information" and the two query clusters are weak, and that the terminating similarity threshold is able to prevent the merging of the query clusters about "apple computer" and "apple juice". We can see in the resulting clusters that User1 and User2 both submit the query "apple" in order to seek information about "apple computer", while User3 submits the query "apple" to look for information about "apple juice". In this example, even though the queries "apple" submitted by User1, User2 and User3 appear to be the same, the algorithm can successfully differentiate them to achieve the personalization effect according to individual users' conceptual preferences. Finally, we can see that queries about "apple computer" (e.g., "apple mac", "apple computer") are suggested to User1 and User2, while queries about "apple juice" (e.g., "apple juice") are suggested to User3.
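How suggestions could be read off the final clusters is sketched below. The cluster contents mirror the example described above; the helper name `suggest` and the rule of returning every other query text in the same cluster are assumptions made for illustration only.

```python
def suggest(clusters, query, user):
    """Return the other query texts in the cluster holding (query, user)."""
    node = (query, user)
    for cluster in clusters:
        if node in cluster:
            return sorted({q for q, _ in cluster if q != query})
    return []  # unknown vertex: nothing to suggest


clusters = [
    {("apple", "User1"), ("apple mac", "User1"),
     ("apple", "User2"), ("apple computer", "User2")},
    {("apple", "User3"), ("apple juice", "User3")},
]
for_user1 = suggest(clusters, "apple", "User1")
for_user3 = suggest(clusters, "apple", "User3")
```

The same query text therefore yields different suggestions depending on which user's vertex is looked up.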
Algorithm 3 Personalized Agglomerative Clustering
Input: A Query-Concept Bipartite Graph G
Output: A Personalized Clustered Query-Concept Bipartite Graph $G_p$
// Initial Clustering
1: Obtain the similarity scores in G for all possible pairs of queries using the noise-tolerant similarity function given in Equation (2).
2: Merge the pair of most similar queries $(q_i, q_j)$ that does not contain the same queries from different users.
3: Obtain the similarity scores in G for all possible pairs of concepts using the noise-tolerant similarity function given in Equation (2).
4: Merge the pair of concepts $(c_i, c_j)$ having the highest similarity score.
5: Unless termination is reached, repeat Steps 1-4.
// Community Merging
6: Obtain the similarity scores in G for all possible pairs of queries using the noise-tolerant similarity function given in Equation (2).
7: Merge the pair of most similar queries $(q_i, q_j)$ that contains the same queries from different users.
8: Unless termination is reached, repeat Steps 6-7.

TABLE 6
USERS' INFORMATION NEEDS FOR THE 2ND EXPERIMENT

| User Group | Information Needs |
|---|---|
| 1 | Purchase of digital cameras |
| 2 | Purchase of printers |
| 3 | Information on camera films |
| 4 | Information on dessert cooking recipes |
| 5 | Purchase of clothes |
| 6 | Download of Mac software |
| 7 | Purchase of Macintosh |
| 8 | Purchase of iPod |

TABLE 7
STATISTICS OF THE CLICKTHROUGH DATA COLLECTED FOR THE 2ND PART OF THE EXPERIMENTATION

| Statistic | Value |
|---|---|
| Number of users | 10 |
| Number of queries assigned to each user | 5 |
| Number of test queries | 50 |
| Number of unique queries | 38 |
| Maximum number of retrieved URLs for a query | 100 |
| Maximum number of extracted concepts for a query | 168 |
| Maximum number of extracted words for a query | 938 |
| Number of URLs retrieved | 4,962 |
| Number of unique URLs retrieved | 3,239 |
| Number of concepts retrieved | 4,130 |
| Number of unique concepts retrieved | 1,971 |
| Number of words retrieved | 38,831 |
| Number of unique words retrieved | 8,891 |

6 EXPERIMENTAL RESULTS

In this section, we evaluate the performance of the proposed clustering methods for obtaining related queries using user clickthroughs.
In Section 6.1, we first describe the experimental setup for collecting the required clickthrough data. In Section 6.2, we compare the performance of BB's algorithm using query-URL, query-word, and query-concept bipartite graphs (simply called the QU, QW and QC methods). In Section 6.3, we evaluate the effectiveness of our proposed personalized concept-based clustering (simply called the P-QC method). In Section 6.4, we discuss the algorithmic complexities based on the related parameters.

6.1 Experimental Setup

To collect the clickthrough data to evaluate our proposed

methods, we implemented a Google middleware to track user clicks. Google (see footnote 3) was chosen as a common basis for comparing the performance of the methods under evaluation. We invited 40 students from our department to use the middleware to search 200 given test queries, which are accessible at [1]. To avoid any bias, the test queries are randomly selected from ten different categories and submitted to Google without any modification by the middleware. Table 4 shows the topical categories from which the queries were chosen.

Footnote 3: Google is one of the most popular commercial search engines. If a different search engine is used, we expect the absolute performances of the methods under evaluation to be different but their relative performances to remain the same.

When a query is submitted to the middleware, the top 100 search results from Google are retrieved, and the web-snippets of the search results are displayed to the users. Since most users would examine only the top 10 results, our concept extraction method, digging deep into the first 100 results, will discover concepts related to the query that would otherwise be missed by the users. The extracted concept relationship graph is then stored in our database. If a user clicks on one of the web-snippets of the returned results, the user's clickthrough together with his/her concept preference profile are updated as discussed in Section 4.3. The threshold s for concept mining was set to 0.03, and the threshold for establishing concept relations (as specified in Equation (11)) was set to zero. We chose these small thresholds so that as many concepts as possible are mined. The quality of the query suggestions then relies more on the clustering algorithms, which are the main focus of this paper.

In the first experiment (described in Section 6.2), 30 students were asked to search the 150 test queries, all of which have unambiguous meanings (e.g., "apple pie" and "cheese cake"). The 150 test queries are separated into 10 predefined clusters (e.g.,
the queries "apple pie", "cheese cake" and "brownies" belong to the cluster about dessert recipes). The users were asked to click on the web-snippets of the returned results that are relevant to the queries. The clickthrough data collected are used to measure the performance of the concept-based clustering method discussed in Section 5.1. Table 5 shows the statistics of our collected clickthrough data for this experiment.

In the second experiment (described in Section 6.3), 10 students were asked to search using another 50 test queries. Some of the test queries are intentionally designed to have ambiguous meanings (e.g., the query "Canon" could mean a digital camera or a printer). The 50 test queries are separated into 8 predefined clusters. Some of the queries could exist in more than one cluster (e.g., the query "Canon" could belong to the cluster about digital cameras or the cluster about printers). Each user is assigned one of the information seeking tasks shown in Table 6. The users are then asked to click on the web-snippets of the returned results that are relevant to both the queries and their information needs. The clickthrough data collected are used to measure the performance of the personalized concept-based clustering method discussed in Section 5.2. Table 7 shows the statistics of our collected clickthrough data for this experiment.

6.2 Comparing QU, QW and QC Methods

We now discuss the results of the first experiment, which compares the performance of the QU, QW and QC methods. The QU method uses the query-URL bipartite graph, the original input of BB's algorithm, and serves as a baseline for comparison. The QW method uses a query-word bipartite graph, which is similar to the query-concept bipartite graph in that both are constructed using Algorithm 1. The difference is that the former contains all words (excluding stopwords) from the web-snippets, while the latter contains the extracted concepts. Including both the QW and QC methods is necessary, since comparing them allows us to study the benefits of concept extraction. The three methods are employed to cluster the collected data.
The results are compared to our predefined clusters for precision and recall. Given a query q and its corresponding query cluster $\{q_1, q_2, q_3, \ldots\}$ generated by a clustering algorithm, the precision and recall are computed using the following formulas:

$$precision(q) = \frac{|Q\_relevant \cap Q\_retrieved|}{|Q\_retrieved|} \qquad (12)$$

$$recall(q) = \frac{|Q\_relevant \cap Q\_retrieved|}{|Q\_relevant|} \qquad (13)$$

where Q_relevant is the set of queries that exist in the predefined cluster for q, and Q_retrieved is the set of related queries $\{q_1, q_2, q_3, \ldots\}$ generated by the algorithm. The precision and recall values from all queries are averaged for plotting the precision-recall figures.

The performance of the three methods is compared using precision-recall figures and best F-measure values. Fig. 7 shows the precision-recall figures for the QU, QW and QC methods. We observe that the QC method yields a better recall rate than the QU method (i.e., the original BB's algorithm), while preserving high precision rates. This can be attributed to the fact that the average number of overlapping URLs between queries is only 16.3 according to the statistics in Table 5, whereas the average number of overlapping concepts between the queries is 48.8, which is much higher than the URL overlap rate. As a result, related queries that cannot be discovered by URL overlap can be brought together by our QC method, thus improving the recall rate. The effect of the high concept overlap rate is also apparent in Fig. 7, which shows that the recall of the QU method can only go up to around 0.8, while the QW and QC methods can go beyond 0.9. Note that the QU method can yield a high precision rate because of the valuable URL overlaps between queries. However, the QC method improves both precision and recall compared with the QU method, showing that the use of extracted concepts is much better for finding similar queries. We also observe that the QW method performs the worst among the three methods because common non-stop words such as "discussion", "information" and "news" bring unrelated queries together, thus lowering both

the precision and recall rates. The main difference between the QW and QC methods is the availability of concept extraction. Intuitively, the QC method outperforms the QW method because the concept extraction process can successfully eliminate unrelated common words within web-snippets.

Fig. 7. Precision vs. recall when performing the QU, QW and QC methods.

TABLE 8. BEST F-MEASURE VALUES OF QU, QW AND QC METHODS FOR THE 1ST EXPERIMENT (columns: Precision, Recall, F-measure; rows: QU method, QW method, QC method).

Fig. 8 and 9 show the change of precision and recall, respectively, for the three clustering methods. In Fig. 8, when the cutoff similarity score is around 0.3, the precision obtained using the QU method is very close to that of the QC method, which is much better than the precision obtained using the QW method. In Fig. 9, at the same cutoff similarity score, the recall obtained using the QU method is close to zero, which is much lower than the recalls obtained using the QW and QC methods. We can easily see from Fig. 8 and 9 that the QC method is able to generate good recall, while achieving a precision comparable to that of the QU method.

Fig. 8. Change of precision when performing the QU, QW and QC methods.

Fig. 9. Change of recall when performing the QU, QW and QC methods.

We observe that the three methods achieve their optimal precision/recall at different cutoff similarity scores. To obtain and compare the best F-measures [30] (i.e., evenly weighted harmonic means of precisions and recalls) for the three methods, the following three terminating strategies are used:

$$\max_{\substack{URL \\ q_i, q_j \in Q}} sim(q_i, q_j) = \quad \text{and} \quad \max_{\substack{URL \\ c_i, c_j \in C}} sim(c_i, c_j) = $$

$$\max_{\substack{word \\ q_i, q_j \in Q}} sim(q_i, q_j) = 0.39 \quad \text{and} \quad \max_{\substack{word \\ c_i, c_j \in C}} sim(c_i, c_j) = $$

$$\max_{\substack{concept \\ q_i, q_j \in Q}} sim(q_i, q_j) = 0.18 \quad \text{and} \quad \max_{\substack{concept \\ c_i, c_j \in C}} sim(c_i, c_j) = $$

The F-measure, F, is defined by the following formula:

$$F = \frac{2 \, (precision \times recall)}{precision + recall} \qquad (14)$$

Table 8 shows the best F-measure values for the QU, QW and QC methods.
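Equations (12) to (14) can be checked with a small Python helper. The function name and the example query sets are hypothetical; returning 0.0 for an empty denominator is a convention chosen here, not something stated in the paper.

```python
def precision_recall_f(relevant, retrieved):
    """Compute precision (12), recall (13) and F-measure (14) for one query."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Predefined cluster for the query versus the cluster produced by an algorithm.
relevant = {"apple pie", "cheese cake", "brownies", "tiramisu"}
retrieved = {"apple pie", "cheese cake", "java"}
p, r, f = precision_recall_f(relevant, retrieved)
```

Here two of the three retrieved queries are relevant, so precision is 2/3, recall is 2/4, and the F-measure is their harmonic mean, 4/7.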
From the results in Table 8, we can conclude that the query clusters obtained using the QC method are much more accurate than those obtained using the QU and QW methods.

6.3 Personalized Concept-Based Clustering

In the second experiment, the QU, QW, QC and P-QC methods are employed to cluster queries which are intentionally designed to have ambiguous meanings. Again, the results are compared to our predefined clusters in terms of precision and recall. We analyze the performance of the P-QC method using precision-recall figures and best F-measure values. Fig. 10 shows the precision-recall figures of the P-QC method. The solid line is the precision-recall graph if only initial clustering is performed. We can observe that recall maxes out at . The other three lines illustrate how community merging can further improve recall be-


More information

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines (IJCSIS) Internatonal Journal of Computer Scence and Informaton Securty, Herarchcal Web Page Classfcaton Based on a Topc Model and Neghborng Pages Integraton Wongkot Srura Phayung Meesad Choochart Haruechayasak

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

11. HARMS How To: CSV Import

11. HARMS How To: CSV Import and Rsk System 11. How To: CSV Import Preparng the spreadsheet for CSV Import Refer to the spreadsheet template to ad algnng spreadsheet columns wth Data Felds. The spreadsheet s shown n the Appendx, an

More information

Deep Classification in Large-scale Text Hierarchies

Deep Classification in Large-scale Text Hierarchies Deep Classfcaton n Large-scale Text Herarches Gu-Rong Xue Dkan Xng Qang Yang 2 Yong Yu Dept. of Computer Scence and Engneerng Shangha Jao-Tong Unversty {grxue, dkxng, yyu}@apex.sjtu.edu.cn 2 Hong Kong

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval Proceedngs of the Thrd NTCIR Workshop Descrpton of NTU Approach to NTCIR3 Multlngual Informaton Retreval Wen-Cheng Ln and Hsn-Hs Chen Department of Computer Scence and Informaton Engneerng Natonal Tawan

More information

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

An Iterative Implicit Feedback Approach to Personalized Search

An Iterative Implicit Feedback Approach to Personalized Search An Iteratve Implct Feedback Approach to Personalzed Search Yuanhua Lv 1, Le Sun 2, Junln Zhang 2, Jan-Yun Ne 3, Wan Chen 4, and We Zhang 2 1, 2 Insttute of Software, Chnese Academy of Scences, Beng, 100080,

More information

IN recent years, we have been witnessing the explosive

IN recent years, we have been witnessing the explosive IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 15, NO. 4, JULY/AUGUST 2003 1 Query Expanson by Mnng User Logs Hang Cu, J-Rong Wen, Jan-Yun Ne, and We-Yng Ma, Member, IEEE Abstract Queres to

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

Network Intrusion Detection Based on PSO-SVM

Network Intrusion Detection Based on PSO-SVM TELKOMNIKA Indonesan Journal of Electrcal Engneerng Vol.1, No., February 014, pp. 150 ~ 1508 DOI: http://dx.do.org/10.11591/telkomnka.v1.386 150 Network Intruson Detecton Based on PSO-SVM Changsheng Xang*

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Information Retrieval

Information Retrieval Anmol Bhasn abhasn[at]cedar.buffalo.edu Moht Devnan mdevnan[at]cse.buffalo.edu Sprng 2005 #$ "% &'" (! Informaton Retreval )" " * + %, ##$ + *--. / "#,0, #'",,,#$ ", # " /,,#,0 1"%,2 '",, Documents are

More information

Intelligent Information Acquisition for Improved Clustering

Intelligent Information Acquisition for Improved Clustering Intellgent Informaton Acquston for Improved Clusterng Duy Vu Unversty of Texas at Austn duyvu@cs.utexas.edu Mkhal Blenko Mcrosoft Research mblenko@mcrosoft.com Prem Melvlle IBM T.J. Watson Research Center

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

Keyword-based Document Clustering

Keyword-based Document Clustering Keyword-based ocument lusterng Seung-Shk Kang School of omputer Scence Kookmn Unversty & AIrc hungnung-dong Songbuk-gu Seoul 36-72 Korea sskang@kookmn.ac.kr Abstract ocument clusterng s an aggregaton of

More information

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search Can We Beat the Prefx Flterng? An Adaptve Framework for Smlarty Jon and Search Jannan Wang Guolang L Janhua Feng Department of Computer Scence and Technology, Tsnghua Natonal Laboratory for Informaton

More information

Personalized Concept-Based Clustering of Search Engine Queries. Kenneth Wai-Ting Leung, Wilfred Ng, and Dik Lun Lee. IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 11, November 2008, p. 1505.
