IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 15, NO. 4, JULY/AUGUST 2003

Query Expansion by Mining User Logs

Hang Cui, Ji-Rong Wen, Jian-Yun Nie, and Wei-Ying Ma, Member, IEEE

Abstract: Queries to search engines on the Web are usually short. They do not provide sufficient information for an effective selection of relevant documents. Previous research has proposed the utilization of query expansion to deal with this problem. However, expansion terms are usually determined on term co-occurrences within documents. In this study, we propose a new method for query expansion based on user interactions recorded in user logs. The central idea is to extract correlations between query terms and document terms by analyzing user logs. These correlations are then used to select high-quality expansion terms for new queries. Compared to previous query expansion methods, ours takes advantage of the user judgments implied in user logs. The experimental results show that the log-based query expansion method can produce much better results than both the classical search method and the other query expansion methods.

Index Terms: Query expansion, user log, probabilistic model, information retrieval, search engine.

1 INTRODUCTION

In recent years, we have been witnessing the explosive growth of information on the World Wide Web. People are relying more and more on the Web for their diverse needs of information. However, the Web is an information hotpot where innumerable authors have created and are creating their Web sites independently. The vocabularies of the authors vary greatly. There is an acute requirement for search engine technology to help users exploit such an extremely valuable resource. Despite the fact that keywords are not always good descriptors of contents, most existing search engines still rely solely on keyword-matching to determine the answers. Users usually describe their information needs by a few keywords in their queries, which are likely to be different from those index terms of the documents on the Web.
This problem is general in Information Retrieval (IR) systems and was documented before the popularization of the Web: "New or intermittent users often use the wrong words and fail to get the actions or information they want" [15]. As a consequence, in many cases, the documents returned by search engines are not relevant to the user information need. This raises a fundamental problem of term mismatch in information retrieval, which is also one of the key factors that affect the precision of the search engines. Very short queries submitted to search engines on the Web amplify this problem: Many important words or terms may be missing from the queries. To solve this problem, researchers have investigated the utilization of query expansion techniques to help users formulate better queries.

Query expansion involves supplementing the original query with additional words and phrases. There are two key aspects in any query expansion technique: the source from which expansion terms are selected and the method to weight and integrate expansion terms. Manual query expansion has been studied by many researchers, for example, [1] and [17]. Manual query expansion demands user interventions. It also requires that the user be familiar with the online search system, the indexing mechanism, and the domain knowledge, which is generally not true for the users on the Web.

H. Cui is with the Department of Computer Science, School of Computing, National University of Singapore, Singapore. E-mail: cuihang@comp.nus.edu.sg.
J.-R. Wen and W.-Y. Ma are with Microsoft Research Asia, 3F Sigma Building, No. 49, Zhichun Rd., Haidian District, Beijing, China. E-mail: {jrwen, wyma}@microsoft.com.
J.-Y. Nie is with the Département d'informatique et Recherche Opérationnelle, Université de Montréal, C.P. 6128, succursale Centre-ville, Montreal, Quebec H3C 3J7, Canada. E-mail: nie@iro.umontreal.ca.
Manuscript received 15 July 2002; revised 15 Dec. 2002; accepted 6 Jan. For information on obtaining reprints of this article, please send e-mail to: tkde@computer.org, and reference IEEECS Log Number
/03/$17.00 © 2003 IEEE. Published by the IEEE Computer Society.

In this paper, we will focus on automatic query expansion. Current automatic query expansion techniques can be generally categorized into global analysis and local analysis. A query expansion method based on global analysis usually builds a thesaurus to assist users in reformulating their queries. A thesaurus can be automatically established by analyzing relationships among documents and statistics of term co-occurrences in the documents. From the thesaurus constructed in this way, one will be able to obtain synonyms or related terms given a user query. Thus, these related terms can be used for supplementing users' original queries.

Another group of techniques for query expansion is local analysis, which extracts expansion terms from a subset of the initial retrieval results. This subset may be determined directly by the user according to relevance judgments, or by the system (i.e., the top-ranked documents). Terms selected from them are added in a new query or their weights in the latter are increased [31]. Compared to the thesaurus-based expansion technique, local analysis is more query-oriented. Previous experiments have shown significant impact of local analysis on retrieval effectiveness. However, if the subset of documents is selected by the user, then we put a heavy burden on the user. If it is selected by the system, then it is questionable whether they are indeed relevant to the query; thus, the improvement on retrieval effectiveness is uncertain.

In this paper, we propose a new query expansion method based on user logs which record user interactions
with the search systems. User logs are exploited so as to extract the implicit relevance judgments they encode. In this approach, we assume that the documents that the user chose to read are relevant documents. The log-based query expansion overcomes several difficulties of local analysis because we can extract a large number of user judgments from user logs, while eliminating the step of collecting feedback from users for ad hoc queries. Probabilistic correlations between terms in the user queries and the documents can then be established through user logs. With these term-term correlations, relevant expansion terms can be selected from the documents for a query. Our experiments show that mining user logs is extremely useful for improving retrieval effectiveness, especially for very short queries on the Web.

In this paper, we carry out a series of experiments to investigate the effects of our query expansion method on queries of different length. The experimental results on both long and short queries are presented in this article. As we will see, query expansion produces more significant improvements on short queries than on long queries.

The remainder of this paper is organized as follows: Section 2 describes the problem of inconsistency between query terms and document terms, which will motivate our approach. Our experimental result suggests a large difference between the terms used in queries and those in documents and, hence, the need to develop appropriate query expansion techniques for Web search. Section 3 reviews previous work on query expansion. Our log-based query expansion technique is described in detail in Section 4. Sections 5 and 6 describe the experiments comparing our method with local context analysis. Section 7 draws some conclusions.

2 MOTIVATION

The problems of under-specification and inappropriate term usage in user queries are two motivations for studying query expansion.
They are due to two facts: queries are often short and thus contain an insufficient number of terms; and query terms are often inconsistent with (different from) those in the documents. In this section, we will examine these two facts with respect to a search engine on the Web.

It is generally observed that users on the Web typically submit very short queries to search engines and the average length of Web queries is less than two words [34]. A similar conclusion was drawn in [9]. We deduce that the very small overlap of the query terms and the document terms in the desired documents negatively affects the performance of Web searching.

In [15], it was observed that people use a surprisingly great variety of words when referring to the same thing and, thus, terms in user queries often fail to match the index terms contained in the relevant documents. It is even worse when the query is very short, as on the Web. In this case, the chance of mismatching is much larger than for a long query.

In fact, we can view the term usages in the documents as forming a term space, which we call the document space. The term usages in the queries form another term space, the query space. The mismatching problem we just described comes from the inconsistency between the two spaces. This fact has often been hypothesized. However, no previous study has tried to measure the difference between the two spaces quantitatively. This measurement is difficult because the number of relevance judgments is always limited. With a large amount of user logs that we consider to encode relevance judgments, this becomes possible.

In order to confirm the large difference between the two term spaces, we will measure the similarity between them. It is to be noted, however, that the resulting measure of similarity is an approximation. A true measure of similarity is only possible with real relevance judgments. Our measurement is conducted with two-month user logs (about 22 GB) from the Encarta search engine (encarta.msn.com), as well as the 41,942 documents in the Encarta Web site.
The user logs contain 4,839,704 user query sessions. Each query session consists of the query itself and its corresponding document clickthroughs (the documents on which the user clicked, see Section 4). Below is an excerpt of query sessions.

[Excerpt of query sessions]

We represent each document as a document vector $\{W_1^{(d)}, W_2^{(d)}, \ldots, W_N^{(d)}\}$ in the document space, where $W_i^{(d)}$ is the weight of the $i$th term in a document and it is defined by the traditional TF*IDF measure:

$$ W_i^{(d)} = \frac{\ln\left(1 + \mathit{tf}_i^{(d)}\right) \times \mathit{idf}_i^{(d)}}{\sqrt{\sum_{l=1}^{N} \ln^2\left(1 + \mathit{tf}_l^{(d)}\right) \times \left(\mathit{idf}_l^{(d)}\right)^2}}, \qquad (1) $$

$$ \mathit{idf}_i^{(d)} = \ln\frac{N}{n_i}, \qquad (2) $$

where $\mathit{tf}_i^{(d)}$ is the frequency of the $i$th term in the document $D$, $N$ the total number of documents in the collection, and $n_i$ the number of documents containing the $i$th term.

For each document, we can construct a corresponding virtual document in the query space by collecting and counting all the terms, excluding stopwords, in the queries for which the document has been selected and clicked on by the user. A virtual document is represented as a query vector $\{W_1^{(q)}, W_2^{(q)}, \ldots, W_N^{(q)}\}$, where $W_i^{(q)}$ is the weight of the $i$th term in the virtual document and it is also defined by the TF*IDF measure. The similarity between the two vectors is calculated and it is assumed to reflect the similarity between the query space and document space we measure. Specifically, the similarity of each pair of vectors is calculated using the following Cosine formula:

$$ \mathit{Similarity} = \frac{\sum_{i=1}^{N} W_i^{(q)} W_i^{(d)}}{\sqrt{\sum_{i=1}^{N} \left(W_i^{(q)}\right)^2}\,\sqrt{\sum_{i=1}^{N} \left(W_i^{(d)}\right)^2}}. \qquad (3) $$

Fig. 1. Similarity between the query terms and the document terms.

We notice that many terms in the document space never or seldom appear in the users' queries. Thus, the query vector created is much shorter (with fewer nonzero terms) than a document vector. This artifact will dramatically decrease the similarity between the two vectors if all the terms are used in the measurement. To obtain a fairer measure, we only use the n highest ranking words in each document vector for the similarity calculation, where n is the number of terms in the corresponding query vector.

Fig. 1 illustrates the final results of similarity values on the whole document collection. This figure shows that, in most cases, the similarity values of term usages between user queries and documents are between 0.1 and 0.4. Only very few documents have similarity values above 0.8. The average similarity value across the whole document collection is 0.28, which corresponds to an internal angle of about 73.7 degrees (arccos 0.28) between the query vector and the document vector. This result suggests that there is indeed a large gap between the query space and the document space. It is thus very difficult to retrieve the desired documents with a direct keyword matching approach. It is important to find ways to narrow the gap or to bridge the two spaces in order to improve retrieval effectiveness.

3 REVIEW OF PREVIOUS WORK ON AUTOMATIC QUERY EXPANSION

In this section, let us review some previous approaches to query expansion. The existing state-of-the-art query expansion approaches can be classified mainly into two categories: techniques based on global analysis, which obtain expansion terms from the statistics of terms in the whole corpus, and local analysis, which extracts expansion terms from a subset of the search results.
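The similarity measurement of Section 2 (Eqs. 1 to 3, plus the truncation of each document vector to its n highest-weighted terms) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the dictionary-based vector representation, and the choice of document frequencies used for the query-space IDF are assumptions made here.

```python
import math

def tfidf_vector(term_counts, doc_freq, n_docs):
    """Eqs. (1)-(2): weight = ln(1 + tf) * ln(N / n), then cosine-normalised.

    term_counts: {term: raw frequency}; doc_freq: {term: number of documents
    containing the term}; n_docs: collection size N. All names are hypothetical.
    """
    raw = {t: math.log(1 + tf) * math.log(n_docs / doc_freq[t])
           for t, tf in term_counts.items() if doc_freq.get(t, 0) > 0}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {t: w / norm for t, w in raw.items()} if norm > 0 else {}

def cosine(u, v):
    """Eq. (3): cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0

def space_similarity(doc_vec, query_vec):
    """Compare a query vector with the n highest-weighted document terms,
    where n is the number of terms in the query vector."""
    n = len(query_vec)
    top = dict(sorted(doc_vec.items(), key=lambda kv: kv[1], reverse=True)[:n])
    return cosine(query_vec, top)
```

Truncating the document vector before the comparison is what keeps a long document from being penalised merely for containing many terms that nobody uses in queries.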
3.1 Global Analysis

In this section, we only review the approaches that exploit term co-occurrences in documents. We do not analyze the approaches that use a manual thesaurus (e.g., WordNet [22]). One can refer to [33] for some examples of utilization of such a resource for query expansion.

Global analysis is one of the first techniques to produce consistent and effective improvements through query expansion. The basic idea of global analysis is to use the context of a term to determine its similarity with other terms. Global analysis selects expansion terms on the basis of the information on the whole document set. It builds a set of statistical term relationships which are then used to expand queries.

One of the earliest global analysis techniques is term clustering [20], [32]. Queries are simply expanded by adding similar terms that are grouped into the same cluster according to term co-occurrences in documents. Qiu and Frei [24] presented a query expansion model using a global similarity thesaurus. Another work based on a global statistical thesaurus is [10], which first clusters documents and then selects low-frequency terms to represent each cluster. PhraseFinder [19] is a component of the INQUERY system that creates an association thesaurus. The phrases selected by PhraseFinder are used in query expansion. Latent Semantic Indexing [12] can also be viewed as a kind of query expansion. In its reduced dimensional space, implicit correlations among terms can be discovered and employed in expanding original queries.

Generally, global analysis requires corpus-wide statistics, such as statistics of co-occurrences of pairs of terms, resulting in a matrix of similarities between terms or a global association thesaurus. Although the global analysis techniques are relatively robust, the corpus-wide statistical analysis consumes a considerable amount of computing resources. Moreover, since it focuses only on the document side and does not take into account the query side, global analysis only provides a partial solution to the term mismatching problem.
3.2 Local Analysis

Different from global analysis, local analysis uses only a subset of documents that is returned with the given query. The result is thus more focused on the given query than global analysis. Local analysis techniques are grouped into two categories: approaches based on user feedback information and approaches based on information derived from a subset of the returned documents.

3.2.1 Relevance Feedback

Relevance feedback is a straightforward strategy for reformulating queries. In a relevance feedback cycle, the user is presented with a list of initial results. After examining them, the user marks those documents he or she considers relevant. The original query is expanded according to these relevant documents. The expected result is that the next round of retrieval will move toward the relevant documents and away from nonrelevant documents. Early experiments with the Smart system [30] and later experimental results using a probabilistic model [25] indicate improvements in effectiveness with relevance feedback for small collections. Rocchio performed query reformulation using the vector space model and obtained significantly positive results [27]. Salton and Buckley [31] did experiments on six test collections to compare various relevance feedback methods.
Their work mainly consisted of term reweighting and query expansion. Typically, expansion terms are extracted from the relevant documents judged by the user. Relevance feedback can achieve very good performance if the user provides sufficient and correct relevance judgments. Unfortunately, in a real search context, users usually are reluctant to provide such relevance feedback.

3.2.2 Local Feedback

To overcome the difficulty due to the lack of sufficient relevance judgments, local feedback, also known as blind feedback or pseudofeedback, is commonly used in IR. Local feedback mimics relevance feedback by assuming that the top-ranked documents are relevant [4]. Expansion terms are extracted from the top-ranked documents to formulate a new query for a second-cycle retrieval.

Local feedback has been proven effective in previous TREC experiments. In some cases, it outperforms global analysis [6], [13], [14], [26]. Nevertheless, this method can hardly overcome its inherent drawback: If a large fraction of the top-ranked documents are actually irrelevant, then the words added into the query (drawn from these documents) are likely to be unrelated to the topic and, as a result, the quality of the retrieval using the expanded query is degraded. Therefore, the effect of pseudofeedback strongly depends on the quality of the initial retrieval.

In recent years, many improvements for local feedback have been proposed. Mitra et al. [23] suggested improving query expansion by refining the set of documents used in feedback with Boolean filters and proximity constraints. Clustering the top-ranked documents and removing the singleton clusters are techniques used in [21] in order to concentrate on large groups of relevant documents for query expansion. Buckley et al. [5] employed clustering to identify concepts. More recently, Carpineto et al. [7] presented a method of weighting and selecting expansion terms using Information Theory.
To enhance the reliability of pseudorelevance feedback (PRF), Flexible PRF was proposed in [29], which varies the number of expansion terms according to the number of documents retrieved.

Recently, Xu and Croft [37], [38] proposed a local context analysis method, which applies the measure of global analysis to the selection of query terms in local feedback. From the top-ranked documents, noun groups are selected according to their co-occurrences with the query terms. In this way, the local context analysis method can solve the problem of insufficient statistical data of local analysis to some extent. However, local context analysis is based on the hypothesis that a frequent term from the top-ranked relevant documents will tend to co-occur with all query terms within the top-ranked documents. This is a reasonable hypothesis, but not always true, as shown by our examination of the gap between the document and query spaces. This is precisely the problem we will address by exploiting user logs for query expansion.

4 LOG-BASED QUERY EXPANSION

To deal with the mismatching problem at its source, i.e., the inconsistency between the terms used in the documents and those used in the queries, a possible way is to create relationships between the two sets of terms. User logs provide a resource exploitable for this end.

4.1 Principle of Using User Logs

We observe that many search engines have accumulated a large amount of user logs from which we can know what the query is and which documents users have selected to read. These user logs provide valuable indications to understand the kinds of documents the users intend to retrieve by formulating a query with a set of particular terms.

There has been some work on mining user logs to enhance Web searching. Beeferman and Berger [2] exploited clickthrough data in clustering URLs and queries using a graph-based iterative clustering technique. Wen et al. [34] used a similar method to cluster queries according to user logs in order to find Frequently Asked Questions (FAQs).
These FAQs are then used to improve the effectiveness of question answering.

In this study, we further extend the previous utilizations of user logs by trying to extract relationships between query terms and document terms. These relationships are then used for query expansion. Thus, our work may be viewed as a trial to construct a live thesaurus that bridges the document and the query spaces. The general principle is: If queries containing one term often lead to the selection of documents containing another term, then we consider that there is a strong relationship between the two terms.

This principle is an extension to that exploiting term co-occurrences. In previous approaches, term co-occurrences are observed within documents. The term relationships extracted from them are those between the terms used by the same authors. Therefore, we can see them as relationships within the document space. As we explained earlier, an important factor of the mismatching problem is the lack of relationships between the document space and the query space. There is an acute need to create a bridge between them. The idea of exploiting user logs precisely aims to create such a bridge between the two spaces.

To exploit this principle, our first task is to extract query sessions from a large set of noisy log data. The query sessions we extract are defined as follows:

session := <query text> [clicked document]*

Each session contains one query and a set of documents which the user clicked on (which we will call clicked documents). The central idea of our method is that, if a set of documents is often selected for the same queries, then the terms in these documents are strongly related to the terms of the queries. Thus, some probabilistic correlations between query terms and document terms can be established based on the user logs.

One important assumption behind this method is that the clicked documents are relevant to the query. At first glance, this assumption may appear too strong.
However, although the clicking information is not as accurate as explicit relevance judgment in traditional relevance feedback, the user's choice does suggest a certain degree of relevance. Typically, upon getting a list of documents, many users do not select resulting documents
randomly. They have a rough idea of what the documents are about from their titles and snippets. In most cases, they click and read those documents which are the most similar to what they have in mind. Therefore, these clicked documents do have some relationship with the queries they submit. Of course there are exceptions, such as an erroneous click or a sudden shift in the user's intention. But, in the long run, with a large amount of log data, the click-through records allow us to find strong correlations among terms from a statistical point of view. A similar observation has been made in [34]. On the whole, user logs can be viewed as a very reliable resource containing abundant implicit relevance judgments.

Fig. 2. Establishing correlations between query terms and document terms via query sessions.

4.2 Characteristics of Log-Based Query Expansion

In a more general sense, the log-based query expansion method may be viewed as a special case of local analysis because its expansion terms are derived from a subset of the documents. However, it is enhanced by human judgments: Not only are the clicked documents usually top-ranked documents, but they have also been selected by the users. This method thus has several advantages over relevance feedback and pseudorelevance feedback.

Recall that the factor which limits the application of relevance feedback is the unavailability of user relevance judgments in a multiple-query process. Users tend to mark only a few, if any, documents when presented with a list of resulting documents. In addition, this feedback information can be exploited only once. Once the query is changed, the same feedback process is to be started again. Log-based query expansion collects and analyzes all users' historical relevance judgments as a whole without intervention of users. We benefit from abundant records of voted documents, while the biases or errors in a single round of feedback can be minimized.
Thus, we can overcome the problem of lacking sufficient relevance judgments in previous local feedback techniques.

On the other hand, compared to pseudorelevance feedback, our method has an obvious advantage: Not only are the clicked documents part of the top-ranked documents, but there is also a further selection by the user. Because document clicks are more reliable indications than the top-ranked documents used in pseudorelevance feedback, log-based query expansion is expected to be more robust and accurate than the latter.

The log-based query expansion method has three other important properties. First, since the term correlations can be precomputed offline, the initial retrieval phase of pseudorelevance feedback is not needed anymore. Second, since user logs contain query sessions from different users, the term correlations can reflect the preference of the majority of the users. For example, if the majority of the users use "windows" to search for information about the Microsoft Windows product, the term "windows" will have stronger correlations with terms such as "Microsoft," "OS," and "software" than with terms such as "decorate," "door," and "house." Thus, the expanded query will result in a higher ranking for the documents about Microsoft Windows, which corresponds to the intentions of most users. A similar idea has been used in several existing search engines, such as Direct Hit. Our query expansion approach can produce the same results. Third, the term correlations may evolve along with the accumulation of user logs. Hence, the query expansion process can reflect updated user interests at a specific time.

4.3 Correlations between Query Terms and Document Terms

Query sessions in the user logs provide a possible way to bridge the gap between the query space and the document space. As illustrated in Fig. 2, weighted links can be created between the query space (all the query terms) and the query sessions, as well as between the document space (all the document terms) and the sessions.
In general, we assume that the terms in a query are correlated to the terms in the documents that the user clicked on. If there is at least one path between one query term and one document term, a link is created between them. Thus, the correlations between the query terms and document terms can be measured by investigating the weights of the links constituting the path between them. By analyzing a large number of such links, we can obtain a new matrix storing probabilistic correlations between the terms in these two
spaces (the right part of Fig. 2). This is similar, in principle, to building a term-term similarity thesaurus in global analysis as in [36]. However, it benefits from the additional user judgments.

Let us now discuss how to determine the degrees of correlation between terms. We define these degrees as the conditional probabilities between terms, i.e., $P(w_j^{(d)} \mid w_i^{(q)})$ for any document term $w_j^{(d)}$ and any query term $w_i^{(q)}$. The probability $P(w_j^{(d)} \mid w_i^{(q)})$ can be determined as follows (where $S$ is a set of clicked documents for queries containing the query term $w_i^{(q)}$):

$$
\begin{aligned}
P(w_j^{(d)} \mid w_i^{(q)}) &= \frac{P(w_j^{(d)}, w_i^{(q)})}{P(w_i^{(q)})} \\
&= \frac{\sum_{\forall D_k \in S} P(w_j^{(d)}, w_i^{(q)}, D_k)}{P(w_i^{(q)})} \\
&= \frac{\sum_{\forall D_k \in S} P(w_j^{(d)} \mid w_i^{(q)}, D_k)\, P(w_i^{(q)}, D_k)}{P(w_i^{(q)})}.
\end{aligned}
$$

We assume that $P(w_j^{(d)} \mid w_i^{(q)}, D_k) = P(w_j^{(d)} \mid D_k)$. This means that the document $D_k$ separates the query term $w_i^{(q)}$ from the document term $w_j^{(d)}$. Therefore,

$$
\begin{aligned}
P(w_j^{(d)} \mid w_i^{(q)}) &= \frac{\sum_{\forall D_k \in S} P(w_j^{(d)} \mid D_k)\, P(D_k \mid w_i^{(q)})\, P(w_i^{(q)})}{P(w_i^{(q)})} \\
&= \sum_{\forall D_k \in S} P(w_j^{(d)} \mid D_k)\, P(D_k \mid w_i^{(q)}). \qquad (4)
\end{aligned}
$$

$P(D_k \mid w_i^{(q)})$ is the conditional probability of the document $D_k$ being clicked when $w_i^{(q)}$ appears in the user query. $P(w_j^{(d)} \mid D_k)$ is the conditional probability of occurrence of $w_j^{(d)}$ if the document is selected. $P(D_k \mid w_i^{(q)})$ and $P(w_j^{(d)} \mid D_k)$ can be estimated, respectively, from the user logs and from the frequency of occurrences of terms in documents as follows:

$$ P(D_k \mid w_i^{(q)}) = \frac{f_{ik}^{(q)}(w_i^{(q)}, D_k)}{f^{(q)}(w_i^{(q)})}, \qquad (5) $$

$$ P(w_j^{(d)} \mid D_k) = \frac{W_{jk}^{(d)}}{\sum_{\forall t \in D_k} W_{tk}^{(d)}}, \qquad (6) $$

where

- $f_{ik}^{(q)}(w_i^{(q)}, D_k)$ is the number of the query sessions in which the query term $w_i^{(q)}$ and the document $D_k$ appear together.
- $f^{(q)}(w_i^{(q)})$ is the number of the query sessions that contain the term $w_i^{(q)}$.
- $P(w_j^{(d)} \mid D_k)$ is the normalized weight of the term $w_j^{(d)}$ in the document $D_k$, which is divided by the sum of all term weights in the document $D_k$.

By combining (4), (5), and (6), we obtain the following formula for $P(w_j^{(d)} \mid w_i^{(q)})$:

$$ P(w_j^{(d)} \mid w_i^{(q)}) = \sum_{\forall D_k \in S} \left( P(w_j^{(d)} \mid D_k) \times \frac{f_{ik}^{(q)}(w_i^{(q)}, D_k)}{f^{(q)}(w_i^{(q)})} \right). \qquad (7) $$

4.4 Query Expansion Based on Term Correlations

Equation (7) describes how to calculate the chance of a document term being selected as an expansion term given a query term. We also need to determine the relationship of a document term to the whole query in order to rank it. For this, we use an idea similar to that of [24], i.e., we select expansion terms according to their relationship to the whole query. The relationship of a term to the whole query is measured by the following cohesion calculation:

$$ \mathit{CoWeight}_Q(w_j^{(d)}) = \ln\left( \prod_{w_t^{(q)} \in Q} \left( P(w_j^{(d)} \mid w_t^{(q)}) + 1 \right) \right), \qquad (8) $$

which combines the relationships of the term to all the query terms.

On the whole, log-based query expansion takes the following steps to expand a new query $Q$:

1. Extract all query terms (eliminating stopwords) from $Q$.
2. Find all documents related to any query term in query sessions.
3. For each document term in these documents, use (8) to calculate its evidence of being selected as an expansion term according to the whole query.
4. Select the n document terms with the highest cohesion weights and formulate the new query $Q'$ by adding these terms into $Q$.
5. Use $Q'$ to retrieve documents in a searching system.

5 EXPERIMENTAL DATA AND METHODOLOGY

Before illustrating the experimental results, let us first describe the test data used.

5.1 Data

Due to the characteristics of our query expansion method, we cannot conduct experiments on standard test collections such as the TREC data since they do not contain the user logs that we need. To derive term-term correlations, we use the same two-month user logs from the Encarta Web site as described in Section 2, which contain 4,839,704 user query sessions. With respect to the document set, we collected 41,942 documents from the Encarta Web site to form the test corpus. Diverse topics are covered by these articles, with greatly varying lengths, from dozens of words to several thousand words. In the user logs, each document bears a large number of queries with which users have clicked on that document.
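As an illustration of Section 4 (Eqs. 5 to 8 and the first four expansion steps of Section 4.4), the following sketch computes term correlations from a list of query sessions and ranks candidate expansion terms. It is a minimal sketch under stated assumptions rather than the authors' implementation: the in-memory session format, the function names, the tiny stopword list, and the decision to skip candidate terms that already occur in the query are all choices made here for illustration.

```python
import math
from collections import defaultdict

STOPWORDS = {"the", "of", "a", "in", "to", "and"}  # illustrative list only

def term_correlations(sessions, doc_weights):
    """Eq. (7): P(w_d | w_q) = sum over clicked D_k of
    P(w_d | D_k) * f_ik(w_q, D_k) / f(w_q).

    sessions:    list of (query_terms, clicked_doc_ids) pairs (assumed format)
    doc_weights: {doc_id: {term: weight}}, e.g. TF*IDF weights
    """
    f_q = defaultdict(int)                         # f(w_q)
    f_qd = defaultdict(lambda: defaultdict(int))   # f_ik(w_q, D_k)
    for query_terms, clicked in sessions:
        for wq in set(query_terms) - STOPWORDS:
            f_q[wq] += 1
            for doc_id in set(clicked):
                f_qd[wq][doc_id] += 1

    corr = defaultdict(lambda: defaultdict(float))
    for wq, docs in f_qd.items():
        for doc_id, count in docs.items():
            weights = doc_weights[doc_id]
            total = sum(weights.values())
            if total <= 0:
                continue
            p_doc_given_q = count / f_q[wq]                  # Eq. (5)
            for wd, w in weights.items():
                corr[wq][wd] += (w / total) * p_doc_given_q  # Eq. (6) x Eq. (5)
    return corr

def expand_query(query_terms, corr, n_terms):
    """Steps 1-4 with the cohesion weight of Eq. (8)."""
    kept = [t for t in query_terms if t not in STOPWORDS]
    scores = defaultdict(float)
    for wq in kept:
        for wd, p in corr.get(wq, {}).items():
            scores[wd] += math.log(p + 1.0)  # ln of a product = sum of ln
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    # Assumption: terms already present in the query are not added again.
    new_terms = [t for t in ranked if t not in kept][:n_terms]
    return list(query_terms) + new_terms
```

Because `term_correlations` depends only on the logs and the document weights, it can be run offline, which matches the observation in Section 4.2 that no initial retrieval pass is needed at query time.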
This ensures that we have sufficient click-through information to establish meaningful probabilistic correlations among terms in the two spaces. In addition, this data set can reflect the impact of our query expansion technique for searches on the Web since it is obtained from a real search engine.

We focus on using query expansion to counter the effect of short queries on the Web. Xu and Croft [38] conducted experiments on very short queries, in which the results showed that query expansion can produce even larger
improvements on short queries than on long queries. We compiled two sets of queries in order to see how query expansion affects retrieval results on short queries and long queries. In order to test our method on a more general basis, some queries are extracted randomly from the user logs. Some others come from the TREC query set. Yet another subset of queries was added manually by us. Table 1 shows all the 30 queries in both short and long versions used in our experiments.

TABLE 1. List of Queries in Both the Long Query Set and the Short Query Set

The short queries in our experiments are very close to those employed by real Web users and the average length of these queries is 2.0 words. The average length of the long queries is 4.8 keywords (excluding the stopwords). Though it is still shorter than the average length of most TREC queries, we consider that it reflects the real situation on the Web since few users use over five keywords to express their information needs.

We used three human assessors to build the relevance judgments. Relevant documents for each query were judged according to the human assessors' manual selections, and standard relevant document sets were prepared for all of the 30 queries. Assessors had no knowledge of the testing methods, but made decisions with the assistance of a basic searching system. To resolve their disagreements when they occurred, the assessors discussed them together. All judgments from the assessors constituted a reference set. We run all experiments in a batch mode according to the relevance judgment set.

5.2 Word and Phrase Thesaurus

Encarta has well-organized manual indexes in addition to automatically extracted index terms. In order to test our technique in a general context, we do not use the manual indexes or the existing Encarta search engine, which exploits them, for our evaluation. Instead, we implement a vector space model as the baseline method in our experiments.
We do not use traditional methods to extract phrases from documents because we are more interested in the phrases in the query space. Therefore, we extract from the user logs all N-grams, where N is the number of nontrivial terms in a query, that occur more than five times. These N-grams are treated as candidate phrases. Then, we locate the candidate phrases in the document corpus and filter out those that do not appear in the documents. In the end, we obtain a thesaurus containing over 13,000 phrases, which are used as additional indexes. When using phrases and single words together, our system always gives priority to phrases.

5.3 Evaluation Methodology

In order to evaluate our log-based query expansion method, we compare its performance not only with that of the original queries, but also with that of local context analysis. We employ interpolated 11-point average precision as the main metric of performance. A statistical t-test [18] is used to indicate whether an improvement is statistically significant. A p-value less than 0.05 is deemed significant.
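The phrase thesaurus construction of Section 5.2 can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes candidate phrases are contiguous word sequences of length 2 up to the number of non-stopword terms in each logged query, and that "appearing in the documents" means an exact contiguous match. The stopword list, function names, and sample data are hypothetical.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption; the paper's list is not given).
STOPWORDS = frozenset({"a", "an", "and", "in", "of", "the", "to"})


def content_terms(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def candidate_phrases(queries, min_count=5):
    """Collect N-grams from logged queries that occur more than min_count times."""
    counts = Counter()
    for query in queries:
        terms = content_terms(query)
        for n in range(2, len(terms) + 1):
            for i in range(len(terms) - n + 1):
                counts[tuple(terms[i:i + n])] += 1
    return {phrase for phrase, c in counts.items() if c > min_count}


def contains(doc_terms, phrase):
    """True if the phrase occurs as a contiguous run inside the document."""
    n = len(phrase)
    return any(tuple(doc_terms[i:i + n]) == phrase
               for i in range(len(doc_terms) - n + 1))


def build_thesaurus(queries, documents, min_count=5):
    """Keep only the candidate phrases that also appear in the corpus."""
    docs = [content_terms(d) for d in documents]
    return {p for p in candidate_phrases(queries, min_count)
            if any(contains(d, p) for d in docs)}


# Hypothetical log and corpus, for illustration only.
log = ["six day war"] * 6 + ["day trading"] * 6 + ["rare query"] * 2
corpus = ["The Six Day War of 1967", "stock markets"]
print(sorted(build_thesaurus(log, corpus)))
```

On a real log the quadratic scan over documents would be replaced by an inverted index lookup; the sketch only shows the two filters (frequency in the log, presence in the corpus) that the text describes.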

TABLE 2. A Comparison of Retrieval Performance in Average Precision (%) for Long Queries between Baseline, Local Context Analysis (LC Exp), and Log-Based Query Expansion (On Log Exp)

TABLE 3. A Comparison of Retrieval Performance in Average Precision (%) for Short Queries between Baseline, Local Context Analysis (LC Exp), and Log-Based Query Expansion (On Log Exp)

Terms are weighted using the TF*IDF measure in our retrieval system. Both the original and the expanded queries are evaluated by the same retrieval system, making it possible to compare the effects of query expansion. For local context analysis, we use 30 expansion terms (including words and phrases) from the 100 top-ranked documents for query expansion. The smoothing factor in local context analysis is set to 0.1, as suggested by [38]. For the log-based query expansion, we use 40 expansion terms.

We notice that phrases occur far less often than words. This creates an imbalance between the weights we assign to word correlations and to phrase correlations. In order to create a better balance, the probability associated with a phrase correlation is multiplied by a factor S because phrases are less ambiguous than words (S is set to 10 in our experiments). The formula used to measure phrase correlations is modified from (7) to the following one:

$$P(T^{(d)} \mid T^{(q)}) = \sum_{\forall D_k \in S} \left( P(T^{(d)} \mid D_k) \times S \times \frac{f_k^{(q)}(T^{(q)}, D_k)}{f^{(q)}(T^{(q)})} \right), \qquad (9)$$

where $T^{(d)}$ and $T^{(q)}$ are, respectively, a document phrase and a query phrase. In addition, the above results of (7) and (9) should be divided by the sum of all $P(w^{(d)} \mid w^{(q)})$ and $P(T^{(d)} \mid T^{(q)})$ in order to satisfy the requirement of the probabilistic framework.

6 EXPERIMENTAL RESULTS

6.1 Performance Comparison

We now present the experimental results of local context analysis and the log-based query expansion method. Results with the original queries without expansion are used as the baseline. All the experiments are carried out with both words and phrases.
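The boosted phrase correlation of (9) and the joint normalization with the word correlations of (7) can be sketched as below. This is an assumption-laden illustration: it reads $f_k^{(q)}(T^{(q)}, D_k)$ as the number of logged sessions in which the query unit led to a click on document $D_k$, $f^{(q)}(T^{(q)})$ as the number of sessions containing the query unit, and $P(T^{(d)} \mid D_k)$ as a precomputed per-document weight. The function names and the sample numbers are hypothetical.

```python
PHRASE_BOOST = 10.0  # the factor S applied to phrase correlations in the text


def raw_correlation(click_counts, query_count, doc_unit_prob, boost=1.0):
    """Sum P(unit | D_k) * boost * f_k / f over the clicked documents D_k.

    click_counts:  {doc_id: sessions where the query unit led to a click on doc_id}
    query_count:   number of sessions containing the query unit
    doc_unit_prob: {doc_id: {document unit: P(unit | doc_id)}}
    """
    scores = {}
    for doc_id, f_qk in click_counts.items():
        for unit, p in doc_unit_prob.get(doc_id, {}).items():
            scores[unit] = scores.get(unit, 0.0) + p * boost * f_qk / query_count
    return scores


def normalize(*tables):
    """Divide every score by the sum over all tables so the results sum to 1."""
    total = sum(v for table in tables for v in table.values())
    if total == 0:
        return [dict(table) for table in tables]
    return [{unit: v / total for unit, v in table.items()} for table in tables]


# Hypothetical counts and weights, for illustration only.
word_scores = raw_correlation(
    {"d1": 3, "d2": 1}, 4,
    {"d1": {"israel": 0.5, "arab": 0.5}, "d2": {"israel": 1.0}},
)
phrase_scores = raw_correlation(
    {"d1": 2}, 2,
    {"d1": {"middle east": 0.1}},
    boost=PHRASE_BOOST,
)
words, phrases = normalize(word_scores, phrase_scores)
print(words, phrases)
```

With boost left at 1.0 the same function covers the word case, so the only difference between the two correlation types in this sketch is the multiplier, mirroring how (9) is described as a modification of (7).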
The results with the long queries and the short queries are presented, respectively, in Table 2 and Table 3. We see that our log-based query expansion performs very well on both query sets. On the long query set, the log-based query expansion method brings an average improvement of percent in precision (p-value = ) over the baseline, while local context analysis achieves an average improvement of 6.56 percent in precision (p-value = 0.33) over the baseline. The p-value suggests that the log-based query expansion gains a statistically significant improvement over the original queries. It is to be noted that the log-based query expansion also provides an average improvement of percent compared to local context analysis, which is also statistically significant (p-value = ). In general, we observe that log-based query expansion selects more accurate expansion terms than local context analysis because it exploits user judgments. In contrast, local context analysis searches for expansion terms in the top-ranked retrieved documents and is more likely to add irrelevant terms to the original query, thus introducing undesirable side effects on retrieval performance.

The results shown in Table 3 support our conjecture that our query expansion approach is even more useful for short queries than for long queries. There is a dramatic change in the performance of both local context analysis and the log-based query expansion method when short queries are expanded. The log-based query expansion offers an average improvement of percent (and a maximum improvement of percent) in comparison with the original queries. The p-value for this gain is , which indicates its statistical significance. Local context analysis boosts the average precision to percent, which is percent better than the baseline (p-value = 0.018), compared to only a 6.56 percent improvement gained on the long query set. All these results suggest that query expansion is of extreme importance for short queries.
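The precision figures discussed above are interpolated 11-point averages (Section 5.3). A minimal sketch of that metric for a single query is given below, using the standard definition in which the interpolated precision at a recall level is the highest precision observed at any recall greater than or equal to that level. The function name and the sample ranking are illustrative assumptions, not taken from the paper.

```python
def eleven_point_average_precision(ranking, relevant):
    """Interpolated average precision over recall levels 0.0, 0.1, ..., 1.0.

    ranking:  list of document ids in retrieved order
    relevant: collection of document ids judged relevant for the query
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0

    # (recall, precision) measured at each rank where a relevant document appears.
    points = []
    hits = 0
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / position))

    # Interpolated precision at a level is the best precision at any recall >= level;
    # levels that are never reached contribute 0.
    levels = [i / 10 for i in range(11)]
    interpolated = [
        max((p for r, p in points if r >= level), default=0.0)
        for level in levels
    ]
    return sum(interpolated) / len(levels)


# Hypothetical ranking: relevant documents retrieved at ranks 1 and 3.
print(eleven_point_average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"}))
```

Averaging this value over all queries in a set gives one figure per method, and the per-query values are the paired samples a t-test would compare.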
Given our observation that fewer than two words are used in most user queries in the Encarta logs, we conclude that query expansion may improve the effectiveness of search engines that deal with short queries. It is interesting to compare the results of the query expansion techniques on both query sets. With local context analysis, the results with the long queries are slightly better than those with short queries, with an improvement of 1.33 percent. However, the results obtained by the log-based query expansion on the long queries are 3.64 percent worse than their counterparts for short queries.

This may suggest that our method can select expansion terms for short queries that are even better than the terms used in the long queries to describe the information needs. Globally, with query expansion, the performances for short and long queries are similar. This confirms that query expansion is an effective way to reduce the difference between short and long queries.

6.2 Impact of Phrases

Experiments on noun phrases in [38] showed that local context analysis can achieve a small improvement with phrases. However, they only tested it with long queries. We believe that this impact can be even larger for short queries. In fact, even if a word-based representation is not precise, in a long query this imprecision is compensated for by the large number of words in the query. The whole set of query words together may give a quite precise description of the information need. However, this is not the case for short queries. For short queries, the user's intention can be expressed more accurately with phrases because phrases are inherently less ambiguous than single words. We conduct experiments on the log-based query expansion with and without phrases. The results are shown in Table 4 and Table 5.

TABLE 4. Comparison of Average Precision (%) Obtained by Log-Based Query Expansion with Phrases (Phrase) and without Phrases (No Phrase) on the Long Query Set

TABLE 5. Comparison of Average Precision (%) Obtained by Log-Based Query Expansion with Phrases (Phrase) and without Phrases (No Phrase) on the Short Query Set

The results confirm the expectation just described. The improvement gained with phrases on the short queries is almost twice that obtained with phrases on the long queries. Like the retrieval process, query expansion is affected by the ambiguity of the terms in the original queries. The use of phrases can help reduce the ambiguity of query terms, thus allowing query expansion to extract more relevant expansion terms.
For example, for the short version of query #8, "Six Day War" (see Table 1), each word is common and appears in many documents irrelevant to this query. If it is parsed as three single words, many irrelevant documents will be found. However, when it is presented as a phrase, the concept it represents becomes unambiguous and it matches fewer irrelevant documents, so the retrieval effectiveness can be improved. In comparison, given the long version of this query, if the three words are supplemented by the words "Israel" and "Arab," then together they describe a more precise meaning, leading to more relevant documents. So, even though phrases are not recognized in a long query, the impact is less dramatic than for a short query.

Our other results (not listed here) show that, if we use phrases in the baseline method, its performance can also be improved by 2.35 percent and 8.95 percent, respectively, on the long and the short queries. Integrating phrases into local context analysis can achieve improvements of 8.21 percent and percent for the long and short queries. These results are consistent with those of the log-based query expansion. In summary, phrases are very important for searching with short queries. In addition, our method of phrase extraction from user logs, although simple, proved to be effective.

6.3 Impact of Number of Expansion Terms

In general, the number of expansion terms should be within a reasonable range in order to produce consistently good performance. Too many expansion terms not only consume more time in the retrieval process, but also have side effects on the retrieval performance. We examine the performance of the log-based query expansion by using 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100 expansion terms on the two query sets. The results are shown in Fig. 3.

Fig. 3. Impact of number of expansion terms.

The best performances are obtained with around 30 expansion terms for both query sets. It is worth noting that the curve produced on the long query set is flatter than the other one. The curve of the long query set reaches its summit at 30 expansion terms and remains very flat after 30. In comparison, the curve of the short query set drops after adding more than 60 expansion terms. We attribute this to the fact that the short queries have fewer original terms, which, when expanded excessively without other terms to serve as context, may produce more side effects and generate more irrelevant terms. For long queries, as more terms act together in the selection of expansion terms, the chance of generating many irrelevant terms is much smaller.

7 CONCLUSIONS

The proliferation of the World Wide Web has prompted the wide application of search engines. However, short queries and the incompatibility between the terms in user queries and documents strongly affect the performance of existing search engines. Many automatic query expansion techniques have been proposed, which can solve the short query and term mismatch problems to some extent. However, they do not take advantage of the user logs available at various Web sites as a means for query expansion. In this article, we presented a novel method for automatic query expansion based on user logs. This method aims first to establish correlations between query terms and document terms by exploiting the user logs. These relationships are then used for query expansion. We have shown that this is an effective way to narrow the gap between the query space and the document space.
For new queries, high-quality expansion terms can be selected from the document space on the basis of the extracted correlations. We tested this method on a data set that is similar to the real Web environment. A series of experiments conducted on both long queries and short queries showed that the log-based query expansion method can achieve substantial improvements in performance. It also outperforms local context analysis, which has been one of the most effective query expansion methods to date. Our experiments also show that query expansion is more effective for short queries than for long queries.

REFERENCES

[1] M.J. Bates, "Search Techniques," Ann. Rev. of Information Science and Technology, M.E. Williams, ed., pp. ,
[2] D. Beeferman and A. Berger, "Agglomerative Clustering of a Search Engine Query Log," Proc. SIGKDD, pp. ,
[3] G. Brajnik, S. Mizzaro, and C. Tasso, "Evaluating User Interfaces to Information Retrieval Systems: A Case Study on User Support," Proc. 19th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR 96), pp. , Aug.
[4] C. Buckley, G. Salton, and J. Allan, "Automatic Retrieval with Locality Information Using Smart," Proc. First Text Retrieval Conf. (TREC-1), pp. ,
[5] C. Buckley, M. Mitra, J. Walz, and C. Cardie, "Using Clustering and Superconcepts within Smart," Proc. Sixth Text Retrieval Conf. (TREC-6), E. Voorhees, ed., pp. ,
[6] C. Buckley, G. Salton, J. Allan, and A. Singhal, "Automatic Query Expansion Using SMART," Overview of the Third Retrieval Conf. (TREC-3), pp. , Nov.
[7] C. Carpineto, G. Romano, and B. Bigi, "An Information-Theoretic Approach to Automatic Query Expansion," ACM Trans. Information Systems, vol. 19, no. 1, pp. 1-27, Jan.
[8] J.W. Cooper and R.J. Byrd, "Lexical Navigation: Visually Prompted Query Expansion and Refinement," Proc. Second ACM Int'l Conf. Digital Libraries, pp. ,
[9] W.B. Croft, R. Cook, and D. Wilder, "Providing Government Information on the Internet: Experiences with THOMAS," Proc. Second Int'l Conf. Theory and Practice of Digital Libraries, pp. ,
[10] C.J. Crouch and B. Yang, "Experiments in Automatic Statistical Thesaurus Construction," Proc. ACM-SIGIR Conf. Research and Development in Information Retrieval, pp. ,
[11] H. Cui, J.-R. Wen, J.-Y. Nie, and W.-Y. Ma, "Probabilistic Query Expansion Using User Logs," Proc. 11th World Wide Web Conf., pp. ,
[12] S. Deerwester, S.T. Dumais, G.W. Furnas, T.K. Landauer, and R. Harshman, "Indexing by Latent Semantic Analysis," J. Am. Soc. Information Science and Technology, vol. 41, no. 6, pp. ,
[13] E. Efthimiadis and P. Biron, "UCLA-Okapi at TREC-2: Query Expansion Experiments," Proc. Second Text Retrieval Conf. (TREC-2), D.K. Harman, ed.,
[14] D. Evans and R. Lefferts, "Design and Evaluation of the CLARIT-TREC-2 System," Proc. Second Text Retrieval Conf. (TREC-2),
[15] G.W. Furnas, T.K. Landauer, L.M. Gomez, and S.T. Dumais, "The Vocabulary Problem in Human-System Communication," Comm. ACM, vol. 30, no. 11, pp. ,
[16] G. Grefenstette, Explorations in Automatic Thesaurus Discovery. Kluwer Academic Publishers,
[17] S.P. Harter, Online Information Retrieval: Concepts, Principles, and Techniques. Orlando, Fla.: Academic Press,
[18] D. Hull, "Using Statistical Testing in the Evaluation of Retrieval Experiments," Proc. ACM SIGIR, pp. , June
[19] Y. Jing and W.B. Croft, "An Association Thesaurus for Information Retrieval," Proc. RIAO, pp. ,
[20] M.E. Lesk, "Word-Word Associations in Document Retrieval Systems," Am. Documentation, vol. 20, no. 1, pp. ,
[21] A. Lu, M. Ayoub, and J. Dong, "Ad Hoc Experiments Using EUREKA," Proc. Text Retrieval Conf. (TREC-5), pp. ,
[22] G. Miller, "Wordnet: An Online Lexical Database," Int'l J. Lexicography, vol. 3, no. 4,
[23] M. Mitra, A. Singhal, and C. Buckley, "Improving Automatic Query Expansion," Proc. 21st Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. ,
[24] Y. Qiu and H. Frei, "Concept Based Query Expansion," Proc. 16th Int'l ACM SIGIR Conf. R & D in Information Retrieval, pp. ,
[25] S.E. Robertson and K. Sparck Jones, "Relevance Weighting of Search Terms," J. Am. Soc. for Information Sciences, vol. 27, no. 3, pp. ,
[26] S.E. Robertson, S. Walker, and M. Sparck Jones, et al., "Okapi at TREC-3," Proc. Second Text Retrieval Conf. (TREC-3),
[27] J. Rocchio, "Relevance Feedback in Information Retrieval," The Smart Retrieval System: Experiments in Automatic Document Processing, G. Salton, ed., pp. ,
[28] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. England: Pearson Education Limited,
[29] T. Sakai, S.E. Robertson, and S. Walker, "Flexible Pseudo-Relevance Feedback Via Direct Mapping and Categorization of Search Requests," Proc. BCS-IRSG ECIR, pp. 3-14,
[30] G. Salton, The SMART Retrieval System: Experiments in Automatic Document Processing. Englewood Cliffs, N.J.: Prentice Hall,
[31] G. Salton and C. Buckley, "Improving Retrieval Performance by Relevance Feedback," J. Am. Soc. for Information Science, vol. 41, no. 4, pp. ,
[32] K. Sparck Jones, Automatic Keyword Classification for Information Retrieval. London: Butterworths, 1971.

[33] E.M. Voorhees, "Query Expansion Using Lexical-Semantic Relations," Proc. 17th Int'l Conf. Research and Development in Information Retrieval, pp. ,
[34] J.-R. Wen, J.-Y. Nie, and H.-J. Zhang, "Query Clustering Using User Logs," ACM Trans. Information Systems, vol. 20, no. 1, pp. ,
[35] S.K. Wong and W. Ziarko et al., "On Modeling of Information Retrieval Concepts in Vector Spaces," ACM Trans. Database Systems, vol. 12, no. 2, pp. , June
[36] S.K.M. Wong and Y.Y. Yao, "A Probabilistic Method for Computing Term-by-Term Relationships," J. Am. Soc. for Information Science, vol. 44, no. 8, pp. ,
[37] J. Xu and W.B. Croft, "Query Expansion Using Local and Global Document Analysis," Proc. 19th Int'l Conf. Research and Development in Information Retrieval, pp. 4-11,
[38] J. Xu and W.B. Croft, "Improving the Effectiveness of Information Retrieval with Local Context Analysis," ACM Trans. Information Systems, vol. 18, no. 1, pp. , Jan.

Hang Cui received the BS and MS degrees in management information systems from Tianjin University, Tianjin, China, in 2000 and 2002, respectively. In July 2002, he was admitted to the National University of Singapore, where he is pursuing a PhD degree. In 2001 and 2002, he spent one year working as a visiting student at Microsoft Research Asia, Beijing, China. His research interests include text mining, intelligent information retrieval, machine learning, and Q & A systems.

Ji-Rong Wen received the BS and MS degrees in 1994 and 1996, both from the School of Information, Renmin University of China. He received the PhD degree in 1999 from the Institute of Computing Technology, the Chinese Academy of Science. He joined Microsoft Research Asia in July 1999 and is currently a researcher in the Media Management Group. His main research interests are data management, intelligent information retrieval, and Web mining.

Jian-Yun Nie received the PhD degree in 1990 from the Université Joseph Fourier of Grenoble, France. He is an associate professor in the Département d'Informatique et Recherche Opérationnelle, Université de Montréal. His research interests are focused on information retrieval (IR), in particular, cross-language and multilingual IR, knowledge- and NLP-based IR, as well as theoretical aspects of IR such as logical models of IR. He is also interested in data mining.

Wei-Ying Ma received the BS degree in electrical engineering from the National Tsing Hua University in Taiwan in 1990, and the MS and PhD degrees in electrical and computer engineering from the University of California at Santa Barbara in 1994 and 1997, respectively. He joined Microsoft Research Asia in April 2001 as the research manager of the Media Management Group. Prior to joining Microsoft, he was with Hewlett-Packard Laboratories in Palo Alto, California, where he was a researcher in the Internet Mobile and Systems Lab. From 1994 to 1997, he was engaged in the Alexandria Digital Library (ADL) project at the University of California at Santa Barbara while completing his PhD degree. Dr. Ma serves as an associate editor for the Journal of Multimedia Tools and Applications published by Kluwer Academic Publishers. He has served on the organizing and program committees of many international conferences and has published four book chapters. His research interests include image and video analysis, content-based image and video search and retrieval, machine learning techniques, intelligent information systems, adaptive content delivery, content distribution and services networks, and media delivery and caching. He is a member of the IEEE.


More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information
