Document Filtering Method Using Non-Relevant Information Profile
Keiichiro Hoashi, Kazunori Matsumoto, Naomi Inoue, Kazuo Hashimoto
KDD R&D Laboratories, Inc., Ohara, Kamifukuoka, Saitama, Japan
{hoashi, matsu, inoue, kh}@kddlabs.co.jp

Abstract

Document filtering is a task to retrieve documents relevant to a user's profile from a flow of documents. Generally, filtering systems calculate the similarity between the profile and each incoming document, and retrieve documents with similarity higher than a threshold. However, many systems set a relatively high threshold to reduce the retrieval of non-relevant documents, which results in many relevant documents being ignored. In this paper, we propose the use of a non-relevant information profile to reduce the mistaken retrieval of non-relevant documents. Results from experiments show that this filter successfully rejects a sufficient number of non-relevant documents, resulting in an improvement of filtering performance.

1 Introduction

Document filtering is a task which monitors a flow of incoming documents and selects those which the system regards as relevant to the user's interest. Many document filtering systems use a similarity-based method to retrieve documents. The user's interest is expressed within the system as a profile. The similarity between the profile and each incoming document is calculated, and documents with similarities higher than a preset threshold are retrieved. Retrieved documents are sent to the user, who returns relevance feedback to the system. This feedback information is used to update the profile for the upcoming flow of new documents. Due to its similarity to the traditional information retrieval (IR) task, many techniques developed in IR are applied to document filtering systems.
For example, profiles and incoming documents are usually indexed by methods used in IR, such as the vector space model. [SIGIR 2000, Athens, Greece. Copyright 2000 ACM.] Therefore, the profile-document similarity calculation method is virtually the same as the algorithms used for calculating query-document similarity in IR. Furthermore, query expansion (QE) is often applied to utilize relevance feedback information for profile updating. Numerous document filtering systems have been reported in the Filtering Track [3] of recent TREC conferences. One of the three subtasks prepared in the Filtering Track is the adaptive filtering task, where systems start with only the original profile, which is used to build a text classification rule. The adaptive filtering task is considered to be the task which best reflects practical filtering situations, but it is also the most difficult task in the Filtering Track. Due to the difficulty of this task, many systems become "conservative" as the document flow proceeds, i.e., they tend to set a high threshold to avoid the mistaken retrieval of non-relevant documents. In other words, a threshold which will retrieve a sufficient number of relevant documents results in the excessive retrieval of non-relevant documents. This suggests that the simple method of comparing profile-document similarity to a threshold may not be effective, especially when the system is expected to retrieve as many relevant documents as possible. In this paper, a novel filtering method is proposed.
The proposed method uses a profile which expresses information about non-relevant documents retrieved during the filtering process. This non-relevant information profile can reduce the number of mistakenly retrieved documents, so that the system can retrieve more relevant documents which might otherwise be ignored due to a conservative similarity threshold. In Section 2, we describe existing filtering methods, mainly focusing on profile updating, and present problems of these methods through preliminary experiments. In Section 3, we explain the non-relevant information profile and evaluate its performance through experiments. In Section 4, we describe another new method which applies pseudo feedback to increase the feedback information given to the non-relevant information profile. We conclude this paper in Section 5.

2 Problems of existing filtering methods

As described in the previous section, the filtering method in which the system retrieves documents based on the similarity between the profile and each incoming document is suspected to have problems. However, most research has focused on aspects such as profile updating to improve filtering performance. In this section, we explain two existing profile updating methods. We also describe a preliminary experiment on these filtering methods, and analyze its results to clarify their problems.

2.1 Existing profile updating methods

Rocchio's algorithm

One of the most effective and widely applied algorithms for relevance feedback and query expansion is Rocchio's algorithm [5], which was developed in the mid-1960s. Developed for the vector space model, this algorithm is based on the idea that if the relevance of documents for a query is known, an optimal query vector will maximize the average query-document similarity for relevant documents, and will simultaneously minimize the query-document similarity for non-relevant documents. Generally, the query expansion method based on Rocchio's algorithm is expressed by the following formula:

q' = α·q + β·(1/R)·Σ_{d in Rel} d − γ·(1/N)·Σ_{d in Nrel} d  (1)

where R is the number of documents in the relevant document set, N is the number of documents in the non-relevant document set, and α, β, γ are parameters. For the use of Rocchio's algorithm in profile updating, we referred to the method described in [8]. In this method, only positive documents (i.e., selected relevant documents) are used for profile updating. The coefficient for positive documents is fixed to 0.1 (meaning the parameters in Formula (1) are set as α = 1, β = 0.1, γ = 0). The profile is updated based on these parameter settings after every n selected documents.
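As an illustration (not the authors' code), a Rocchio-style update over sparse term-weight vectors can be sketched as follows. The dict-based vector representation and the function name are our own; the default parameters follow the setting above (α = 1, β = 0.1, γ = 0, i.e., non-relevant documents are ignored):

```python
from collections import defaultdict

def rocchio_update(profile, rel_docs, nrel_docs, alpha=1.0, beta=0.1, gamma=0.0):
    """Formula (1): q' = alpha*q + beta*centroid(rel) - gamma*centroid(nrel).
    Vectors are sparse dicts mapping term -> weight."""
    new = defaultdict(float)
    for t, w in profile.items():
        new[t] += alpha * w
    if rel_docs:
        for d in rel_docs:           # add the centroid of relevant documents
            for t, w in d.items():
                new[t] += beta * w / len(rel_docs)
    if nrel_docs:
        for d in nrel_docs:          # subtract the centroid of non-relevant documents
            for t, w in d.items():
                new[t] -= gamma * w / len(nrel_docs)
    return dict(new)
```

With γ = 0, the update simply pulls the profile toward the relevant documents seen so far.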
In the experiment described below, n was set to a fixed value.

Word contribution

We have also evaluated a profile updating method based on word contribution (WC) [2], which is a measure expressing the influence of a word on query-document similarity. We describe the WC-based QE method and its application to profile updating in this section.

WC-based QE

Word contribution is defined by the following formula:

Cont(w, q, d) = Sim(q, d) - Sim(q'(w), d'(w))  (2)

where Cont(w, q, d) is the contribution of the word w to the similarity between query q and document d, Sim(q, d) is the similarity between q and d, q'(w) is query q excluding word w, and d'(w) is document d excluding word w. In other words, the contribution of word w is the difference between the similarity of q and d, and the similarity of q and d when word w is assumed to be nonexistent in both. Therefore, there are words which have positive contribution and words which have negative contribution: words with positive contribution raise similarity, and words with negative contribution lower it. Analysis of WC [2] shows that few words have highly positive or highly negative contribution, and that most words have contribution near zero. This means that most words do not have a significant influence on query-document similarity. As is obvious from the definition of word contribution, words with highly positive contribution are words which co-occur in the query and document. Such words can be considered informative words for document relevance to the query. On the contrary, words with highly negative contribution can be considered words which discriminate relevant documents from the other non-relevant documents contained in the data collection. Experiments reported in [2] show that using such words with highly negative contribution for query expansion achieved higher performance than the Rocchio-based query expansion method.
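Formula (2) can be computed directly. The sketch below assumes cosine similarity over sparse term-weight dicts, which matches the vector space model used by the system, although the definition itself works with any similarity function:

```python
import math

def cosine(q, d):
    """Cosine similarity between two sparse term-weight dicts."""
    num = sum(w * d.get(t, 0.0) for t, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return num / (nq * nd) if nq and nd else 0.0

def contribution(w, q, d):
    """Formula (2): Cont(w, q, d) = Sim(q, d) - Sim(q'(w), d'(w)),
    where w is removed from both the query and the document."""
    q2 = {t: x for t, x in q.items() if t != w}
    d2 = {t: x for t, x in d.items() if t != w}
    return cosine(q, d) - cosine(q2, d2)
```

As the text describes, a word shared by q and d tends to have positive contribution, while a word occurring in only one of them tends to have negative contribution, since removing it shortens one vector without removing any matching term.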
WC-based profile updating

In the previously described QE method, words used for query expansion were extracted only from relevant documents. In the profile updating method based on WC [1], information from all selected documents is used, regardless of their relevance to the profile. First, the word contribution of all words in the selected document is calculated. From each selected document d, the N words with the lowest contribution are extracted. Next, a score for each extracted word w is calculated by the following formula:

Score(w) = wgt x Cont(w, p, d)  (3)

where wgt is a parameter with a negative value (since the contribution of each extracted word is also negative), and Cont(w, p, d) is the WC of word w with respect to the similarity of profile p and document d. In this procedure, the calculated score is regarded as the TF (term frequency) element of the word. Finally, all extracted words and their weights are added to the profile, unless the calculated weight of the word is negative.
A Rocchio-like algorithm is applied here to incorporate information from non-relevant documents into the profile. When the selected document d is relevant to the profile, the weight of word w is added to the element of the profile vector which expresses w. When d is non-relevant, the weight is subtracted from that element of the profile vector. Separate parameters (wgt) are used for the calculation of Score(w) in Formula (3), depending on the relevance of d: wgt_relR is the parameter for words extracted from relevant documents, and wgt_nrelR is the parameter for words extracted from non-relevant documents. Elements of the profile vector with negative weights are not used for similarity calculation, but all weights are accumulated for profile updating on upcoming documents. Therefore, the weights of words which appear in both relevant and non-relevant documents are restrained, thus emphasizing words which appear only in relevant documents.

2.2 Evaluation of existing methods

We have made experiments to evaluate the existing filtering methods. In this section, we briefly explain the data used for the experiments and our filtering system, and present the experimental results.

Experiment data

The TREC-8 Filtering Track data [7] was used for our experiments. This data set consists of articles from the Financial Times from 1992 to 1994, a total of approximately 200,000 articles. Each article is input into the filtering system in time order to create the document flow. Topics are used as profiles, and the relevant document set for each topic is used to simulate relevance feedback. The vocabulary and IDF data are initially constructed from the data on TREC CD-ROMs Vols. 4 and 5, excluding the Financial Times and Congressional Record data. Both the vocabulary and IDF data are updated at fixed intervals during the document flow.

System description

The filtering system used for our experiments is based on the vector space model. The weighting scheme is based on the TF*IDF weighting formulas used for the SMART system at TREC-7 [6], with minor customizations. The TF and IDF factors for our system are the following:

TF factor = log(1 + tf)  (4)
IDF factor = log(M / df)  (5)

where tf is the term's frequency in the document, df is the number of documents that contain the term, and M is the total number of documents in the data collection. We added 1 to the term frequency inside the logarithm of the TF factor because the tf value resulting from word contribution occasionally has values below 1, which would result in a negative weight.

Analysis

Figures 1 and 2 illustrate the similarity of documents selected by comparison to a profile, using the profile updating methods based on Rocchio and WC, respectively. The horizontal axis of each graph expresses the number of documents selected during the filtering process, and the vertical axis is the similarity of each selected document. Parameters wgt_relR and wgt_nrelR for the WC-based method were set to -200 and -800, respectively.

Figure 1: Similarity of selected documents (Rocchio)

Figure 2: Similarity of selected documents (WC)

It is clear from both Figures 1 and 2 that although relevant documents have relatively high similarity, many non-relevant documents have similarity close to that of relevant documents. The mixture of relevant and non-relevant documents can particularly be observed
in low similarity areas. Therefore, it is difficult to extract relevant documents from this area without retrieving a large number of non-relevant documents. The easy way to address this problem is to set a high similarity threshold to reject as many non-relevant documents as possible. However, a high threshold obviously results in the rejection of a large number of relevant documents. Moreover, such a strict threshold also results in less feedback to the profile, which may affect filtering performance on upcoming documents.

3 Non-relevant information profile

In this section, we propose a filtering method using the non-relevant information profile, which is a profile built to reject non-relevant documents. After describing this method, we give a detailed explanation of the evaluation experiments for the proposed method, and analyze the results.

3.1 Method

To improve filtering performance without sacrificing the retrieval of relevant documents, it is necessary to reduce the selection of non-relevant documents. However, the analysis of the experimental results described in the previous section shows that this is difficult when filtering is based only on the similarity between the profile and incoming documents. In order to reduce the retrieval of non-relevant documents, we propose the use of a profile which expresses the features of non-relevant documents. By calculating the similarity between this non-relevant information profile and incoming documents which have passed the initial profile, and rejecting documents which have high similarity to the non-relevant information profile, it is possible to avoid the selection of documents highly similar to previously retrieved non-relevant documents. By rejecting such documents, an improvement of filtering performance can be expected.
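The two-stage decision just described can be sketched as follows; the cosine similarity function, the dict-based vectors, and the threshold values are illustrative assumptions (the thresholds used in the experiments below are Thres_R = 0.1 and Thres_N in {0.1, 0.25}):

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    num = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return num / (na * nb) if na and nb else 0.0

def filter_document(doc, p_r, p_n, thres_r, thres_n):
    """Retrieve a document iff it passes the relevant profile p_r
    and is NOT caught by the non-relevant information profile p_n."""
    if cosine(p_r, doc) <= thres_r:
        return False   # not similar enough to the user's interest
    if cosine(p_n, doc) > thres_n:
        return False   # too similar to past mistakenly retrieved documents
    return True
```

The second test is what distinguishes this method from plain threshold filtering: a document can score well against the user's profile and still be rejected because it resembles past non-relevant retrievals.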
The process flow of filtering with the non-relevant information profile is illustrated in Figure 3, where d is the selected document, P_R is the initial profile, P_N is the non-relevant information profile, and Sim(p, d) is the similarity between profile p and document d. As illustrated in Figure 3, thresholds Thres_R and Thres_N are set for each profile. The similarity between P_N and each document which has passed P_R is calculated and compared to Thres_N. If the similarity exceeds Thres_N, then the document is regarded as non-relevant and, as a result, is rejected by P_N.

Figure 3: Filtering process with non-relevant information profile

The method to build the non-relevant information profile is as follows. The initial values of all elements in the non-relevant information profile are set to 0. For each selected document, N words are extracted and their weights are calculated based on WC. As in the original WC-based profile updating method, the parameter wgt differs based on the relevance of the selected document: for the generation and updating of P_N, wgt_relN is the parameter for words extracted from relevant documents, and wgt_nrelN is the parameter for words extracted from non-relevant documents. To update the non-relevant information profile, the weights of words extracted from non-relevant documents are added to, and the weights of words extracted from relevant documents are subtracted from, the corresponding element of the profile vector. This is the opposite of the updating of the initial profile, where the weights of words extracted from relevant documents are added to the corresponding element of the profile vector, and the weights of words extracted from non-relevant documents are subtracted.
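A minimal sketch of this sign convention is given below. The helper name and dict representation are hypothetical, the default wgt values are one of the parameter settings tested in the experiments, and the TF*IDF re-weighting of scores described in Section 2 is omitted for brevity:

```python
def update_pn(p_n, scored_words, doc_is_relevant,
              wgt_rel_n=-200.0, wgt_nrel_n=-800.0):
    """Update the non-relevant information profile P_N in place.
    scored_words: (word, contribution) pairs for the N lowest-contribution
    words of a selected document (contributions are negative).
    Opposite of P_R: weights from NON-relevant documents are added,
    weights from relevant documents are subtracted."""
    wgt = wgt_rel_n if doc_is_relevant else wgt_nrel_n
    sign = -1.0 if doc_is_relevant else 1.0
    for w, cont in scored_words:
        score = wgt * cont   # Formula (3): negative wgt x negative cont > 0
        p_n[w] = p_n.get(w, 0.0) + sign * score
    return p_n
```

Words repeatedly extracted from non-relevant documents thus accumulate high weight in P_N, while the same words appearing in relevant documents pull their weight back down.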
In addition to the updating of the non-relevant information profile, the initial profile P_R is also updated by the method described in Section 2.1.

3.2 Experiment

We have made experiments to evaluate the use of the non-relevant information profile. The details of these experiments are described in this section.

Evaluation measures

Since recall and precision are not suitable for the evaluation of document filtering, we calculated the scaled utility [3] of each profile, and averaged the scaled utility
of all profiles for evaluation. We explain utility and scaled utility in this section. Utility [3] assigns a value or a cost to each document, based on whether it is retrieved or not retrieved and whether it is relevant or not relevant. The general formula for utility is shown below:

Utility = A x R+ + B x N+ + C x R- + D x N-  (6)

where R+ is the number of relevant documents retrieved, R- is the number of relevant documents not retrieved, N+ is the number of non-relevant documents retrieved, and N- is the number of non-relevant documents not retrieved. The utility parameters (A, B, C, D) determine the relative value of each possible category. For the evaluation of the experiments in this paper, we used the LF1 utility used in TREC-8, where the parameters were set as follows: A = 3, B = -2, C = D = 0. However, it is not appropriate to compare the value of LF1 across topics, due to the wide variation in the number of relevant documents per topic. Therefore, it is necessary to normalize LF1 for a fair comparison. We used scaled utility for the normalization of LF1. The formula for scaled utility is the following:

u_s(S, T) = (max(u(S, T), U(s)) - U(s)) / (MaxU(T) - U(s))  (7)

where u(S, T) and u_s(S, T) are the original and scaled utility of system S for topic T, U(s) is the utility of retrieving s non-relevant documents, and MaxU(T) is the maximum possible utility score for topic T. All utility scores less than U(s) are set to U(s). Therefore, utility scores can range between U(s) and MaxU(T), and the scores are renormalized to range between 0 and 1.

Results

First, we made experiments using only the relevant information profile (P_R) for filtering. Parameters wgt_relR and wgt_nrelR were set to {-200, -400, -800} and {-100, -200, -400, -800}, respectively. The similarity threshold (Thres_R) was fixed to 0.1. The average scaled utility over all 50 topics for each parameter set is shown in Table 1.
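Formulas (6) and (7) can be written out as follows. With the LF1 parameters A = 3, B = -2, C = D = 0, the utility of retrieving s non-relevant documents is U(s) = B x s, and MaxU(T) is A times the number of relevant documents in topic T; this sketch assumes exactly that reading:

```python
def lf1_utility(rel_retrieved, nrel_retrieved, a=3.0, b=-2.0):
    """Formula (6) with the LF1 parameters: A*R+ + B*N+ (C = D = 0)."""
    return a * rel_retrieved + b * nrel_retrieved

def scaled_utility(u, total_relevant, s=200, a=3.0, b=-2.0):
    """Formula (7): u_s = (max(u, U(s)) - U(s)) / (MaxU(T) - U(s)),
    where U(s) = b*s is the utility of retrieving s non-relevant documents
    and MaxU(T) = a * total_relevant is the best possible score for the topic."""
    u_floor = b * s                # U(s), the lower bound
    max_u = a * total_relevant     # MaxU(T), the upper bound
    return (max(u, u_floor) - u_floor) / (max_u - u_floor)
```

A system that retrieves every relevant document and nothing else scores 1; any system at or below the U(s) floor scores 0.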
The parameter s for the calculation of scaled utility is set to 200. The results in Table 1 show that the parameter set {wgt_relR, wgt_nrelR} = {-200, -800} achieved the best performance.

Table 1: Average scaled utility (P_R only)

Next, we evaluated the performance of filtering using the non-relevant information profile (P_N). The parameters used for updating P_R were fixed to {wgt_relR, wgt_nrelR} = {-200, -800}, based on the results in Table 1. Thres_R was fixed to 0.1, as in the previous experiment. Parameters wgt_relN and wgt_nrelN for updating P_N were each set to {-200, -400, -800} and {-100, -200, -400, -800}, respectively. The similarity threshold Thres_N was set to 0.1 and 0.25. The results for each Thres_N are shown in Tables 2 (Thres_N = 0.1) and 3 (Thres_N = 0.25).

Table 2: Average scaled utility (Thres_N = 0.1)

Table 3: Average scaled utility (Thres_N = 0.25)

Consistent improvement in scaled utility compared to the original filtering method can be observed in the results in Tables 2 and 3. This shows that the application of the non-relevant information profile has contributed to the improvement of filtering performance.

3.3 Analysis

For further analysis of the effects of Thres_N, we examined the relation between the similarity of each document and the two profiles, P_R and P_N. We refer to the similarity to each of these profiles as Sim_R and Sim_N, respectively. In order to analyze the relation between Sim_R and Sim_N for relevant and non-relevant documents, we plotted all documents which passed P_R on a two-dimensional graph. The Sim_R-Sim_N graph for the experiment with Thres_N = 0.25 is illustrated in Figure 4, and the graph for Thres_N = 0.1 is shown in Figure 5.

Figure 4: Relation of Sim_R and Sim_N (Thres_N = 0.25)

Figure 5: Relation of Sim_R and Sim_N (Thres_N = 0.1)

It is clear from Figure 4 that Sim_N is relatively higher for non-relevant documents than for relevant documents. This suggests that it is possible to reject many non-relevant documents by setting Thres_N to an appropriate value. In this case, however, as is apparent from Figure 4, there are not many documents where Sim_N is higher than Thres_N, meaning that such a threshold setting is too moderate. However, when Thres_N is set to 0.1, as in Figure 5, the Sim_N values of relevant and non-relevant documents are mixed, compared to the plots illustrated in Figure 4. The difference between these two experiments is the strictness of Thres_N. As a result of strengthening the threshold of the non-relevant information profile, the number of selected documents decreases. This decrease directly reduces the amount of feedback information available to the profile updating process. The results illustrated in Figure 5 indicate that the feedback information was insufficient for accurate discrimination of non-relevant documents. However, Figure 4 shows that the increase of feedback information due to loosening the threshold has little meaning, since fewer non-relevant documents are rejected by the non-relevant profile.

4 Non-relevant profile with pseudo feedback

4.1 Method

The results of the experiments described in the previous section show that there is a tradeoff between the strictness of Thres_N and the performance of profile P_N. To solve this problem, we propose the use of pseudo feedback [4] to increase the feedback information. Pseudo feedback is often used for QE in the text retrieval task, when the relevance of retrieved documents is uncertain. Generally, documents which are ranked high in the initial search are assumed to be relevant. This assumption is sent back to the system, which utilizes the information to expand the query. Our proposal is to assume that documents blocked by P_N are non-relevant, and to send this information to the profile updating process. Documents regarded as non-relevant by pseudo feedback are handled in the same way as documents which were actually judged non-relevant through the original relevance feedback. This method allows Thres_N to be strict without sacrificing feedback information.

4.2 Experiment

Experiments were made to evaluate pseudo feedback. The parameters for these experiments were set as follows: Thres_R = Thres_N = 0.1, wgt_relR = -200, wgt_nrelR = -800, wgt_relN = {-200, -400, -800}, wgt_nrelN = {-100, -200, -400, -800}. The average scaled utility for each set of wgt_relN and wgt_nrelN is shown in Table 4.

Table 4: Average scaled utility (pseudo feedback)

The results in Table 4 show overall improvement in filtering performance. This indicates that P_N successfully rejects more non-relevant documents than the method described in the previous section. To confirm this result, we made a Sim_R-Sim_N graph for this experiment, as in Figures 4 and 5 for the previous experiments. The Sim_R-Sim_N graph for the pseudo feedback experiment is illustrated in Figure 6. As is clear from Figure 6, the Sim_N values of non-relevant documents are distributed at higher values than in the results illustrated in Figures 4 and 5.
Figure 6: Relation of Sim_R and Sim_N (pseudo feedback)

This graph and the scaled utility improvement shown in Table 4 prove that the non-relevant information profile successfully rejects a reasonable number of non-relevant documents, as expected. However, it is also clear from Figure 6 that the Sim_N values of some relevant documents have also increased, causing a mixture of non-relevant and relevant documents in the area where Sim_N is relatively high. The cause of this is the inaccuracy of the pseudo feedback, through which relevant documents may mistakenly be regarded as non-relevant. This shows that the decrease in non-relevant document selection was achieved at the cost of some relevant documents. We suggest two solutions to this problem. One is the selection of pseudo feedback information. The inaccuracy of pseudo feedback can be reduced by simply not using "suspicious" information for feedback. In this case, such information may be documents which were only barely rejected by the non-relevant information profile. By ignoring such documents, and using only documents which have high similarity to the non-relevant information profile, the rate of erroneous feedback can be decreased. The other solution is to weight the pseudo feedback information based on the similarity between each document and the non-relevant information profile. This is a moderate version of the previous solution: instead of simply ignoring "suspicious" documents in pseudo feedback, it is possible to apply a weight to each document based on its similarity to the non-relevant information profile. An ideal weighting scheme will emphasize feedback information extracted from documents highly similar to the non-relevant information profile, which may lead to higher pseudo feedback quality.

5 Conclusion

Many existing document filtering systems take a conservative approach to achieve high filtering performance;
to avoid the retrieval of non-relevant documents, such systems sacrifice the retrieval of relevant documents. In order to retrieve more relevant documents without excessive retrieval of non-relevant documents, we have proposed the use of a non-relevant information profile. The non-relevant information profile expresses the features of mistakenly retrieved non-relevant documents. The object of this profile is to reject non-relevant documents which are similar to documents mistakenly retrieved in the past flow of documents. Along with the similarity calculation between each document and the original profile, the similarity to the non-relevant information profile is calculated, and documents with high similarity to this profile are rejected. Through experiments, we have shown that the non-relevant information profile successfully reduces the retrieval of non-relevant documents, resulting in an overall improvement of filtering performance. We have also made an experiment on the application of pseudo feedback for building the non-relevant information profile. The results of this experiment show that the increase in feedback information arising from pseudo feedback further improved filtering performance.
References

[1] K. Hoashi, K. Matsumoto, N. Inoue, K. Hashimoto: "Experiments on the TREC-8 Filtering Track", The 8th Text REtrieval Conference (to be published).
[2] K. Hoashi, K. Matsumoto, N. Inoue, K. Hashimoto: "Query Expansion Method Based on Word Contribution", Proceedings of SIGIR'99.
[3] D. Hull: "The TREC-7 Filtering Track: Description and Analysis", The 7th Text REtrieval Conference, NIST SP, pp. 33-56.
[4] S. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu, and M. Gatford: "Okapi at TREC-3", Overview of the Third Text REtrieval Conference.
[5] J. Rocchio: "Relevance Feedback in Information Retrieval", in "The SMART Retrieval System - Experiments in Automatic Document Processing", Prentice Hall, Inc.
[6] A. Singhal, J. Choi, D. Hindle, D. Lewis, and F. Pereira: "AT&T at TREC-7", The Seventh Text REtrieval Conference, NIST SP.
[7] E. Voorhees, D. Harman: "The 8th Text REtrieval Conference" (to be published).
[8] C. Zhai, P. Jansen, N. Roma, E. Stoica, D. Evans: "Notes on Optimization in CLARIT Adaptive Filtering", The 8th Text REtrieval Conference (to be published).
More informationReal-time Query Expansion in Relevance Models
Real-time Query Expansion in Relevance Models Victor Lavrenko and James Allan Center for Intellignemt Information Retrieval Department of Computer Science 140 Governor s Drive University of Massachusetts
More informationResPubliQA 2010
SZTAKI @ ResPubliQA 2010 David Mark Nemeskey Computer and Automation Research Institute, Hungarian Academy of Sciences, Budapest, Hungary (SZTAKI) Abstract. This paper summarizes the results of our first
More informationUsing Query History to Prune Query Results
Using Query History to Prune Query Results Daniel Waegel Ursinus College Department of Computer Science dawaegel@gmail.com April Kontostathis Ursinus College Department of Computer Science akontostathis@ursinus.edu
More informationAT&T at TREC-6. Amit Singhal. AT&T Labs{Research. Abstract
AT&T at TREC-6 Amit Singhal AT&T Labs{Research singhal@research.att.com Abstract TREC-6 is AT&T's rst independent TREC participation. We are participating in the main tasks (adhoc, routing), the ltering
More informationMaking Retrieval Faster Through Document Clustering
R E S E A R C H R E P O R T I D I A P Making Retrieval Faster Through Document Clustering David Grangier 1 Alessandro Vinciarelli 2 IDIAP RR 04-02 January 23, 2004 D a l l e M o l l e I n s t i t u t e
More informationA novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems
A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems Anestis Gkanogiannis and Theodore Kalamboukis Department of Informatics Athens University of Economics
More informationInformation Retrieval. (M&S Ch 15)
Information Retrieval (M&S Ch 15) 1 Retrieval Models A retrieval model specifies the details of: Document representation Query representation Retrieval function Determines a notion of relevance. Notion
More informationIITH at CLEF 2017: Finding Relevant Tweets for Cultural Events
IITH at CLEF 2017: Finding Relevant Tweets for Cultural Events Sreekanth Madisetty and Maunendra Sankar Desarkar Department of CSE, IIT Hyderabad, Hyderabad, India {cs15resch11006, maunendra}@iith.ac.in
More informationRanking Function Optimizaton Based on OKAPI and K-Means
2016 International Conference on Mechanical, Control, Electric, Mechatronics, Information and Computer (MCEMIC 2016) ISBN: 978-1-60595-352-6 Ranking Function Optimizaton Based on OKAPI and K-Means Jun
More informationChapter 8. Evaluating Search Engine
Chapter 8 Evaluating Search Engine Evaluation Evaluation is key to building effective and efficient search engines Measurement usually carried out in controlled laboratory experiments Online testing can
More informationOptimal Query. Assume that the relevant set of documents C r. 1 N C r d j. d j. Where N is the total number of documents.
Optimal Query Assume that the relevant set of documents C r are known. Then the best query is: q opt 1 C r d j C r d j 1 N C r d j C r d j Where N is the total number of documents. Note that even this
More informationImproving the Effectiveness of Information Retrieval with Local Context Analysis
Improving the Effectiveness of Information Retrieval with Local Context Analysis JINXI XU BBN Technologies and W. BRUCE CROFT University of Massachusetts Amherst Techniques for automatic query expansion
More informationTilburg University. Authoritative re-ranking of search results Bogers, A.M.; van den Bosch, A. Published in: Advances in Information Retrieval
Tilburg University Authoritative re-ranking of search results Bogers, A.M.; van den Bosch, A. Published in: Advances in Information Retrieval Publication date: 2006 Link to publication Citation for published
More informationMicrosoft Cambridge at TREC 13: Web and HARD tracks
Microsoft Cambridge at TREC 13: Web and HARD tracks Hugo Zaragoza Λ Nick Craswell y Michael Taylor z Suchi Saria x Stephen Robertson 1 Overview All our submissions from the Microsoft Research Cambridge
More informationCitation for published version (APA): He, J. (2011). Exploring topic structure: Coherence, diversity and relatedness
UvA-DARE (Digital Academic Repository) Exploring topic structure: Coherence, diversity and relatedness He, J. Link to publication Citation for published version (APA): He, J. (211). Exploring topic structure:
More informationCombining fields for query expansion and adaptive query expansion
Information Processing and Management 43 (2007) 1294 1307 www.elsevier.com/locate/infoproman Combining fields for query expansion and adaptive query expansion Ben He *, Iadh Ounis Department of Computing
More informationTREC 2016 Dynamic Domain Track: Exploiting Passage Representation for Retrieval and Relevance Feedback
RMIT @ TREC 2016 Dynamic Domain Track: Exploiting Passage Representation for Retrieval and Relevance Feedback Ameer Albahem ameer.albahem@rmit.edu.au Lawrence Cavedon lawrence.cavedon@rmit.edu.au Damiano
More informationAutomated Online News Classification with Personalization
Automated Online News Classification with Personalization Chee-Hong Chan Aixin Sun Ee-Peng Lim Center for Advanced Information Systems, Nanyang Technological University Nanyang Avenue, Singapore, 639798
More informationMining the Web for Multimedia-based Enriching
Mining the Web for Multimedia-based Enriching Mathilde Sahuguet and Benoit Huet Eurecom, Sophia-Antipolis, France Abstract. As the amount of social media shared on the Internet grows increasingly, it becomes
More informationMelbourne University at the 2006 Terabyte Track
Melbourne University at the 2006 Terabyte Track Vo Ngoc Anh William Webber Alistair Moffat Department of Computer Science and Software Engineering The University of Melbourne Victoria 3010, Australia Abstract:
More informationBeyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval
Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval ChengXiang Zhai Computer Science Department University of Illinois at Urbana-Champaign William W. Cohen Center for Automated
More informationCSCI 599: Applications of Natural Language Processing Information Retrieval Evaluation"
CSCI 599: Applications of Natural Language Processing Information Retrieval Evaluation" All slides Addison Wesley, Donald Metzler, and Anton Leuski, 2008, 2012! Evaluation" Evaluation is key to building
More informationRouting and Ad-hoc Retrieval with the. Nikolaus Walczuch, Norbert Fuhr, Michael Pollmann, Birgit Sievers. University of Dortmund, Germany.
Routing and Ad-hoc Retrieval with the TREC-3 Collection in a Distributed Loosely Federated Environment Nikolaus Walczuch, Norbert Fuhr, Michael Pollmann, Birgit Sievers University of Dortmund, Germany
More informationMercure at trec6 2 IRIT/SIG. Campus Univ. Toulouse III. F Toulouse. fbougha,
Mercure at trec6 M. Boughanem 1 2 C. Soule-Dupuy 2 3 1 MSI Universite de Limoges 123, Av. Albert Thomas F-87060 Limoges 2 IRIT/SIG Campus Univ. Toulouse III 118, Route de Narbonne F-31062 Toulouse 3 CERISS
More informationUsing Coherence-based Measures to Predict Query Difficulty
Using Coherence-based Measures to Predict Query Difficulty Jiyin He, Martha Larson, and Maarten de Rijke ISLA, University of Amsterdam {jiyinhe,larson,mdr}@science.uva.nl Abstract. We investigate the potential
More informationBoolean Model. Hongning Wang
Boolean Model Hongning Wang CS@UVa Abstraction of search engine architecture Indexed corpus Crawler Ranking procedure Doc Analyzer Doc Representation Query Rep Feedback (Query) Evaluation User Indexer
More informationCustom IDF weights for boosting the relevancy of retrieved documents in textual retrieval
Annals of the University of Craiova, Mathematics and Computer Science Series Volume 44(2), 2017, Pages 238 248 ISSN: 1223-6934 Custom IDF weights for boosting the relevancy of retrieved documents in textual
More informationRe-ranking Documents Based on Query-Independent Document Specificity
Re-ranking Documents Based on Query-Independent Document Specificity Lei Zheng and Ingemar J. Cox Department of Computer Science University College London London, WC1E 6BT, United Kingdom lei.zheng@ucl.ac.uk,
More informationindexing and query processing. The inverted le was constructed for the retrieval target collection which contains full texts of two years' Japanese pa
Term Distillation in Patent Retrieval Hideo Itoh Hiroko Mano Yasushi Ogawa Software R&D Group, RICOH Co., Ltd. 1-1-17 Koishikawa, Bunkyo-ku, Tokyo 112-0002, JAPAN fhideo,mano,yogawag@src.ricoh.co.jp Abstract
More informationUsing a Medical Thesaurus to Predict Query Difficulty
Using a Medical Thesaurus to Predict Query Difficulty Florian Boudin, Jian-Yun Nie, Martin Dawes To cite this version: Florian Boudin, Jian-Yun Nie, Martin Dawes. Using a Medical Thesaurus to Predict Query
More informationRMIT University at TREC 2006: Terabyte Track
RMIT University at TREC 2006: Terabyte Track Steven Garcia Falk Scholer Nicholas Lester Milad Shokouhi School of Computer Science and IT RMIT University, GPO Box 2476V Melbourne 3001, Australia 1 Introduction
More informationAn Incremental Approach to Efficient Pseudo-Relevance Feedback
An Incremental Approach to Efficient Pseudo-Relevance Feedback ABSTRACT Hao Wu Department of Electrical and Computer Engineering University of Delaware Newark, DE USA haow@udel.edu Pseudo-relevance feedback
More informationThe Utrecht Blend: Basic Ingredients for an XML Retrieval System
The Utrecht Blend: Basic Ingredients for an XML Retrieval System Roelof van Zwol Centre for Content and Knowledge Engineering Utrecht University Utrecht, the Netherlands roelof@cs.uu.nl Virginia Dignum
More informationTowards Privacy-Preserving Evaluation for Information Retrieval Models over Industry Data Sets
Towards Privacy-Preserving Evaluation for Information Retrieval Models over Industry Data Sets Peilin Yang 1, Mianwei Zhou 2, Yi Chang 3, Chengxiang Zhai 4, and Hui Fang 1 1 University of Delaware, USA
More informationA RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH
A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH A thesis Submitted to the faculty of the graduate school of the University of Minnesota by Vamshi Krishna Thotempudi In partial fulfillment of the requirements
More informationQuery Expansion for Noisy Legal Documents
Query Expansion for Noisy Legal Documents Lidan Wang 1,3 and Douglas W. Oard 2,3 1 Computer Science Department, 2 College of Information Studies and 3 Institute for Advanced Computer Studies, University
More informationChapter 27 Introduction to Information Retrieval and Web Search
Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval
More informationStudy on Merging Multiple Results from Information Retrieval System
Proceedings of the Third NTCIR Workshop Study on Merging Multiple Results from Information Retrieval System Hiromi itoh OZAKU, Masao UTIAMA, Hitoshi ISAHARA Communications Research Laboratory 2-2-2 Hikaridai,
More informationInformation Retrieval CS Lecture 06. Razvan C. Bunescu School of Electrical Engineering and Computer Science
Information Retrieval CS 6900 Lecture 06 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Boolean Retrieval vs. Ranked Retrieval Many users (professionals) prefer
More informationData Modelling and Multimedia Databases M
ALMA MATER STUDIORUM - UNIERSITÀ DI BOLOGNA Data Modelling and Multimedia Databases M International Second cycle degree programme (LM) in Digital Humanities and Digital Knoledge (DHDK) University of Bologna
More informationQUERY EXPANSION USING WORDNET WITH A LOGICAL MODEL OF INFORMATION RETRIEVAL
QUERY EXPANSION USING WORDNET WITH A LOGICAL MODEL OF INFORMATION RETRIEVAL David Parapar, Álvaro Barreiro AILab, Department of Computer Science, University of A Coruña, Spain dparapar@udc.es, barreiro@udc.es
More informationModern Information Retrieval
Modern Information Retrieval Chapter 3 Retrieval Evaluation Retrieval Performance Evaluation Reference Collections CFC: The Cystic Fibrosis Collection Retrieval Evaluation, Modern Information Retrieval,
More informationInformation Retrieval. Session 11 LBSC 671 Creating Information Infrastructures
Information Retrieval Session 11 LBSC 671 Creating Information Infrastructures Agenda The search process Information retrieval Recommender systems Evaluation The Memex Machine Information Hierarchy More
More informationRanked Feature Fusion Models for Ad Hoc Retrieval
Ranked Feature Fusion Models for Ad Hoc Retrieval Jeremy Pickens, Gene Golovchinsky FX Palo Alto Laboratory, Inc. 3400 Hillview Ave, Building 4 Palo Alto, California 94304 USA {jeremy, gene}@fxpal.com
More informationPerformance Evaluation
Chapter 4 Performance Evaluation For testing and comparing the effectiveness of retrieval and classification methods, ways of evaluating the performance are required. This chapter discusses several of
More informationAn Evaluation Method of Web Search Engines Based on Users Sense
An Evaluation Method of Web Search Engines Based on Users Sense Takashi OHTSUKA y Koji EGUCHI z Hayato YAMANA y y Graduate School of Science and Engineering, Waseda University 3-4-1 Okubo Shinjuku-ku Tokyo,
More informationJames Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence!
James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence! (301) 219-4649 james.mayfield@jhuapl.edu What is Information Retrieval? Evaluation
More informationBasic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval
Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval 1 Naïve Implementation Convert all documents in collection D to tf-idf weighted vectors, d j, for keyword vocabulary V. Convert
More informationRobust Relevance-Based Language Models
Robust Relevance-Based Language Models Xiaoyan Li Department of Computer Science, Mount Holyoke College 50 College Street, South Hadley, MA 01075, USA Email: xli@mtholyoke.edu ABSTRACT We propose a new
More informationAnalyzing Document Retrievability in Patent Retrieval Settings
Analyzing Document Retrievability in Patent Retrieval Settings Shariq Bashir and Andreas Rauber Institute of Software Technology and Interactive Systems, Vienna University of Technology, Austria {bashir,rauber}@ifs.tuwien.ac.at
More informationVerbose Query Reduction by Learning to Rank for Social Book Search Track
Verbose Query Reduction by Learning to Rank for Social Book Search Track Messaoud CHAA 1,2, Omar NOUALI 1, Patrice BELLOT 3 1 Research Center on Scientific and Technical Information 05 rue des 03 frères
More informationEfficient query processing
Efficient query processing Efficient scoring, distributed query processing Web Search 1 Ranking functions In general, document scoring functions are of the form The BM25 function, is one of the best performing:
More informationInformation Retrieval and Web Search
Information Retrieval and Web Search Relevance Feedback. Query Expansion Instructor: Rada Mihalcea Intelligent Information Retrieval 1. Relevance feedback - Direct feedback - Pseudo feedback 2. Query expansion
More informationText Documents clustering using K Means Algorithm
Text Documents clustering using K Means Algorithm Mrs Sanjivani Tushar Deokar Assistant professor sanjivanideokar@gmail.com Abstract: With the advancement of technology and reduced storage costs, individuals
More informationRetrieval Evaluation. Hongning Wang
Retrieval Evaluation Hongning Wang CS@UVa What we have learned so far Indexed corpus Crawler Ranking procedure Research attention Doc Analyzer Doc Rep (Index) Query Rep Feedback (Query) Evaluation User
More informationA Cluster-Based Resampling Method for Pseudo- Relevance Feedback
A Cluster-Based Resampling Method for Pseudo- Relevance Feedback Kyung Soon Lee W. Bruce Croft James Allan Department of Computer Engineering Chonbuk National University Republic of Korea Center for Intelligent
More informationDUTH at ImageCLEF 2011 Wikipedia Retrieval
DUTH at ImageCLEF 2011 Wikipedia Retrieval Avi Arampatzis, Konstantinos Zagoris, and Savvas A. Chatzichristofis Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi
More informationiarabicweb16: Making a Large Web Collection More Accessible for Research
iarabicweb16: Making a Large Web Collection More Accessible for Research Khaled Yasser, Reem Suwaileh, Abdelrahman Shouman, Yassmine Barkallah, Mucahid Kutlu, Tamer Elsayed Computer Science and Engineering
More informationInformation Retrieval
Information Retrieval CSC 375, Fall 2016 An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have
More informationClassification and Comparative Analysis of Passage Retrieval Methods
Classification and Comparative Analysis of Passage Retrieval Methods Dr. K. Saruladha, C. Siva Sankar, G. Chezhian, L. Lebon Iniyavan, N. Kadiresan Computer Science Department, Pondicherry Engineering
More informationA New Measure of the Cluster Hypothesis
A New Measure of the Cluster Hypothesis Mark D. Smucker 1 and James Allan 2 1 Department of Management Sciences University of Waterloo 2 Center for Intelligent Information Retrieval Department of Computer
More informationMaximal Termsets as a Query Structuring Mechanism
Maximal Termsets as a Query Structuring Mechanism ABSTRACT Bruno Pôssas Federal University of Minas Gerais 30161-970 Belo Horizonte-MG, Brazil bavep@dcc.ufmg.br Berthier Ribeiro-Neto Federal University
More informationAutomatic Boolean Query Suggestion for Professional Search
Automatic Boolean Query Suggestion for Professional Search Youngho Kim yhkim@cs.umass.edu Jangwon Seo jangwon@cs.umass.edu Center for Intelligent Information Retrieval Department of Computer Science University
More informationInstructor: Stefan Savev
LECTURE 2 What is indexing? Indexing is the process of extracting features (such as word counts) from the documents (in other words: preprocessing the documents). The process ends with putting the information
More informationA Formal Approach to Score Normalization for Meta-search
A Formal Approach to Score Normalization for Meta-search R. Manmatha and H. Sever Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts Amherst, MA 01003
More informationQuery Expansion from Wikipedia and Topic Web Crawler on CLIR
Query Expansion from Wikipedia and Topic Web Crawler on CLIR Meng-Chun Lin, Ming-Xiang Li, Chih-Chuan Hsu and Shih-Hung Wu* Department of Computer Science and Information Engineering Chaoyang University
More informationQuestion Answering Approach Using a WordNet-based Answer Type Taxonomy
Question Answering Approach Using a WordNet-based Answer Type Taxonomy Seung-Hoon Na, In-Su Kang, Sang-Yool Lee, Jong-Hyeok Lee Department of Computer Science and Engineering, Electrical and Computer Engineering
More informationCity, University of London Institutional Repository
City Research Online City, University of London Institutional Repository Citation: MacFarlane, A., McCann, J. A. & Robertson, S. E. (2000). Parallel search using partitioned inverted files. In: Seventh
More informationMeSH-based dataset for measuring the relevance of text retrieval
MeSH-based dataset for measuring the relevance of text retrieval Won Kim, Lana Yeganova, Donald C Comeau, W John Wilbur, Zhiyong Lu National Center for Biotechnology Information, NLM, NIH, Bethesda, MD,
More informationOn a Combination of Probabilistic and Boolean IR Models for WWW Document Retrieval
On a Combination of Probabilistic and Boolean IR Models for WWW Document Retrieval MASAHARU YOSHIOKA and MAKOTO HARAGUCHI Hokkaido University Even though a Boolean query can express the information need
More informationIndexing and Query Processing
Indexing and Query Processing Jaime Arguello INLS 509: Information Retrieval jarguell@email.unc.edu January 28, 2013 Basic Information Retrieval Process doc doc doc doc doc information need document representation
More informationAdvanced Search Techniques for Large Scale Data Analytics Pavel Zezula and Jan Sedmidubsky Masaryk University
Advanced Search Techniques for Large Scale Data Analytics Pavel Zezula and Jan Sedmidubsky Masaryk University http://disa.fi.muni.cz The Cranfield Paradigm Retrieval Performance Evaluation Evaluation Using
More informationDOCUMENT INDEXING USING INDEPENDENT TOPIC EXTRACTION. Yu-Hwan Kim and Byoung-Tak Zhang
DOCUMENT INDEXING USING INDEPENDENT TOPIC EXTRACTION Yu-Hwan Kim and Byoung-Tak Zhang School of Computer Science and Engineering Seoul National University Seoul 5-7, Korea yhkim,btzhang bi.snu.ac.kr ABSTRACT
More informationWord Indexing Versus Conceptual Indexing in Medical Image Retrieval
Word Indexing Versus Conceptual Indexing in Medical Image Retrieval (ReDCAD participation at ImageCLEF Medical Image Retrieval 2012) Karim Gasmi, Mouna Torjmen-Khemakhem, and Maher Ben Jemaa Research unit
More informationGlOSS: Text-Source Discovery over the Internet
GlOSS: Text-Source Discovery over the Internet LUIS GRAVANO Columbia University HÉCTOR GARCÍA-MOLINA Stanford University and ANTHONY TOMASIC INRIA Rocquencourt The dramatic growth of the Internet has created
More informationNavigation Retrieval with Site Anchor Text
Navigation Retrieval with Site Anchor Text Hideki Kawai Kenji Tateishi Toshikazu Fukushima NEC Internet Systems Research Labs. 8916-47, Takayama-cho, Ikoma-city, Nara, JAPAN {h-kawai@ab, k-tateishi@bq,
More informationX. A Relevance Feedback System Based on Document Transformations. S. R. Friedman, J. A. Maceyak, and S. F. Weiss
X-l X. A Relevance Feedback System Based on Document Transformations S. R. Friedman, J. A. Maceyak, and S. F. Weiss Abstract An information retrieval system using relevance feedback to modify the document
More informationA Practical Passage-based Approach for Chinese Document Retrieval
A Practical Passage-based Approach for Chinese Document Retrieval Szu-Yuan Chi 1, Chung-Li Hsiao 1, Lee-Feng Chien 1,2 1. Department of Information Management, National Taiwan University 2. Institute of
More informationFondazione Ugo Bordoni at TREC 2003: robust and web track
Fondazione Ugo Bordoni at TREC 2003: robust and web track Giambattista Amati, Claudio Carpineto, and Giovanni Romano Fondazione Ugo Bordoni Rome Italy Abstract Our participation in TREC 2003 aims to adapt
More informationWeb Information Retrieval. Exercises Evaluation in information retrieval
Web Information Retrieval Exercises Evaluation in information retrieval Evaluating an IR system Note: information need is translated into a query Relevance is assessed relative to the information need
More informationAn Exploration of Query Term Deletion
An Exploration of Query Term Deletion Hao Wu and Hui Fang University of Delaware, Newark DE 19716, USA haowu@ece.udel.edu, hfang@ece.udel.edu Abstract. Many search users fail to formulate queries that
More information