Reducing Click and Skip Errors in Search Result Ranking


Jiepu Jiang
Center for Intelligent Information Retrieval
College of Information and Computer Sciences
University of Massachusetts Amherst

James Allan
Center for Intelligent Information Retrieval
College of Information and Computer Sciences
University of Massachusetts Amherst

ABSTRACT
Most current search systems show descriptive summaries of results so that searchers can decide whether or not to click and read the results in detail. However, searchers may incorrectly click on a non-relevant result and/or skip relevant ones without clicking, for many reasons (e.g., misleading summaries, redundant content). A public dataset shows that 41% of clicked results are non-relevant and that searchers skip 45% of relevant results without clicking after reading their summaries. Reducing such click and skip errors is crucial for improving the actual performance of search systems. This paper explores a new paradigm of search result ranking that considers both result relevance and the chances of click and skip errors. Ideally, we should place relevant results with greater click probabilities at higher ranks than those with smaller click probabilities, such that searchers are less likely to skip relevant results. Similarly, we demote non-relevant results with higher click probabilities, such that searchers are less likely to click on non-relevant results. The proposed approach takes advantage of various features that are predictive of click and skip behavior among relevant and non-relevant results, including characteristics of result webpages and summaries and their relation to the search query and past searcher interaction within the same search session. We train ranking models using these features to optimize an nDCG-style metric, which measures the quality of a result list by both relevance and click probabilities. Experimental results on an open dataset show that our approach reduces 15%-20% of the click and skip errors in search result ranking, with only a slight decline in result relevance (nDCG@10 decreased by only 3.9%).

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - retrieval models. H.1.2 [Models and Principles]: User/Machine Systems - human factors.

Keywords
Click; interactive search; search result ranking; IR evaluation; summarization.

1. INTRODUCTION
The 10-blue-links paradigm has ruled search engines for decades. It is an interaction mode in which search systems deliver results to searchers through a search result page (SERP) filled with a ranked list of result summaries. These summaries are previews of result documents, usually customized to the query, such that searchers can spend a small amount of effort reading the summaries and assess whether it is worthwhile to click on the results and read them in detail. This mode is adopted in almost all current search systems, and its accuracy greatly affects the actual performance of search systems.
Although searchers can sometimes learn relevant information solely from reading result summaries, in most cases they need to read result documents to acquire relevant information. In such cases, it helps little to place a relevant result at the top if most searchers would not click on the result after viewing its summary. Similarly, clicking a non-relevant result costs searchers time and effort and may cause frustration, which has a greater adverse effect than skipping a non-relevant result. However, few studies discuss these factors in search result ranking and evaluation: results are ranked and evaluated by how relevant the documents are, ignoring whether searchers would click or skip the summaries.
One solution to the problem is to generate better search result summaries, such that searchers can make better decisions about clicking and skipping. Much work has been done in this direction [23, 24], yet the reported click accuracy for search engines and state-of-the-art retrieval systems is not satisfying. For example, Yilmaz et al. [26] reported, based on a commercial search engine query log, that the probabilities of clicking results judged as Perfect, Excellent, Good, Fair, and Bad are 0.94, 0.71, 0.55, 0.45, and 0.49, respectively. Similarly, in the TREC 2014 session track dataset [5] query logs (collected using an Indri-based retrieval system), only 41% of the clicked results are relevant and searchers skipped 45% of the relevant results after reading their summaries.
We approach the problem from a different angle by optimizing search result ranking for both relevance and searchers' click and skip behavior. Existing approaches do not take click and skip errors into account in ranking: they purely rank relevant results on top of non-relevant ones. Assuming we know whether searchers are going to click on or skip a result after reading its summary, we can perform a more sophisticated ranking of search results. For example, comparing two equally relevant results, we should rank the one searchers would click (or are more likely to click) at a higher rank. Similarly, we can demote non-relevant results with high click probabilities to avoid click errors. Yet it is currently unclear how to perform such rankings of search results. In this paper, we present the first study tackling this challenge. We consider both relevance and click and skip errors when ranking search results. Our goal is to reduce the chances of click and skip

errors while maintaining the relevance of results. The proposed approach consists of two parts. First, we design features that are predictive of searcher click and skip behavior on relevant and non-relevant results. Section 4 designs and analyzes various features covering query-independent characteristics of summaries and results, the similarity between the search query and results, past searcher interaction in the same session, and situational factors. Second, we propose an nDCG-style metric that measures the quality of search results considering both result relevance and the chances of click and skip errors. This metric can be used as an objective function to train ranking models that solve the problem. Section 5 introduces this metric. We train LambdaMART rankers to optimize the metric and compare them with rankers optimized for regular nDCG. Experimental results show the trained rankers can reduce the chances of click and skip errors by 15% to 20%, with only a slight decline in result relevance (nDCG@10 decreased by 3.9%). The proposed approach also outperforms many state-of-the-art retrieval models. The rest of the article introduces our approach and experiments.

2. RELATED WORK
This section reviews related work on three topics: searcher click and skip behavior, web search studies on searchers' implicit feedback, and search models using session information.
Searcher click and skip behavior on a SERP is greatly affected by the result summaries. Tombros and Sanderson [23] first applied query-biased summaries in IR systems, and found increased accuracy and speed of searchers' result clicking compared with query-independent summaries. White et al. [24] compared query-biased summaries with those using only document titles and abstracts, and came to similar conclusions. This suggests that searcher click and skip errors may relate to the relationship between summaries and search queries. Cutrell and Guan [9] experimented with summaries of different lengths in both navigational and informational search tasks. They observed that searchers achieved the best clicking accuracy with short summaries in navigational search tasks, but with long summaries in informational search tasks. This suggests that searcher click and skip behavior may relate to textual characteristics of summaries as well as to search tasks. Clarke et al. [7] predicted searcher clicks using result summary features. In comparison, we rank results by both searcher clicks and the relevance of search results at the same time. Shokouhi et al. [22] studied how repeated results in different queries were clicked by searchers. Similar information is included in our approach as features.
Note that we are interested in searchers' click and skip behavior only after they view result summaries. This is essentially a different task compared with the many studies that predict clicks based on the position of results on the SERP [6, 12]. Those studies mainly aim at accurately explaining the position bias of clicks in search logs, such that click-through data can be used more accurately as implicit feedback. Our study is also different from studies [2, 21] that use clicks as implicit feedback for improving relevance-based ranking of search results. In contrast, we use clicks as ground truth for how searchers would behave (click or skip) after reading result summaries, such that we can reduce click and skip errors in search result ranking. The approach adopted in this paper shares similarity with studies of predicting and optimizing click-through rates (CTRs) of ads [20].
Those studies predict position-independent CTRs for ads, such that ads with high CTRs can be ranked at the top to maximize ad clicks. In comparison, our task is more complex: we optimize click and skip errors for results, which depend on both click probability and result relevance.
Our approach for reducing searcher click and skip errors incorporates a large amount of searcher interaction information, which makes it similar to many web search studies on searchers' implicit feedback. Joachims et al. [15, 16] explored the validity of using click-through data to generate searcher preferences over results. Agichtein et al. [3] incorporated browsing features and query-text features to improve click-through features in similar tasks. Although our dataset has relevance judgments, Joachims's studies [15, 16] shed light on our approach of extracting labels for searcher click and skip behavior from search logs. Agichtein et al. [2] also demonstrated the effectiveness of using implicit searcher feedback to improve search result ranking. Hassan et al. [14] and Ageev et al. [1] predicted search success using searcher interaction in a session. Guo et al. [13] predicted search engine switching. These studies show that past searcher interaction data are useful for predicting future searcher interactions, which sheds light on our approach.
Our study also relates to previous work using local search context within a search session to improve search performance. For example, Shen et al. [21] proposed a language-model-based relevance feedback approach to incorporate previous search queries and clicks within a search session; Guan et al. [11] utilized query change information by separately considering changes of terms in query reformulations, as well as whether these terms appear in past results, as relevance feedback information; Luo et al. [19] further modeled query change information as actions and optimized the session search problem using a POMDP framework. Their work shares similarity with our study in that similar datasets were employed (search logs with local session contexts). But these studies still aim at relevance-based ranking and are evaluated using the relevance of results. Our study mainly differs from their work in that we consider both the relevance of results and the chances of click and skip errors in search result ranking and evaluation.

3. CLICK AND SKIP ERRORS
In this section, we define click and skip errors and discuss the approach of extracting labels for these errors from query logs.

3.1 Definition and Goal
We discuss searcher click and skip errors in the de facto standard search result page (SERP) setting. The search engine shows searchers a ranked list of result summaries linking to the result webpages. Searchers read the summaries and make (possibly imperfect) decisions on whether to click on the results and read the full webpages. A summary usually consists of the title and URL of the webpage and a query-biased snippet [23, 24] with query terms highlighted. This mode is adopted by most current search engines. We consider two properties of a search result: (1) the relevance of the document: similar to most current studies, we consider multi-level relevance, but for simplicity we also use binary relevant (R) and non-relevant (NR) labels in discussion; and (2) whether searchers will click on the result and read the webpage in detail after viewing its summary: we use click and skip for the cases where searchers do and do not click on the result after viewing its summary.
The two properties divide results into four categories: R+click, R+skip, NR+click, and NR+skip. We refer to NR+click as click errors, i.e., searchers incorrectly click on a non-relevant result. We call R+skip skip errors, i.e., searchers read a relevant result's summary but do not click on it. Note that R+skip cases are not necessarily searchers' errors; they may instead be correct decisions (e.g., skipping a previously clicked relevant result or one with duplicate content). Here we simply call them skip errors to simplify discussion.

We assume that R+click is preferred over R+skip, and NR+skip is preferred over NR+click, in search result ranking. In this paper, we do not consider the cases where searchers learn relevant information directly from reading result summaries (in such cases, R+skip is preferable because it requires less effort). This mostly happens when searchers are looking for simple, factual information (e.g., "release date Windows 10", "UPS phone number"). For most other information needs, searchers still need to access result documents in order to acquire relevant information, especially when the information needs are complex, exploratory, or transactional (e.g., "U.S. history", "Mac vs. Surface", "book a hotel").
The goal of conventional search systems is to rank R over NR. Our goal here is to further rank R+click over R+skip, and NR+skip over NR+click. This ranking reduces the chances of click and skip errors, because it places results with high risks of click and skip errors (e.g., R+skip and NR+click) at lower ranks.

3.2 Dataset
Our study relies on search logs to extract ground truth for click and skip labels. We use the TREC 2014 session track dataset [5], because it provided the largest publicly accessible non-anonymous query logs with relevance judgments at the time of our study. The dataset provides information for 1,257 search sessions and 4,666 queries on 60 different search topics. A search session includes a series of queries for the search topic, the results and summaries shown on the SERPs, and searcher interaction information (e.g., clicks). Searchers in these sessions were paid workers on Amazon Mechanical Turk. They were shown topic descriptions and were requested to work on the search topics using an experimental search system for at least 3 minutes. Their search interaction was recorded. The session track removes SERP and searcher interaction information for the last query of each session, because its goal is to evaluate techniques for retrieving results for the last query using past searcher interaction.
The experimental search system was built on Indri and worked on a full index of the ClueWeb12 dataset. The system ranks results by matching query terms and phrases to webpages' contents, titles, URLs, and anchor texts linking to the webpages. Due to the large size of the ClueWeb12 dataset (about 733 million webpages), a result caching strategy was adopted to ensure that the system could respond to requests within at most 6 seconds [5]. Result summaries are generated using Indri's built-in functions. According to its source code (see src/snippetbuilder.cpp in Indri 5.8), Indri constructs summaries by selecting document fragments with the highest coverage of query terms, with a bias toward the beginning of documents. This is similar to many other approaches [23].
Result relevance was judged by assessors from NIST. The judgment pool included all results displayed to the searchers, as well as the top results from participants' runs. Due to its large volume, only part of the sessions was included in the judgment pool. More details of the dataset are discussed in Carterette et al. [5].

3.3 Extracting Click and Skip Labels
Relevance of results and searcher clicks are readily available in the dataset, so we can easily get labels for R and NR and the multi-level relevance of results. But to extract click and skip labels we further need to know whether searchers viewed a summary (because both labels are defined under the condition that searchers viewed the summaries).
The observed clicks on a SERP are largely affected by the ranks of result summaries. For example, searchers may not click on a result summary simply because it is ranked at the bottom of the SERP (they did not have the chance to view it), which cannot be counted as a skip. That type of information can only be collected using biometric sensors (e.g., eye-tracking devices), which is costly. Alternatively, we automatically extract click and skip labels by assuming that if a searcher clicks a result, the searcher has read its summary and all summaries at higher positions on the SERP. Using this assumption, we automatically extract click and skip labels from SERPs with at least one observed click. We assign click labels to all observed clicks. We find the lowest-ranked clicked result on a SERP and assign skip labels to unclicked results ranked at higher positions on the SERP. We do not consider SERPs without any clicks, and we exclude results with no relevance judgments. We also exclude the first query of each session, because our approach makes use of past searcher interaction information. Using this approach, we obtain labels for 1,085 webpages. The average rank of clicked summaries is 3.40; on average, the lowest rank of clicked results on SERPs with at least one click is 3.13.
The correctness of the assumption is well supported by previous eye-tracking studies. Joachims et al. [15] showed that searchers viewed at least 90% of clicked summaries at ranks 1 to 5 and 81.8% at rank 6. According to these statistics, we believe this assumption has about 90% accuracy in assigning click labels. Granka et al. [10] reported that when searchers clicked results at ranks 2, 3, and 4, respectively 79%, 80%, and 77% of the results at higher positions were also viewed by searchers. Similar results were reported in Joachims et al.'s study [15]. Cutrell and Guan [9] reported an even higher chance of viewing summaries if searchers clicked another summary at a lower rank. According to these results, we expect about 80% accuracy in assigning skip labels to summaries. Although this data labeling approach is not perfect, we believe it is the best we can do in an automatic way. More importantly, it makes our approach generalizable, because preparing such a dataset requires no more effort than preparing conventional search datasets (the majority of the effort is relevance judgments).
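To make the labeling rule concrete, the sketch below shows one way it could be implemented. The SERP representation (a list of results ordered by rank, each with a clicked flag and a judged flag) and the function name are illustrative assumptions rather than our exact implementation.

def extract_click_skip_labels(serp):
    """Assign click/skip labels to results on one SERP.

    `serp` is a list of dicts ordered by rank, each with keys 'doc_id',
    'clicked' (bool), and 'judged' (bool).  Following the assumption in
    Section 3.3, every summary ranked above the lowest clicked result
    is treated as viewed.
    """
    clicked_ranks = [i for i, r in enumerate(serp) if r['clicked']]
    if not clicked_ranks:
        return {}  # SERPs without clicks are not used

    lowest_clicked = max(clicked_ranks)
    labels = {}
    for i, result in enumerate(serp):
        if not result['judged']:
            continue  # results without relevance judgments are excluded
        if result['clicked']:
            labels[result['doc_id']] = 'click'
        elif i < lowest_clicked:
            labels[result['doc_id']] = 'skip'
        # results below the lowest click get no label ("unknown")
    return labels


# Example: clicks at ranks 1 and 3 -> the unclicked result at rank 2 is a skip.
serp = [
    {'doc_id': 'd1', 'clicked': True,  'judged': True},
    {'doc_id': 'd2', 'clicked': False, 'judged': True},
    {'doc_id': 'd3', 'clicked': True,  'judged': True},
    {'doc_id': 'd4', 'clicked': False, 'judged': True},
]
print(extract_click_skip_labels(serp))  # {'d1': 'click', 'd2': 'skip', 'd3': 'click'}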
4. FEATURES
This section introduces features for ranking search results. We also analyze the values of these features among the four categories of results, as a way of examining whether they are predictive of click and skip behavior among relevant and non-relevant results. Considering that existing IR methods work well for relevance-based ranking, we focus on features that may help identify click and skip. We show comparisons of mean feature values in the four classes of results. We test significance using Welch's t-test (a variant of the t-test that does not assume equal variances); * and ** mean p<0.05 and p<0.01. We only mark significant differences for click vs. skip, R+click vs. R+skip, and NR+click vs. NR+skip.

4.1 Query-independent Features
Cutrell and Guan [9] observed different accuracy of clicking relevant results when searchers were shown result snippets of different lengths. This suggests that click and skip behavior can be affected by query-independent characteristics of summaries and webpages. Table 1 lists the query-independent features.

Table 1. A list of query-independent features.
Length: length of summary title/snippet, with or without stop words.
Length_URL: length of the URL (by characters / by levels of directories).
#unique words: number of unique words in title/snippet.
%stopwords: percent of stop words in title/snippet.
%wordstag: percent of words in <a>, <p>, <li>, <td>, heading tags (<h1> to <h6>), and emphasis tags (<b>, <i>, <em>, <strong>).
TF/IDF/TF-IDF: average/maximum/minimum values of TF/IDF/TF-IDF for words in the result summary title/snippet.
#fragment: number of document fragments in the snippet.
pagerank: PageRank of the webpage (in percentile).
spamrank: Waterloo spam rank of the webpage (in percentile) [8].

Figure 1. Query-independent features with significant differences between clicked and skipped webpage snippets: (a) title length (without stop words), click < skip **, NR+click < NR+skip **; (b) max TF of words in title, click < skip *, NR+click < NR+skip *; (c) min IDF of words in title, click > skip *, NR+click > NR+skip *; (d) PageRank (percentile), click < skip *.

Figure 1 shows examples of features with significant differences between click and skip. Figures 1(a), 1(b), and 1(c) show that clicked and skipped summaries differ in textual characteristics. Clicked summaries have shorter titles compared with the skipped ones. Clicked and skipped summaries also differ significantly in the maximum word frequency and minimum word IDF of their titles. Figure 1(d) shows that clicked webpages have relatively lower PageRank percentile scores than the skipped ones (but the mean scores for clicked webpages are still over 70). We suspect this is related to the nature of the search tasks in the dataset (complex factual and informational search tasks). In these tasks, searchers are less likely to be interested in webpages with very high PageRank scores (which are usually homepages). Different trends may be observed in other tasks (e.g., navigational search). Despite the task dependency, these results suggest that click and skip behavior relates to webpage centrality. Most other features in Table 1 show significant differences between relevant and non-relevant results, but no clear differences between clicked and skipped ones.
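As an illustration, the following sketch computes a few of the Table 1 title features. The tokenizer, the stop-word list, and the IDF lookup are simplified placeholders rather than the exact feature extraction we used.

import math

STOPWORDS = {'the', 'a', 'an', 'of', 'in', 'and', 'to'}  # placeholder list

def title_features(title, idf):
    """Compute a few query-independent features of a summary title.

    `idf` maps a term to its inverse document frequency; terms not in
    the map get a default value.  The returned features correspond to
    Table 1: length with/without stop words, %stopwords, max TF, min IDF.
    """
    terms = title.lower().split()
    content_terms = [t for t in terms if t not in STOPWORDS]
    tf = {}
    for t in terms:
        tf[t] = tf.get(t, 0) + 1
    return {
        'title_len': len(terms),
        'title_len_no_stop': len(content_terms),
        'pct_stopwords': 1 - len(content_terms) / max(len(terms), 1),
        'max_tf': max(tf.values()) if tf else 0,
        'min_idf': min(idf.get(t, 10.0) for t in terms) if terms else 0.0,
    }

idf = {'click': 4.2, 'errors': 3.1, 'in': 0.3, 'ranking': 3.8}
print(title_features('Reducing Click Errors in Ranking', idf))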
4.2 Features Using Current Search Query
Table 2 lists features using the current query, and Figure 2 shows examples with significant differences between click and skip.

Table 2. Features using the current search query.
%qterm: percent of query terms in title/snippet/URL.
#qtermtag: frequency of query terms in different HTML tags.
%qtermtag: percent of query terms in different HTML tags.
ratio %qtermtag: the ratio of %qtermtag in a given HTML tag to that of the whole webpage.
QLscore: normalized query likelihood score of the summary/webpage.
#window(k): number of windows of size k in the webpage covering all terms, any bigrams, or any trigrams in queries.
minwindow: minimum window size in the webpage covering all terms, any bigrams, or any trigrams in queries.

Figure 2. Features using the current query with significant differences between clicked and skipped webpage snippets: (a) percent of current query terms in title (all click/skip differences are significant); (b) number of windows (size = 5) in the webpage covering all query terms (all click/skip differences are significant); (c) ratio of query term coverage in <a> to that in the whole webpage (click > skip *, NR+click > NR+skip *); (d) ratio of query term coverage in <p> to that in the whole webpage (all click/skip differences are significant).

We adopt features measuring the similarity between the query and the result summary or webpage, including the coverage of query terms in the summary title, snippet, and URL, and the query likelihood (QL) score of the webpage [27]. When matching query terms in URLs, we count a term as occurring if it is a subsequence of the URL. The original QL scores are not comparable between different queries, so we normalize them by query length, i.e., each query term is assigned a weight 1/|Q|, where |Q| is the number of query terms in the query. Figure 2(a) shows that clicked summaries have higher coverage of query terms in their titles compared with skipped ones. We found similar patterns for the coverage of query terms in snippets and URLs as well. QL scores show significant differences between R+click and R+skip, but not between NR+click and NR+skip.
We also use features matching query term phrases in webpages, including the number of windows of size k covering all query terms, and the minimum window size covering all query terms. We set the minimum window size to the document length when not all query terms occur in the webpage. Figure 2(b) shows that it is more likely to find small windows (5 words) covering all query terms in clicked result webpages than in skipped ones.
The distributions of query terms in different HTML tags are also useful features for distinguishing click and skip documents. While analyzing searcher click errors in the dataset, we found that many click errors happened when result summaries looked relevant (e.g., had high coverage of query terms) but the actual webpages did not provide any helpful information. One common characteristic of these results is that query terms mainly occur in navigational content, such as hyperlinks and directories. Thus, we compute the coverage of query terms in each specific HTML tag (e.g., <a>, <p>) and in the whole webpage, and use their ratio as a feature. As Figures 2(c) and 2(d) show, results with click errors (NR+click) have a higher ratio in navigational content (e.g., in the <a> tag) but a lower ratio in the main content of the webpage (e.g., in the <p> tag). This suggests that webpage structure plays an important role in search result summarization, as well as in click and skip behavior.
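The sketch below illustrates two of the Table 2 features: query term coverage (%qterm) and the minimum window covering all query terms (minwindow). The whitespace tokenization and list-of-terms input are simplifying assumptions.

def query_term_coverage(query_terms, text_terms):
    """Fraction of distinct query terms that occur in the text (%qterm)."""
    if not query_terms:
        return 0.0
    text = set(text_terms)
    return sum(1 for t in set(query_terms) if t in text) / len(set(query_terms))


def min_window_size(query_terms, doc_terms):
    """Smallest window in the document containing all query terms (minwindow).

    Returns the document length when some query term never occurs,
    mirroring the convention described above.
    """
    needed = set(query_terms)
    if not needed:
        return 0
    best = len(doc_terms)
    if not needed.issubset(set(doc_terms)):
        return best
    count, covered, left = {}, 0, 0
    for right, term in enumerate(doc_terms):
        if term in needed:
            count[term] = count.get(term, 0) + 1
            if count[term] == 1:
                covered += 1
        while covered == len(needed):          # shrink from the left
            best = min(best, right - left + 1)
            t = doc_terms[left]
            if t in needed:
                count[t] -= 1
                if count[t] == 0:
                    covered -= 1
            left += 1
    return best


doc = 'the click model explains skip behavior and click errors'.split()
print(query_term_coverage(['click', 'errors'], doc))   # 1.0
print(min_window_size(['click', 'errors'], doc))       # 2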

4.3 Features Using Session Information
This section analyzes features using past searcher interaction in the same session, e.g., past queries, SERPs, and clicks. Table 3 lists these features and Figure 3 shows the values of some of them.

Table 3. Features using session information.
%past qterm: percent of past query terms in title/snippet/URL. We consider all past query terms and ADD/KEEP/RMV terms separately.
%past qterm by query type: %past qterm considering only (1) past queries with clicks, or (2) past query reformulations whose second query has clicks.
%past SERP result term: coverage of words from past clicked/skipped SERP titles in the summary's title/snippet.
QLscore past clicked docs: normalized query likelihood score of past queries, past queries with clicks, and past clicked/all SERP titles/snippets.
Repeated Result: whether a summary of the same result (by URL) was clicked or skipped by the searcher on previous SERPs.

Figure 3. Features using session information with significant differences between clicked and skipped webpage snippets: (a) percent of past query terms in title, click > skip **, NR+click > NR+skip **; (b) percent of ADD terms in title, all click/skip differences significant; (c) percent of KEEP terms in title, click > skip **, NR+click > NR+skip **; (d) whether the URL was previously visited, all click/skip differences significant.

Past search queries in the same session were used in many studies for improving search performance [11, 19, 21]. In our study, we also examine whether past queries can help identify click and skip. We use the coverage of past query terms in the summary title, snippet, and URL as features. We compute the coverage of terms for each past query and use the mean value as features. Following Guan et al.'s work [11], we also separately consider three types of terms in past query reformulations. For a query q1 and its reformulation q2, ADD terms are those in q2 but not in q1, KEEP terms are those in both queries, and RMV terms are those in q1 but not in q2. We compute the coverage of ADD, KEEP, and RMV terms for each past query reformulation and use their mean values as features. Figure 3(b) shows that the coverage of ADD terms can help identify click and skip among both relevant and non-relevant results. In contrast, the coverage of KEEP terms only shows significant differences between NR+click and NR+skip. The coverage of past query terms shows patterns similar to KEEP terms (Figure 3(a)).
Whether searchers previously visited exactly the same result is also an important reason for skips. Figure 3(d) shows that 20.3% of R+skip and 10.4% of NR+skip results were previously visited by the searcher within the same session, compared with 3.4% and 3.5% for R+click and NR+click. We also adopt variants of the past query term coverage features that only consider past queries with clicks, and past query reformulations whose second query has clicks. In addition, we measure the similarity of result summaries and documents with past SERP titles and summaries as features. We separately consider past clicked and skipped SERPs. Due to limited space, we do not include analysis of these features in this paper.
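The ADD/KEEP/RMV split described above is a simple set operation over the two queries of a reformulation; the following sketch illustrates it (the whitespace tokenization is a simplifying assumption).

def query_change_terms(q1, q2):
    """Split the terms of a reformulation q1 -> q2 into ADD/KEEP/RMV sets,
    following the definition above (Guan et al. [11])."""
    t1, t2 = set(q1.lower().split()), set(q2.lower().split())
    return {'ADD': t2 - t1, 'KEEP': t1 & t2, 'RMV': t1 - t2}


print(query_change_terms('mac vs surface', 'surface pro reviews'))
# {'ADD': {'pro', 'reviews'}, 'KEEP': {'surface'}, 'RMV': {'mac', 'vs'}}

The corresponding feature value for a candidate result is then the coverage of each term type in its title, snippet, or URL, averaged over all past reformulations in the session.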
4.4 Situational Features
We also suspect that searchers' click and skip decisions relate to situational factors that are independent of any specific result, i.e., in a certain period of a session, searchers are more likely to click or skip results. Situational features include: the number of previous clicks, SAT clicks, and DSAT clicks; and the number of previous queries, and of those with clicks. We found that on average searchers have 0.51 SAT clicks when they skip a relevant summary, compared with 0.39 when they click a relevant one (p<0.05). This suggests that searchers are probably more likely to skip relevant results if they have already visited some relevant information in previous searches.

4.5 Summary
In summary, this section introduced and analyzed features using different types of result information (e.g., title, snippet, URL, and webpage) and diverse searcher interaction evidence (e.g., the current search query, past search queries, and click and skip behavior on previous SERPs). The analysis shows significantly different patterns between clicked and skipped results on many features. This suggests our features are potentially predictive of searchers' click and skip behavior. In addition, it suggests it is possible to automatically identify click and skip behavior on relevant and non-relevant results. But we are conservative regarding the generalizability of any specific feature pattern. Readers need to be aware that these searcher interaction data were collected using one specific search system and result summarization method.

5. RANKING
The ultimate goal of our study is to rank search results considering both the relevance of results and potential searcher click and skip errors after viewing result summaries. The task would be simple if we could make perfect predictions of relevance and click/skip behavior. In that case, we could apply a rule and rank results first by relevance and then by predicted click or skip. Unfortunately, making such perfect predictions seems beyond the limit of current technology. We tackle this challenge by training LambdaMART [4] rankers to optimize an nDCG-style metric. We name the metric click-sensitive nDCG (cs-nDCG). Similar to nDCG, cs-nDCG prefers ranking more relevant results over less relevant ones, and prefers relevant results at the top. In addition to these properties, cs-nDCG penalizes relevant results with low click probabilities and non-relevant ones with high click probabilities. This section introduces the metric and our ranking approach.

5.1 Problem Definition
We consider both results that can be labeled for click and skip and those that cannot. This is because the criterion in Section 3.3 can only automatically label less than half of the top 10 results (on average the lowest rank of clicked results is 3.13), which greatly limits the number of ranking candidates. We assign unknown labels to the results that cannot be automatically labeled for click and skip. In total we have six categories of results: R+click, R+skip, R+unknown, NR+click, NR+skip, and NR+unknown. Our goal is to rank results in the following order (at this point we assume binary relevance to simplify discussion):
R+click > R+unknown > R+skip = NR+skip > NR+unknown > NR+click
This is because we do not know how searchers would behave on results without click and skip labels (unknown). In this case, it is reasonable to assume searchers are less likely to click on these results compared with results with observed clicks (click), but more likely to click on them compared with those with observed skip behavior (skip). Therefore, we rank R+unknown between R+click and R+skip, because searchers are more likely to click on and benefit from R+unknown compared with R+skip, and less likely to do so compared with R+click.

For similar reasons, we should rank NR+unknown between NR+skip and NR+click. We do not prefer any particular ordering between R+skip and NR+skip because, if searchers skip a result, the relevance of the result means little to them. The rest of this section introduces cs-nDCG, which evaluates the quality of a ranked list based on the ranking rule presented above. It also takes into consideration multiple relevance levels and discounts the influence of an individual result by its rank in the list.

5.2 Click-Sensitive nDCG (cs-nDCG)
Let L = {D_1, D_2, ..., D_n} be a ranked list of n search results and their summaries. We measure D_i's contribution to the quality of the ranked list, g(D_i), as in Equation (1), where r_i is the relevance degree of D_i judged by the assessors (r_i = 0 means D_i is not relevant), p_i is the probability of clicking D_i after viewing its summary, and g_p is the penalty for clicking a non-relevant result.

g(D_i) = (2^{r_i} - 1) · p_i  if r_i > 0;   g(D_i) = -g_p · p_i  if r_i = 0    (1)

In essence, Equation (1) weights the gain of a relevant result by its click probability, adopting the same gain function as nDCG. In addition, Equation (1) sets an adverse effect for a clicked non-relevant result, which can offset the positive contribution (gain) of relevant results. The scale of the adverse effect is controlled by the parameter g_p, and we weight the adverse effect of a result by its click probability. We set p_i to 1 if D_i is labeled click, and to 0 if D_i has the label skip. For R+unknown and NR+unknown, we make our best guess and set p_i to the probability of clicking a relevant and a non-relevant result after viewing its summary, respectively. The two probabilities can be estimated from the dataset using the existing click and skip labels, i.e., P(click | R) = |R+click| / |R| and P(click | NR) = |NR+click| / |NR|, where |X| is the number of instances with label X. In our dataset, P(click | R) = 0.55.
Similar to nDCG, we sum up the contributions of results at each rank with a position-based discount function. We refer to the sum as the cs-DCG of the ranked list, as in Equation (2). The position-based discount function is the same as the one in nDCG, which ensures that results at the top of the ranked list have a greater impact on the whole list.

cs-DCG(L) = Σ_{i=1}^{n} g(D_i) / log_2(i + 1)    (2)

Click-sensitive nDCG (cs-nDCG) for a ranked list L is calculated by normalizing its cs-DCG to the range [0, 1]. Different from DCG, cs-DCG can take negative values due to the penalty for clicking non-relevant results. Therefore, we normalize cs-DCG by both its lower bound and its upper bound, as in Equation (3). The ideal ranked list (L_best) is defined as a list where all results have the highest relevance label (4 in our dataset) and perfect click accuracy (always clicked). The worst ranked list (L_worst) includes all observed NR+click results, followed by non-relevant results without observations of click and skip.

cs-nDCG(L) = [cs-DCG(L) - cs-DCG(L_worst)] / [cs-DCG(L_best) - cs-DCG(L_worst)]    (3)

Similar to the case of nDCG [25], when training LambdaMART models using cs-nDCG, we update the model using the following λ-gradients, where Δcs-nDCG_{ij} is the change in cs-nDCG obtained by swapping a pair of results D_i and D_j in the ranked list:

λ_{ij} = S_{ij} · | Δcs-nDCG_{ij} · ∂C/∂o_{ij} |    (4)
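The sketch below illustrates how Equations (1)-(3) can be computed for a ranked list of (relevance, label) pairs. The value of P(click | NR) is a placeholder to be estimated from the training log, and the functions are an illustrative sketch rather than our exact implementation.

import math

# Click probabilities for results without an observed label.
# P(click|R) = 0.55 is the value estimated from our dataset;
# P_CLICK_NR is a placeholder to be estimated from the same log.
P_CLICK_R, P_CLICK_NR = 0.55, 0.35

def gain(rel, label, gp=3.0):
    """Contribution g(D) of one result (Equation 1).
    rel:   graded relevance (0 = non-relevant)
    label: 'click', 'skip', or 'unknown' (no observed behavior)
    """
    if label == 'click':
        p = 1.0
    elif label == 'skip':
        p = 0.0
    else:  # unknown: fall back to the estimated click probability
        p = P_CLICK_R if rel > 0 else P_CLICK_NR
    return (2 ** rel - 1) * p if rel > 0 else -gp * p

def cs_dcg(results, gp=3.0):
    """Equation 2: position-discounted sum of contributions."""
    return sum(gain(rel, label, gp) / math.log2(i + 2)
               for i, (rel, label) in enumerate(results))

def cs_ndcg(results, gp=3.0, max_rel=4):
    """Equation 3: normalize cs-DCG by its best and worst bounds."""
    n = len(results)
    best = cs_dcg([(max_rel, 'click')] * n, gp)            # ideal list
    nr_click = [r for r in results if r[0] == 0 and r[1] == 'click']
    nr_unknown = [r for r in results if r[0] == 0 and r[1] == 'unknown']
    worst = cs_dcg(nr_click + nr_unknown, gp)              # worst list
    return (cs_dcg(results, gp) - worst) / (best - worst)

# A ranked list of (relevance, label) pairs.
ranked = [(2, 'click'), (0, 'click'), (3, 'unknown'), (1, 'skip')]
print(round(cs_ndcg(ranked), 3))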
Table 4. Contribution g(D) of results with different relevance labels and click/unknown/skip labels (g_p = 1).

5.3 Properties of cs-nDCG
One may notice that cs-nDCG is essentially equivalent to regular nDCG if we set p_i = 1 for all relevant results and p_i = 0 for all non-relevant results. In that case, g(D_i) equals the gain function of regular nDCG. Therefore, regular nDCG can be considered a special case of cs-nDCG under the assumption that searchers always click relevant results and always skip non-relevant results. Clearly, these are not warranted assumptions.
cs-nDCG prefers ranking by the criterion in Section 5.1. Table 4 shows the actual contributions of results, i.e., g(D), with click, skip, and unknown labels in our dataset. It ensures that among relevant results with equal relevance labels, we prefer R+click over R+unknown. Due to the exponential gain function and the estimated click probability for relevant results (0.55) in our dataset, R+click and R+unknown results with higher relevance levels are always preferred over those with lower relevance levels; e.g., R+unknown (r = 4) has a contribution of 8.27 and R+click (r = 3) has a contribution of 7. This ensures that in most cases we still prefer relevant results with higher relevance labels, while R+click is preferred over R+unknown. The contribution function also ensures that NR+skip is preferred over NR+unknown, and NR+unknown over NR+click. The penalty parameter g_p controls the scale of the adverse effect of clicking a non-relevant result on searcher experience: the greater the value of g_p, the more the quality of a ranked list is affected by NR+click (click errors).

5.4 Incorporating Partly Judged Queries
In essence, our approach is to train ranking models using both relevance judgments and observed searcher click and skip behavior in query logs. Ideally, we would have relevance and click/skip labels for many documents in many queries. In practice, however, we usually have relevance judgments on many documents (depending on the size of the judgment pool), but only for a small set of queries. In contrast, we can only obtain a few click and skip observations per query (depending on the lowest click rank), but we can easily get labels for more queries due to the scalability of query logs. The cs-nDCG metric addresses the problem of limited click/skip observations per query: we set click probabilities based on result relevance when there is no observed click/skip behavior. This section discusses another situation. Once we have relevance judgments on a set of topics, we can collect more and more click/skip observations on queries of similar topics (or exactly the same queries). We may reuse old relevance judgments for new queries, but these queries may have unjudged results. It is risky to simply assume all unjudged results are non-relevant and put them into training. Instead, we simply set Δcs-nDCG to 0 if either or both results of a swapping pair have not been judged, i.e., we skip updating the model in the case of unjudged results. Similarly, when training using regular nDCG as the target metric, we can also incorporate partly judged queries by setting ΔnDCG and the λ-gradients to 0 if either or both results of the swapping pair have not been judged.
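A sketch of this pair-skipping rule is shown below: the swap delta used in the λ-gradient is computed only when both results of the pair are judged. The metric is passed in as a function (e.g., cs-nDCG from the previous sketch); the remaining LambdaMART training-loop details are omitted, and the data representation is illustrative.

import math

def delta_metric(metric, results, i, j):
    """Metric change obtained by swapping the results at ranks i and j."""
    swapped = list(results)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return metric(swapped) - metric(results)

def pair_weight(metric, results, judged, i, j):
    """|Delta| factor of the lambda-gradient in Equation (4), set to 0 when
    either member of the swapped pair has no relevance judgment (Section 5.4)."""
    if not (judged[i] and judged[j]):
        return 0.0
    return abs(delta_metric(metric, results, i, j))

# Example with a toy metric (relevance discounted by rank); during training
# the metric would be cs-nDCG as defined above.
toy = lambda L: sum(rel / math.log2(i + 2) for i, (rel, _) in enumerate(L))
ranked = [(2, 'click'), (0, 'click'), (3, 'unknown')]
print(pair_weight(toy, ranked, judged=[True, True, False], i=0, j=2))  # 0.0: pair skipped
print(pair_weight(toy, ranked, judged=[True, True, True], i=0, j=2))   # 0.5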

The TREC session track dataset offers opportunities for studying the helpfulness of using partly judged queries for training. Among the 1,257 sessions in the dataset, only the results for queries in the first 120 sessions (and the top results of participants' submissions for these sessions) were included in the relevance judgment pool. The rest of the sessions overlap with the 120 sessions in topics. We reuse the judged topics' relevance judgments for other sessions with the same topic. Some of these sessions' queries have unjudged results in the top 10. In our experiments, we also evaluate how helpful it is to incorporate these partly judged queries into training.

6. EXPERIMENTS
We apply our approaches to re-rank the top 10 results of queries in the TREC 2014 session track dataset. We only re-rank the top 10 results because the dataset only provides the top 10 results and because we observed only 3.13 click/skip labels on average per query. We evaluate results by both the relevance of a ranked list and the chances of committing click and skip errors in the list.
We select queries from the dataset with full relevance judgments and at least one click in the top 10 results. In order to apply features using past search interaction information, we do not select the first query of each session. This selects 145 queries, which are used for evaluation. To make use of partly judged queries, we further select other queries (excluding the first query of each session) with at least 2 judged results and at least one click in the top 10 results. This selects 222 partly judged queries. We use 10-fold cross-validation in all experiments: each fold uses 70% of the 145 fully judged queries for training, 20% for validation, and 10% for testing. We also incorporate the 222 partly judged queries into training using the approach discussed in Section 5.4.
We do not evaluate results using cs-nDCG, since this metric has not yet been validated as an evaluation metric. Instead, we evaluate results using regular nDCG@10 and the following metrics measuring click and skip errors:
Chances of click & skip errors at rank k (E@k). We assume searchers would commit a click or skip error on a result in the re-ranked list if we observed such an error on the same result for this query in the log. E@k equals the total number of click and skip errors on results at ranks 1 to k divided by k. This metric is similar to P@k, but it counts click & skip errors.
Discounted cumulated errors at rank k (DCE@k). Similar to discounted cumulated gain (DCG), we measure the cumulated number of click and skip errors at rank k with a position-based discount. We use the same discount function as in DCG.
Number of pairwise ranking errors. We count the following pairwise ranking errors in the re-ranked results: R+skip>R+click, NR+click>NR+skip, NR+click>R+click, NR+skip>R+click, and NR+click>R+skip, where A>B means A is ranked higher than B. R+skip>R+click is counted only among pairs of results with the same level of relevance. In addition, we use R&NR for any ranking error between a relevant and a non-relevant result, i.e., the sum of NR+click>R+click, NR+skip>R+click, and NR+click>R+skip. Similarly, we use click&skip for the sum of R+skip>R+click and NR+click>NR+skip, which are the click and skip ranking errors. We also count the sum of click&skip and R&NR errors.
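For clarity, the sketch below shows how E@k and DCE@k can be computed from a re-ranked list and the set of results with observed errors for the query; the data representation (document identifiers and an error set) is an illustrative assumption.

import math

def e_at_k(ranked_docs, error_docs, k):
    """E@k: fraction of the top-k results on which an error (a click on a
    non-relevant result or a skip of a relevant one) was observed in the log."""
    return sum(1 for d in ranked_docs[:k] if d in error_docs) / k

def dce_at_k(ranked_docs, error_docs, k):
    """DCE@k: cumulated click & skip errors at ranks 1..k with the DCG
    position discount 1 / log2(rank + 1)."""
    return sum(1 / math.log2(i + 2)
               for i, d in enumerate(ranked_docs[:k]) if d in error_docs)

# error_docs: results with an observed NR+click or R+skip for this query.
error_docs = {'d2', 'd5'}
ranking = ['d1', 'd2', 'd3', 'd4', 'd5']
print(round(e_at_k(ranking, error_docs, 3), 3))    # 0.333
print(round(dce_at_k(ranking, error_docs, 5), 3))  # 1.018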
We compare our approach (LambdaMART ranking models trained for cs-nDCG) with several baselines. Few previous studies target this problem, so we compare our approach with LambdaMART ranking models trained for regular nDCG. We also compare with state-of-the-art ad hoc search and session search approaches with strong reported performance on the TREC session track datasets, including:
Query likelihood model with Dirichlet smoothing (QL) [27].
Context-sensitive relevance feedback (CSRF) [21]. CSRF is a relevance feedback approach considering past search queries and the contents of clicked results within the same session. Variants of CSRF were ranked at the top in the TREC 2011 and 2012 session tracks [17, 18].
Query change model (QCM) [11]. QCM considers changes of terms in past query reformulations (i.e., the ADD, KEEP, and RMV terms in Section 4.3) and whether these terms occur in past search results as relevance feedback. Variants of QCM were ranked at the top in the TREC 2014 session track [5].
We also provide evaluation for the original ranking of results in the query log, and for the perfect and worst rankings of results by nDCG and by cs-nDCG. We report mean values of the evaluation metrics on the 145 fully judged queries in the following sections. We test the significance of differences using a paired t-test.

7. EVALUATION
In this section, we first compare our approach with the baselines (Section 7.1). Section 7.2 conducts a feature ablation analysis. In Section 7.3, we compare results using only fully judged queries for training with those incorporating partly judged queries into training.

7.1 Click-Sensitive Ranking Models
Table 5 shows the re-ranking effectiveness of our approaches and the baseline approaches, using both fully judged and partly judged queries for training in each fold. Reported results are mean values of the metrics on the 145 fully judged test queries. The results show our approach can significantly reduce the chances of click and skip errors at the cost of only a slight decline in the relevance of the ranked list. As Table 5 shows, when setting g_p to 3, our ranking model trained for cs-nDCG@10 ("All Features (cs-nDCG, gp=3)") significantly reduced the discounted cumulated click & skip errors (DCE@5) by 14.7% (to 0.526; smaller values are better) compared with the model using the same features but trained for regular nDCG@10 ("All Features (nDCG)"). In addition, this model reduced the chances of click and skip errors (E@k) from 21.4% to 17.9% at rank 1 (p<0.05), from 21.4% to 17.6% at rank 2 (p<0.05), and from 21.6% to 19.3% at rank 3. Similarly, the total number of pairwise ranking errors (counting only results with observed click/skip labels) was reduced from 1.35 to 1.09 (by 19.8%, p<0.01). While reducing click & skip errors by about 15% to 20%, the relevance of the ranked list only slightly decreased: nDCG@10 declined by 3.9% (to 0.274, p<0.01). The number of pairwise ranking errors between relevant and non-relevant results (R & NR) slightly increased (p > 0.05), but those related to click and skip (click & skip) were reduced by 24.4% (to 0.897, p<0.01). This indicates our approach is effective for reducing click and skip errors, with only a small cost in result relevance. We come to the same conclusion when setting g_p to 1 in cs-nDCG@10 to train the ranking models, though the effectiveness of the model changes slightly compared with g_p = 3.
Figure 4 further explores the effectiveness of LambdaMART ranking models trained with different values of g_p. Figure 4(a) shows that the value of g_p does not much affect the relevance of the ranked list (nDCG@10).
As g_p increases from 0.25 to 3, click and skip errors slightly decrease: DCE@5 declines to 0.526, E@1 to 0.179, and E@2 also declines. There is a slight increase in the chance of error at rank 3 (E@3); we suspect this is because we only re-rank the top 10 results. In that case, the total number of click and skip errors in the top 10 results is a constant, so when the chances of error at ranks 1 and 2 decline, the chances of error at lower ranks naturally increase.

Table 5. Comparison of ranking models (using all features) optimized for nDCG and cs-nDCG, and other baseline approaches. For each run, the table reports nDCG@10, DCE@5, E@1-E@3, and the numbers of pairwise ranking errors (All, R & NR, click & skip, R+skip>R+click, NR+click>NR+skip, NR+click>R+click, NR+skip>R+click, and NR+click>R+skip). Runs: Original; Perfect (Relevance); Worst (Relevance); Perfect (cs-nDCG); Worst (cs-nDCG); QL (nDCG); QL (cs-nDCG, gp=1); CSRF (nDCG); CSRF (cs-nDCG, gp=1); QCM (nDCG); QCM (cs-nDCG, gp=1); All Features (nDCG); All Features (cs-nDCG, gp=1); All Features (cs-nDCG, gp=3). Parentheses indicate the best result in a column (excluding oracle runs). Darker and lighter shading for QL, CSRF, and QCM indicate significant differences versus All Features (cs-nDCG, gp=1) at the 0.01 and 0.05 levels. Darker and lighter shading for All Features (cs-nDCG, gp=1) and All Features (cs-nDCG, gp=3) indicate significant differences versus All Features (nDCG) at the 0.01 and 0.05 levels.

Figure 4. Effects of g_p on the effectiveness of ranking models: (a) effects of g_p on nDCG@10 and the chances of click & skip errors (DCE@5, E@1, E@2, E@3, and DCE@5 when training with nDCG); (b) effects of g_p on pairwise ranking errors (all pairwise errors, R & NR, click & skip, R+skip>R+click, and all pairwise errors when training with nDCG).

Figure 4(b) further shows the effects of g_p on the number of pairwise ranking errors. Similarly, we observe slight declines in the total number of ranking errors (All), R & NR errors, the total number of click & skip errors, and click & skip errors among non-relevant results (NR+click>NR+skip). But click & skip errors among relevant results increase. This confirms our expectation about cs-nDCG: the parameter g_p controls the scale of the adverse effect of clicking non-relevant results. Greater values of g_p prefer ranking models with fewer click errors (NR+click), but at the cost of increasing click and skip errors among the relevant results. Despite the variability of the models' effectiveness at different values of g_p, models trained using cs-nDCG consistently reduce the chances of click and skip errors compared with those trained using regular nDCG.
As Table 5 shows, both our approaches and the baselines committed significantly more click & skip errors than R & NR ones. The best approach can reduce the number of R & NR errors to only 11.6% of the total possible R & NR errors in the dataset, and achieve an nDCG@10 (0.285) very close to the best value we can reach by re-ranking the top 10 results. But there is still large room for improvement on click & skip errors: even our best approach committed, on average, about 40.5% of the possible click & skip ranking errors in the top 10 results (2.159 possible errors on average in this dataset). This indicates the limited understanding in current work, including our own study, of click & skip errors. In contrast, existing methods already achieve satisfactory performance on ranking relevant and non-relevant results, as indicated by the low R & NR errors in Table 5.
Compared with the other ad hoc search and session search baselines (e.g., QL, CSRF, and QCM), our approach consistently outperforms all of them in both the relevance of results and the chances of click and skip errors. This is not surprising, considering that much more information is involved in our features compared with the baselines. It also indicates that our feature set is effective both for relevance-based ranking of results and for click and skip error reduction.
To summarize, the results in this section demonstrate that our approach can effectively rank search results considering both result relevance and the chances of click and skip errors. Compared with existing methods, our approach can significantly reduce the chances of searcher click and skip errors by about 15% to 20%, with only a slight decrease in nDCG@10.

7.2 Effectiveness of Features
We further analyze the effectiveness of different feature sets in this section. Table 6 shows results for models trained for cs-nDCG (g_p = 3) using different sets of features.

Table 6. Ranking models using different features. Columns: nDCG@10, DCE@5, E@1, E@2, E@3, R&NR, and click&skip errors. Runs: All Features; No Session; No Summary; Webpage; Title; Snippet; URL; Current Query; Past Queries; Q Independent. Darker and lighter shading indicate significant differences vs. All Features at the 0.01 and 0.05 levels.

We first compare features using the full webpage content with those using different elements of result summaries (title, snippet, URL). As Table 6 shows, models using webpage content can achieve better relevance of results, but significantly more click and skip errors. In fact, features using only the result summary title achieved the best DCE@5, with even fewer errors than the combination of all other features, although they yield limited relevance of results. This indicates


More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval Evaluation Rank-Based Measures Binary relevance Precision@K (P@K) Mean Average Precision (MAP) Mean Reciprocal Rank (MRR) Multiple levels of relevance Normalized Discounted

More information

Advanced Topics in Information Retrieval. Learning to Rank. ATIR July 14, 2016

Advanced Topics in Information Retrieval. Learning to Rank. ATIR July 14, 2016 Advanced Topics in Information Retrieval Learning to Rank Vinay Setty vsetty@mpi-inf.mpg.de Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ATIR July 14, 2016 Before we start oral exams July 28, the full

More information

Search Engines Chapter 8 Evaluating Search Engines Felix Naumann

Search Engines Chapter 8 Evaluating Search Engines Felix Naumann Search Engines Chapter 8 Evaluating Search Engines 9.7.2009 Felix Naumann Evaluation 2 Evaluation is key to building effective and efficient search engines. Drives advancement of search engines When intuition

More information

On Duplicate Results in a Search Session

On Duplicate Results in a Search Session On Duplicate Results in a Search Session Jiepu Jiang Daqing He Shuguang Han School of Information Sciences University of Pittsburgh jiepu.jiang@gmail.com dah44@pitt.edu shh69@pitt.edu ABSTRACT In this

More information

THIS LECTURE. How do we know if our results are any good? Results summaries: Evaluating a search engine. Making our good results usable to a user

THIS LECTURE. How do we know if our results are any good? Results summaries: Evaluating a search engine. Making our good results usable to a user EVALUATION Sec. 6.2 THIS LECTURE How do we know if our results are any good? Evaluating a search engine Benchmarks Precision and recall Results summaries: Making our good results usable to a user 2 3 EVALUATING

More information

SEO: SEARCH ENGINE OPTIMISATION

SEO: SEARCH ENGINE OPTIMISATION SEO: SEARCH ENGINE OPTIMISATION SEO IN 11 BASIC STEPS EXPLAINED What is all the commotion about this SEO, why is it important? I have had a professional content writer produce my content to make sure that

More information

Informativeness for Adhoc IR Evaluation:

Informativeness for Adhoc IR Evaluation: Informativeness for Adhoc IR Evaluation: A measure that prevents assessing individual documents Romain Deveaud 1, Véronique Moriceau 2, Josiane Mothe 3, and Eric SanJuan 1 1 LIA, Univ. Avignon, France,

More information

Ryen W. White, Matthew Richardson, Mikhail Bilenko Microsoft Research Allison Heath Rice University

Ryen W. White, Matthew Richardson, Mikhail Bilenko Microsoft Research Allison Heath Rice University Ryen W. White, Matthew Richardson, Mikhail Bilenko Microsoft Research Allison Heath Rice University Users are generally loyal to one engine Even when engine switching cost is low, and even when they are

More information

for Searching Social Media Posts

for Searching Social Media Posts Mining the Temporal Statistics of Query Terms for Searching Social Media Posts ICTIR 17 Amsterdam Oct. 1 st 2017 Jinfeng Rao Ferhan Ture Xing Niu Jimmy Lin Task: Ad-hoc Search on Social Media domain Stream

More information

A Dynamic Bayesian Network Click Model for Web Search Ranking

A Dynamic Bayesian Network Click Model for Web Search Ranking A Dynamic Bayesian Network Click Model for Web Search Ranking Olivier Chapelle and Anne Ya Zhang Apr 22, 2009 18th International World Wide Web Conference Introduction Motivation Clicks provide valuable

More information

Whitepaper Spain SEO Ranking Factors 2012

Whitepaper Spain SEO Ranking Factors 2012 Whitepaper Spain SEO Ranking Factors 2012 Authors: Marcus Tober, Sebastian Weber Searchmetrics GmbH Greifswalder Straße 212 10405 Berlin Phone: +49-30-3229535-0 Fax: +49-30-3229535-99 E-Mail: info@searchmetrics.com

More information

Relevance and Effort: An Analysis of Document Utility

Relevance and Effort: An Analysis of Document Utility Relevance and Effort: An Analysis of Document Utility Emine Yilmaz, Manisha Verma Dept. of Computer Science University College London {e.yilmaz, m.verma}@cs.ucl.ac.uk Nick Craswell, Filip Radlinski, Peter

More information

Whitepaper US SEO Ranking Factors 2012

Whitepaper US SEO Ranking Factors 2012 Whitepaper US SEO Ranking Factors 2012 Authors: Marcus Tober, Sebastian Weber Searchmetrics Inc. 1115 Broadway 12th Floor, Room 1213 New York, NY 10010 Phone: 1 866-411-9494 E-Mail: sales-us@searchmetrics.com

More information

An Active Learning Approach to Efficiently Ranking Retrieval Engines

An Active Learning Approach to Efficiently Ranking Retrieval Engines Dartmouth College Computer Science Technical Report TR3-449 An Active Learning Approach to Efficiently Ranking Retrieval Engines Lisa A. Torrey Department of Computer Science Dartmouth College Advisor:

More information

Information Retrieval

Information Retrieval Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,

More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval CS3245 Information Retrieval Lecture 9: IR Evaluation 9 Ch. 7 Last Time The VSM Reloaded optimized for your pleasure! Improvements to the computation and selection

More information

Evaluating an Associative Browsing Model for Personal Information

Evaluating an Associative Browsing Model for Personal Information Evaluating an Associative Browsing Model for Personal Information Jinyoung Kim, W. Bruce Croft, David A. Smith and Anton Bakalov Department of Computer Science University of Massachusetts Amherst {jykim,croft,dasmith,abakalov}@cs.umass.edu

More information

HUKB at NTCIR-12 IMine-2 task: Utilization of Query Analysis Results and Wikipedia Data for Subtopic Mining

HUKB at NTCIR-12 IMine-2 task: Utilization of Query Analysis Results and Wikipedia Data for Subtopic Mining HUKB at NTCIR-12 IMine-2 task: Utilization of Query Analysis Results and Wikipedia Data for Subtopic Mining Masaharu Yoshioka Graduate School of Information Science and Technology, Hokkaido University

More information

A Comparative Analysis of Cascade Measures for Novelty and Diversity

A Comparative Analysis of Cascade Measures for Novelty and Diversity A Comparative Analysis of Cascade Measures for Novelty and Diversity Charles Clarke, University of Waterloo Nick Craswell, Microsoft Ian Soboroff, NIST Azin Ashkan, University of Waterloo Background Measuring

More information

A Task Level Metric for Measuring Web Search Satisfaction and its Application on Improving Relevance Estimation

A Task Level Metric for Measuring Web Search Satisfaction and its Application on Improving Relevance Estimation A Task Level Metric for Measuring Web Search Satisfaction and its Application on Improving Relevance Estimation Ahmed Hassan Microsoft Research Redmond, WA hassanam@microsoft.com Yang Song Microsoft Research

More information

Evaluation. David Kauchak cs160 Fall 2009 adapted from:

Evaluation. David Kauchak cs160 Fall 2009 adapted from: Evaluation David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture8-evaluation.ppt Administrative How are things going? Slides Points Zipf s law IR Evaluation For

More information

Effective Structured Query Formulation for Session Search

Effective Structured Query Formulation for Session Search Effective Structured Query Formulation for Session Search Dongyi Guan Hui Yang Nazli Goharian Department of Computer Science Georgetown University 37 th and O Street, NW, Washington, DC, 20057 dg372@georgetown.edu,

More information

Information Retrieval

Information Retrieval Information Retrieval CSC 375, Fall 2016 An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have

More information

Towards Predicting Web Searcher Gaze Position from Mouse Movements

Towards Predicting Web Searcher Gaze Position from Mouse Movements Towards Predicting Web Searcher Gaze Position from Mouse Movements Qi Guo Emory University 400 Dowman Dr., W401 Atlanta, GA 30322 USA qguo3@emory.edu Eugene Agichtein Emory University 400 Dowman Dr., W401

More information

Information Retrieval and Web Search

Information Retrieval and Web Search Information Retrieval and Web Search IR Evaluation and IR Standard Text Collections Instructor: Rada Mihalcea Some slides in this section are adapted from lectures by Prof. Ray Mooney (UT) and Prof. Razvan

More information

Query Likelihood with Negative Query Generation

Query Likelihood with Negative Query Generation Query Likelihood with Negative Query Generation Yuanhua Lv Department of Computer Science University of Illinois at Urbana-Champaign Urbana, IL 61801 ylv2@uiuc.edu ChengXiang Zhai Department of Computer

More information

From Passages into Elements in XML Retrieval

From Passages into Elements in XML Retrieval From Passages into Elements in XML Retrieval Kelly Y. Itakura David R. Cheriton School of Computer Science, University of Waterloo 200 Univ. Ave. W. Waterloo, ON, Canada yitakura@cs.uwaterloo.ca Charles

More information

A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH

A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH A thesis Submitted to the faculty of the graduate school of the University of Minnesota by Vamshi Krishna Thotempudi In partial fulfillment of the requirements

More information

University of Delaware at Diversity Task of Web Track 2010

University of Delaware at Diversity Task of Web Track 2010 University of Delaware at Diversity Task of Web Track 2010 Wei Zheng 1, Xuanhui Wang 2, and Hui Fang 1 1 Department of ECE, University of Delaware 2 Yahoo! Abstract We report our systems and experiments

More information

CS54701: Information Retrieval

CS54701: Information Retrieval CS54701: Information Retrieval Basic Concepts 19 January 2016 Prof. Chris Clifton 1 Text Representation: Process of Indexing Remove Stopword, Stemming, Phrase Extraction etc Document Parser Extract useful

More information

Evaluating search engines CE-324: Modern Information Retrieval Sharif University of Technology

Evaluating search engines CE-324: Modern Information Retrieval Sharif University of Technology Evaluating search engines CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2016 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)

More information

Classification and retrieval of biomedical literatures: SNUMedinfo at CLEF QA track BioASQ 2014

Classification and retrieval of biomedical literatures: SNUMedinfo at CLEF QA track BioASQ 2014 Classification and retrieval of biomedical literatures: SNUMedinfo at CLEF QA track BioASQ 2014 Sungbin Choi, Jinwook Choi Medical Informatics Laboratory, Seoul National University, Seoul, Republic of

More information

Chapter 27 Introduction to Information Retrieval and Web Search

Chapter 27 Introduction to Information Retrieval and Web Search Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval

More information

Overview of the NTCIR-13 OpenLiveQ Task

Overview of the NTCIR-13 OpenLiveQ Task Overview of the NTCIR-13 OpenLiveQ Task Makoto P. Kato, Takehiro Yamamoto (Kyoto University), Sumio Fujita, Akiomi Nishida, Tomohiro Manabe (Yahoo Japan Corporation) Agenda Task Design (3 slides) Data

More information

Evaluating Search Engines by Modeling the Relationship Between Relevance and Clicks

Evaluating Search Engines by Modeling the Relationship Between Relevance and Clicks Evaluating Search Engines by Modeling the Relationship Between Relevance and Clicks Ben Carterette Center for Intelligent Information Retrieval University of Massachusetts Amherst Amherst, MA 01003 carteret@cs.umass.edu

More information

Navigation Retrieval with Site Anchor Text

Navigation Retrieval with Site Anchor Text Navigation Retrieval with Site Anchor Text Hideki Kawai Kenji Tateishi Toshikazu Fukushima NEC Internet Systems Research Labs. 8916-47, Takayama-cho, Ikoma-city, Nara, JAPAN {h-kawai@ab, k-tateishi@bq,

More information

Whitepaper Italy SEO Ranking Factors 2012

Whitepaper Italy SEO Ranking Factors 2012 Whitepaper Italy SEO Ranking Factors 2012 Authors: Marcus Tober, Sebastian Weber Searchmetrics GmbH Greifswalder Straße 212 10405 Berlin Phone: +49-30-3229535-0 Fax: +49-30-3229535-99 E-Mail: info@searchmetrics.com

More information

Automatic Query Type Identification Based on Click Through Information

Automatic Query Type Identification Based on Click Through Information Automatic Query Type Identification Based on Click Through Information Yiqun Liu 1,MinZhang 1,LiyunRu 2, and Shaoping Ma 1 1 State Key Lab of Intelligent Tech. & Sys., Tsinghua University, Beijing, China

More information

Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web

Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web Thaer Samar 1, Alejandro Bellogín 2, and Arjen P. de Vries 1 1 Centrum Wiskunde & Informatica, {samar,arjen}@cwi.nl

More information

TREC 2017 Dynamic Domain Track Overview

TREC 2017 Dynamic Domain Track Overview TREC 2017 Dynamic Domain Track Overview Grace Hui Yang Zhiwen Tang Ian Soboroff Georgetown University Georgetown University NIST huiyang@cs.georgetown.edu zt79@georgetown.edu ian.soboroff@nist.gov 1. Introduction

More information

Advanced Search Techniques for Large Scale Data Analytics Pavel Zezula and Jan Sedmidubsky Masaryk University

Advanced Search Techniques for Large Scale Data Analytics Pavel Zezula and Jan Sedmidubsky Masaryk University Advanced Search Techniques for Large Scale Data Analytics Pavel Zezula and Jan Sedmidubsky Masaryk University http://disa.fi.muni.cz The Cranfield Paradigm Retrieval Performance Evaluation Evaluation Using

More information

Detecting Good Abandonment in Mobile Search

Detecting Good Abandonment in Mobile Search Detecting Good Abandonment in Mobile Search Kyle Williams, Julia Kiseleva, Aidan C. Crook Γ, Imed Zitouni Γ, Ahmed Hassan Awadallah Γ, Madian Khabsa Γ The Pennsylvania State University, University Park,

More information

A Formal Approach to Score Normalization for Meta-search

A Formal Approach to Score Normalization for Meta-search A Formal Approach to Score Normalization for Meta-search R. Manmatha and H. Sever Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts Amherst, MA 01003

More information

An Investigation of Basic Retrieval Models for the Dynamic Domain Task

An Investigation of Basic Retrieval Models for the Dynamic Domain Task An Investigation of Basic Retrieval Models for the Dynamic Domain Task Razieh Rahimi and Grace Hui Yang Department of Computer Science, Georgetown University rr1042@georgetown.edu, huiyang@cs.georgetown.edu

More information

Robust Relevance-Based Language Models

Robust Relevance-Based Language Models Robust Relevance-Based Language Models Xiaoyan Li Department of Computer Science, Mount Holyoke College 50 College Street, South Hadley, MA 01075, USA Email: xli@mtholyoke.edu ABSTRACT We propose a new

More information

nding that simple gloss (i.e., word-by-word) translations allowed users to outperform a Naive Bayes classier [3]. In the other study, Ogden et al., ev

nding that simple gloss (i.e., word-by-word) translations allowed users to outperform a Naive Bayes classier [3]. In the other study, Ogden et al., ev TREC-9 Experiments at Maryland: Interactive CLIR Douglas W. Oard, Gina-Anne Levow, y and Clara I. Cabezas, z University of Maryland, College Park, MD, 20742 Abstract The University of Maryland team participated

More information

Window Extraction for Information Retrieval

Window Extraction for Information Retrieval Window Extraction for Information Retrieval Samuel Huston Center for Intelligent Information Retrieval University of Massachusetts Amherst Amherst, MA, 01002, USA sjh@cs.umass.edu W. Bruce Croft Center

More information

Predicting Query Performance on the Web

Predicting Query Performance on the Web Predicting Query Performance on the Web No Author Given Abstract. Predicting performance of queries has many useful applications like automatic query reformulation and automatic spell correction. However,

More information

Information Retrieval. (M&S Ch 15)

Information Retrieval. (M&S Ch 15) Information Retrieval (M&S Ch 15) 1 Retrieval Models A retrieval model specifies the details of: Document representation Query representation Retrieval function Determines a notion of relevance. Notion

More information

Northeastern University in TREC 2009 Web Track

Northeastern University in TREC 2009 Web Track Northeastern University in TREC 2009 Web Track Shahzad Rajput, Evangelos Kanoulas, Virgil Pavlu, Javed Aslam College of Computer and Information Science, Northeastern University, Boston, MA, USA Information

More information

Overview of the TREC 2013 Crowdsourcing Track

Overview of the TREC 2013 Crowdsourcing Track Overview of the TREC 2013 Crowdsourcing Track Mark D. Smucker 1, Gabriella Kazai 2, and Matthew Lease 3 1 Department of Management Sciences, University of Waterloo 2 Microsoft Research, Cambridge, UK 3

More information

Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web

Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web Thaer Samar 1, Alejandro Bellogín 2, and Arjen P. de Vries 1 1 Centrum Wiskunde & Informatica, {samar,arjen}@cwi.nl

More information

Information Retrieval

Information Retrieval Information Retrieval ETH Zürich, Fall 2012 Thomas Hofmann LECTURE 6 EVALUATION 24.10.2012 Information Retrieval, ETHZ 2012 1 Today s Overview 1. User-Centric Evaluation 2. Evaluation via Relevance Assessment

More information

UMass at TREC 2017 Common Core Track

UMass at TREC 2017 Common Core Track UMass at TREC 2017 Common Core Track Qingyao Ai, Hamed Zamani, Stephen Harding, Shahrzad Naseri, James Allan and W. Bruce Croft Center for Intelligent Information Retrieval College of Information and Computer

More information

Information Retrieval

Information Retrieval Information Retrieval Lecture 7 - Evaluation in Information Retrieval Seminar für Sprachwissenschaft International Studies in Computational Linguistics Wintersemester 2007 1/ 29 Introduction Framework

More information

Information Retrieval. Lecture 7 - Evaluation in Information Retrieval. Introduction. Overview. Standard test collection. Wintersemester 2007

Information Retrieval. Lecture 7 - Evaluation in Information Retrieval. Introduction. Overview. Standard test collection. Wintersemester 2007 Information Retrieval Lecture 7 - Evaluation in Information Retrieval Seminar für Sprachwissenschaft International Studies in Computational Linguistics Wintersemester 2007 1 / 29 Introduction Framework

More information

5 Choosing keywords Initially choosing keywords Frequent and rare keywords Evaluating the competition rates of search

5 Choosing keywords Initially choosing keywords Frequent and rare keywords Evaluating the competition rates of search Seo tutorial Seo tutorial Introduction to seo... 4 1. General seo information... 5 1.1 History of search engines... 5 1.2 Common search engine principles... 6 2. Internal ranking factors... 8 2.1 Web page

More information

High Accuracy Retrieval with Multiple Nested Ranker

High Accuracy Retrieval with Multiple Nested Ranker High Accuracy Retrieval with Multiple Nested Ranker Irina Matveeva University of Chicago 5801 S. Ellis Ave Chicago, IL 60637 matveeva@uchicago.edu Chris Burges Microsoft Research One Microsoft Way Redmond,

More information

Overview of the NTCIR-13 OpenLiveQ Task

Overview of the NTCIR-13 OpenLiveQ Task Overview of the NTCIR-13 OpenLiveQ Task ABSTRACT Makoto P. Kato Kyoto University mpkato@acm.org Akiomi Nishida Yahoo Japan Corporation anishida@yahoo-corp.jp This is an overview of the NTCIR-13 OpenLiveQ

More information

Natural Language Processing

Natural Language Processing Natural Language Processing Information Retrieval Potsdam, 14 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the course book Outline 2 1 Introduction 2 Indexing Block Document

More information

6.2 DATA DISTRIBUTION AND EXPERIMENT DETAILS

6.2 DATA DISTRIBUTION AND EXPERIMENT DETAILS Chapter 6 Indexing Results 6. INTRODUCTION The generation of inverted indexes for text databases is a computationally intensive process that requires the exclusive use of processing resources for long

More information

Fighting Search Engine Amnesia: Reranking Repeated Results

Fighting Search Engine Amnesia: Reranking Repeated Results Fighting Search Engine Amnesia: Reranking Repeated Results ABSTRACT Milad Shokouhi Microsoft milads@microsoft.com Paul Bennett Microsoft Research pauben@microsoft.com Web search engines frequently show

More information

doi: / _32

doi: / _32 doi: 10.1007/978-3-319-12823-8_32 Simple Document-by-Document Search Tool Fuwatto Search using Web API Masao Takaku 1 and Yuka Egusa 2 1 University of Tsukuba masao@slis.tsukuba.ac.jp 2 National Institute

More information

Search Costs vs. User Satisfaction on Mobile

Search Costs vs. User Satisfaction on Mobile Search Costs vs. User Satisfaction on Mobile Manisha Verma, Emine Yilmaz University College London mverma@cs.ucl.ac.uk, emine.yilmaz@ucl.ac.uk Abstract. Information seeking is an interactive process where

More information

Incorporating Satellite Documents into Co-citation Networks for Scientific Paper Searches

Incorporating Satellite Documents into Co-citation Networks for Scientific Paper Searches Incorporating Satellite Documents into Co-citation Networks for Scientific Paper Searches Masaki Eto Gakushuin Women s College Tokyo, Japan masaki.eto@gakushuin.ac.jp Abstract. To improve the search performance

More information

Estimating Embedding Vectors for Queries

Estimating Embedding Vectors for Queries Estimating Embedding Vectors for Queries Hamed Zamani Center for Intelligent Information Retrieval College of Information and Computer Sciences University of Massachusetts Amherst Amherst, MA 01003 zamani@cs.umass.edu

More information

Interactive Campaign Planning for Marketing Analysts

Interactive Campaign Planning for Marketing Analysts Interactive Campaign Planning for Marketing Analysts Fan Du University of Maryland College Park, MD, USA fan@cs.umd.edu Sana Malik Adobe Research San Jose, CA, USA sana.malik@adobe.com Eunyee Koh Adobe

More information

Part 7: Evaluation of IR Systems Francesco Ricci

Part 7: Evaluation of IR Systems Francesco Ricci Part 7: Evaluation of IR Systems Francesco Ricci Most of these slides comes from the course: Information Retrieval and Web Search, Christopher Manning and Prabhakar Raghavan 1 This lecture Sec. 6.2 p How

More information

Improving Patent Search by Search Result Diversification

Improving Patent Search by Search Result Diversification Improving Patent Search by Search Result Diversification Youngho Kim University of Massachusetts Amherst yhkim@cs.umass.edu W. Bruce Croft University of Massachusetts Amherst croft@cs.umass.edu ABSTRACT

More information

Indexing and Query Processing

Indexing and Query Processing Indexing and Query Processing Jaime Arguello INLS 509: Information Retrieval jarguell@email.unc.edu January 28, 2013 Basic Information Retrieval Process doc doc doc doc doc information need document representation

More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval Lecture 5: Evaluation Ruixuan Li http://idc.hust.edu.cn/~rxli/ Sec. 6.2 This lecture How do we know if our results are any good? Evaluating a search engine Benchmarks

More information

Heading-aware Snippet Generation for Web Search

Heading-aware Snippet Generation for Web Search Heading-aware Snippet Generation for Web Search Tomohiro Manabe and Keishi Tajima Graduate School of Informatics, Kyoto Univ. {manabe@dl.kuis, tajima@i}.kyoto-u.ac.jp Web Search Result Snippets Are short

More information

Experiments on Related Entity Finding Track at TREC 2009 Qing Yang,Peng Jiang, Chunxia Zhang, Zhendong Niu

Experiments on Related Entity Finding Track at TREC 2009 Qing Yang,Peng Jiang, Chunxia Zhang, Zhendong Niu Experiments on Related Entity Finding Track at TREC 2009 Qing Yang,Peng Jiang, Chunxia Zhang, Zhendong Niu School of Computer, Beijing Institute of Technology { yangqing2005,jp, cxzhang, zniu}@bit.edu.cn

More information

Experiments with ClueWeb09: Relevance Feedback and Web Tracks

Experiments with ClueWeb09: Relevance Feedback and Web Tracks Experiments with ClueWeb09: Relevance Feedback and Web Tracks Mark D. Smucker 1, Charles L. A. Clarke 2, and Gordon V. Cormack 2 1 Department of Management Sciences, University of Waterloo 2 David R. Cheriton

More information

Information Retrieval: Retrieval Models

Information Retrieval: Retrieval Models CS473: Web Information Retrieval & Management CS-473 Web Information Retrieval & Management Information Retrieval: Retrieval Models Luo Si Department of Computer Science Purdue University Retrieval Models

More information

Information Retrieval

Information Retrieval Natural Language Processing SoSe 2015 Information Retrieval Dr. Mariana Neves June 22nd, 2015 (based on the slides of Dr. Saeedeh Momtazi) Outline Introduction Indexing Block 2 Document Crawling Text Processing

More information