A Probabilistic Relevance Propagation Model for Hypertext Retrieval

Size: px

Start display at page:

Download "A Probabilistic Relevance Propagation Model for Hypertext Retrieval"

Dwayne Spencer
5 years ago
Views:

1 A Probabilistic Relevance Propagation Model for Hypertext Retrieval Azadeh Shakery Department of Computer Science University of Illinois at Urbana-Champaign Illinois 6181 ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign Illinois 6181 ABSTRACT A major challenge in developing models for hypertext retrieval is to effectively combine content information with the link structure available in hypertext collections. Although several link-based ranking methods have been developed to improve retrieval results, none of them can fully exploit the discrimination power of contents as well as fully exploit all useful link structures. In this paper, we propose a general relevance propagation framework for combining content and link information. The framework gives a probabilistic score to each document defined based on a probabilistic surfing model. Two main characteristics of our framework are our probabilistic view on the relevance propagation model and propagation through multiple sets of neighbors. We compare eight different models derived from the probabilistic relevance propagation framework on two standard TREC Web test collections. Our results show that all the eight relevance propagation models can outperform the baseline content only ranking method for a wide range of parameter values, indicating that the relevance propagation framework provides a general, effective and robust way of exploiting link information. Our experiments also show that using multiple neighbor sets outperforms using just one type of neighbors significantly and taking a probabilistic view of propagation provides guidance on setting propagation parameters. Categories and Subject Descriptors H.3.3. [Information Storage and Retrieval]: Information Search and Retrieval Retrieval Models General Terms Algorithms, Experimentation Keywords Content and link ranking, hypertext retrieval model, probabilistic relevance propagation, web information retrieval Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CIKM 6, November 5 11, 26, Arlington, Virginia, USA. Copyright 26 ACM /6/11...$ INTRODUCTION Hypertext Retrieval, the task of searching for information in a hypertext collection, has been around for a while. A key characteristic that distinguishes the search task in a hypertext collection from a traditional retrieval task is the existence of link information in the former one. Although the primary goal of creating links is to guide a user to other parts of the collection, the link information can also be exploited to improve the search accuracy. The existence of this extra information makes it inappropriate to use traditional information retrieval methods, which do the retrieval task based on the content only, to do the search task. The early works on the hypertext retrieval task were more on the literature side. Some researchers have used bibliographic citation methods to determine relationships among documents in scientific papers [32, 16, 37]. They have used different citation methods in this direction, namely direct citation, bibliographic coupling - the sharing of one or more references by two documents - and co-citation. Modha and Spangler [25] have proposed a clustering algorithm that clusters hypertext documents using words, out-links and in-links, Chakrabrti et al. [6] have developed a technique called spectral filtering for discovering high-quality topical resources in hyperlinked corpora and Ray Larson [22] has applied co-citation analysis methods to the World Wide Web to produce clusterings of the WWW sites that have topical similarities. Currently with the fast growth and popularity of the World Wide Web, the search task on this huge collection of hypertext data has gained much attention. The problem of hypertext retrieval on the Web has been studied extensively. Several link-based ranking methods have been developed to improve retrieval results [2, 27, 2, 5, 3, 26, 18, 3, 14, 36, 4, 39, 24, 38, 41, 43, 19, 4, 1, 29]. Although these algorithms have been shown to improve the performance over some baseline approaches, it remains a challenging research question what is the best way to exploit the content information and the link information to maximize search accuracy. These works appear to have adopted five strategies for combining content and link information: (1) Using the query as a filter to select documents and rank them according to link-based scores (e.g., PageRank [27] and HITS [2]); (2) Computing a weighted combination of topic-specific PageRank scores, where the weights are determined by the query (Topic-sensitive PageRank [18]); (3) Using the query to compute the relevance value of each document and regulating the influence of nodes in HITS using these value (e.g. ARC [5], Bharat and Henzinger[2]); (4) Using the query to compute the relevance value of each document and propagate these values through links (Intelligent Surfer [3], [36], [29]). (5) Using sitemap links to propagate term frequencies ([38], [29]). Unfortunately, none of these combination methods can fully exploit

2 the discrimination power of contents as well as fully exploit all useful link structures. Despite the importance of link information, the contents of documents are clearly the most direct evidence regarding whether a document is relevant to a user s interest. Thus presumably, contents of the documents should be the main basis for ranking them. In this sense, among the five strategies, only the last two are close to fully exploiting the content information to improve ranking. However, the intelligent surfer only considers the in-links of a document, the relevance propagation method only considers direct in-links or out-links and the term propagation method only considers parent-child links in a sitemap. Each of these methods only considers one type of explicit neighbors and none of them fully take advantage of all the available link information. But intuitively, all neighbors can be potentially exploited; for example, both out-links and in-links may be useful as we will show in our experiments. Besides, for the propagation methods, there exist no principled framework to do the propagation. For example, the content scores can be transformed using any monotonic function without affecting the ranking, but such transformation would presumably affect the propagation. How should we transform the scores to achieve the best propagation results? In this paper, we propose a general probabilistic relevance propagation framework for combining content and link information, which can fully take advantage of content information and the link structure in a principled way and can unify most existing link-based ranking algorithms. The basic idea of probabilistic relevance propagation is to first compute a content-based relevance probability score for each document using the query, and then propagate the probabilities through different groups of neighbors. We exploit the content information as a basis for finding the probability of the relevance of a document to a query and use the link structure to define different groups of neighbors to propagate the probabilities through. After propagation, unlike [29], our model gives us a probabilistic score for each document defined based on a probabilistic surfing model. Moreover, our model supports using multiple types of neighbors, which is shown to outperform the results of using a single type of neighbor. On the other hand, the probabilistic interpretation of the model suggests that we should transfer the contentbased retrieval scores to probabilities of relevance, which is shown to be beneficial in our experiments. The probabilistic interpretation also provides guidance on how to set various parameters in the propagation model. In some sense, our work resembles previous work on spreading activation [7, 33, 12, 13, 34, 15, 28, 35, 23, 11] as both involve propagating values through a network/graph. The main difference, however, is that in these spreading activation methods, the number of steps for propagating the weights is predefined and is a small value in most of the cases, while our framework is an iterative process which iterates until the ranks converge to a limit. We derive several special instances of the general probabilistic relevance propagation framework and show that probabilistic relevance propagation is a very general mechanism that allows us to recover most of the major existing algorithms as special cases. Moreover, it also naturally suggests several new algorithms that can combine content and link information. In our experiments, we evaluated several propagation algorithms and the experiment results show that: (1) Using relevance propagation to combine link information and content information for scoring can improve retrieval accuracy over using only content for scoring. (2) Using multiple sets of neighbors for propagation outperforms using a single neighbor set. (3) Using probabilities to control the effect of different groups of neighbors helps. (4) Using probabilities to control the influence of each document in a neighbor set helps. The rest of the paper is organized as follows: We present our relevance propagation framework and derive several special cases in section 2. We discuss the experiment results in section 3 and conclude in section 4 finally. 2. A PROBABILISTIC RELEVANCE PROPAGATION FRAMEWORK Given a query, intuitively, a good result document is one whose content is related to the query topic and which is surrounded by other good documents; i.e. located in the center of a subset of the collection relevant to the query. Thus in order to maximize ranking accuracy, we need to consider the relevance of the document to the query as well as the relevance of its neighbors. In this section, we propose a general probabilistic relevance propagation framework for combining relevance values of different groups of neighbors in a principled way. The basic idea of probabilistic relevance propagation is to first use the query to compute a contentbased self relevance probability score for each document and then propagate the scores through the neighbors. 2.1 Probabilistic Relevance Propagation Framework In this framework, we allow different types of neighbors to influence the quality score of a document. Figure 1 shows a sample... Nk d N2 Figure 1: A typical documents d and its neighbors document d surrounded by k different groups of neighbors. The neighbor sets are not necessarily mutually exclusive. In-links, Outlinks, the single document itself and the whole set of documents are a few examples of potential neighbor sets. Think of a random surfer surfing the Web looking for documents related to a given query q. At each step, the surfer being in a document, selects a group of neighbors surrounding the document and jumps to a document in that group. The surfer keeps doing this iteratively, jumping to neighbor documents looking for documents relevant to query q. The final score of each page is equal to the stationary probability of the surfer visiting the page. Formally, the probability of the surfer being in each document is defined as: p(x) = k i=1 k α i = 1, i=1 α i d D x D N N1 p(d)p i(d x) p i(d x) = 1 where D is the set of all documents, α i indicates the probability of choosing a particular type of neighbor set when leaving the current

3 document and p i is the probability of visiting a particular page in the chosen neighbor set. Note that p i(a b) is positive only if b is a neighbor of a, otherwise p i(a b) =. In order to compute the scores, for each neighbor set, we construct a matrix M i where M i(m, n) = p i(d m d n). The probability scores can be computed using matrix multiplication: P = M T P where P is the vector of the probability values and M = Σ k i=1α im i. The probability values are computed iteratively through matrix multiplications in a very similar way as any of the existing link-based scoring algorithms. Clearly, efficient matrix multiplication methods can be used to further speed up the scoring. The final scores will be the values of the stationary probability distribution. To ensure reachability to each document, we generally would include the whole set of documents as one special set of neighbors in our propagation framework. Thus by the Ergodicity theorem for Markov chains [17], we know that the Markov chain defined by such a transition matrix M must have a unique stationary probability distribution. In this framework, we identify three groups of probabilistic parameters: Hyper-Relevance Probability p(d): Defined for each document indicating the probability of visiting the document. Neighbor Set Selection Probability α: Defined for each group of neighbors indicating the probability of choosing a particular type of neighbor set when leaving the current document. Navigation Probability p i(d x): Defined for each document in a specific group indicating the probability of visiting a particular page in the chosen neighbor set. 2.2 Special Cases By setting α is to different values and instantiating p i s with specific functions, we can easily obtain many special cases of our general relevance propagation framework. In particular, the framework can recover most existing link-based algorithms. Table 1 shows a family of relevance propagation algorithms which are covered by our general framework. As can be seen from the table, PageRank and its extensions are special cases of the framework. The HITS algorithm is not directly a special case, since it does not satisfy the probability property. But with minor changes, i.e. normalization of the weights, it will be a special case. In this table, we include the normalized version of HITS as well as the normalized version of its extensions. 2.3 Parameter Estimation In our framework, we have identified three groups of probabilistic parameters: content relevance probabilities, neighbor set selection probabilities and navigation probabilities. The content relevance probability of a document can be estimated based on its relevance score given by any content-based retrieval method. Neighbor set selection probabilities and navigation probabilities can be either set to uniform or estimated based on content-based relevance scores. In this subsection, we show how to estimate the parameters. We compare different estimation methods in the following section Content Relevance Probabilities In our probabilistic framework we should convert the original content scores to probabilities. The specific conversion method is inevitably dependent on the specific content scoring method, but with some training data, we may use techniques such as logistic regression to do the conversion. If the original retrieval model is a probabilistic model, we have some natural analytical way to transform the scores. As an example of this transformation, here we show how we compute relevance probabilities from Okapi scores and from language model(lm) scores. Okapi Having the Okapi scores, our goal is to normalize the scores to find the relevance probabilities of the documents to the query. We use logistic regression to normalize the scores [31]: The Okapi score is a X + b, where X is the log odds of relevance, i.e., log(p(rel)/(1 p(rel)). So, to recover the probability p(rel), we have p(rel) = exp(x)/(1 + exp(x)). Given a score s, we have s = ax + b, or X = (s b)/a. Thus, the normalization formula should be p(rel) = exp((s b)/a)/(1+exp((s b)/a)). In order to set a and b, we assume that the minimum score min corresponds to a very small probability δ. We also assume that the maximum score max corresponds to p(rel) =. Solving these equations will give us values for a and b: min max a = log( δ ) log( ) 1 δ 1 max log δ b = log 1 δ min log δ 1 δ log Language Modeling Approach 1 1 In the language modeling approach, we score a document D w.r.t. a query Q by s = log p(q D) [42]. Thus we can do an exponential transformation to recover the probability of relevance. That is, p(rel) p(q D)p(D) exp(s) (assuming uniform p(d)) Neighbor Set Selection Probabilities The easiest way to estimate neighbor set selection probabilities is uniform estimation, counting all the neighbor sets to be equal, i.e. α i = 1 k. But obviously this is not the best we can do. Our framework suggests to use relevance scores for Neighbor set probability estimation. We get our intuition for defining neighbor set selection probabilities from the surfer model. In the surfer model, in each step, the surfer should decide on the neighbor set he wants to jump to. Intuitively, the surfer will select the neighbor set based on the average relevance of the documents in the neighbor set, the higher the average relevance, the more probable the surfer will select that group. Using this intuition, we set α i using: α i 1 N i X Ni rel(x), α i = Navigation Probabilities Like neighbor set selection probabilities, the navigation probabilities are most easily estimated through uniform estimation. But intuitively, estimating the probabilities using relevance values should give better results. We define navigation probabilities based on the content relevance probabilities of target pages. The higher the probability of the relevance of the target page, the higher the probability of navigating the link: p(d x) p(x). 2.4 Summary The proposed framework provides a general probabilistic interpretation of relevance-based propagation through multiple sets of neighbors. It can unify most existing link-based ranking algorithms, making it possible to compare the assumptions made in each specific algorithm. It also makes it possible to systematically explore

4 k D d k Table 1: Probabilistic Relevance Propagation Algorithms Method k Neighbors α i s p i s N : Set of all documents α > const. P (d x) = 1 N 1 PageRank[27] 2 N I : Set of In-links α I > const. P I (d x) = OUT (d) 1 Topic-Sensitive PR[18] 2 N : Set of all documents α > const. P (d x) = C j if d in ODPC1 (c j ) o.w. 1 N I : Set of In-links α I > const. P I (d x) = OUT (d) Rel(x) Intelligent Surfer[3] 2 N : Set of all documents α > const. P (d x) = Rel(k) Rel(x) N I : Set of In-links α I > const. P I (d x) = Rel(k) Authorities 1 N CC : Set of Co-Citations α CC = 1 P CC (d x) IN(d) if d = x Normalized HITS[2] #Common Parents o.w. Hubs 1 N CR : Set of Co-References α CR = 1 P CR (d x) OUT(d) if d = x #Common Children o.w. Authorities 1 N CC : Set of Co-Citations α CC = 1 P CC (d x) Normalized Rel(x) IN(d) if d = x #Common Parents o.w. Weighted HITS[2] Hubs 1 N CR : Set of Co-References α CR = 1 P CR (d x) OUT(d) if d = x Rel(x) #Common Children o.w. Authorities 1 N CC : Set of Co-Citations α CC = 1 P CC (d x) IN(d) if d = x Normalized ARC[5] Rel(anchor(x)) #Common Parents o.w. Hubs 1 N CR : Set of Co-References α CR = 1 P CR (d x) OUT(d) if d = x Rel(anchor(x)) #Common Children o.w. Authorities 2 N : Set of all documents α > const. P (d x) = 1 N IN(d) if d = x Normalized N CC : Set of Co-Citations α CC > const. P CC (d x) #Common Parents o.w. Randomized HITS[26] Hubs 2 N : Set of all documents α > const. P (d x) = 1 N OUT(d) if d = x N CR : Set of Co-References α CR > const. P CR (d x) #Common Children o.w. the algorithm space and compare different components of algorithms. Moreover, taking a strict probabilistic view of propagation provides guidance on how to normalize content scores and how to set other propagation parameters to optimize retrieval accuracy, as will be shown later in the paper. 3. COMPARISON OF RELEVANCE PROPAGATION ALGORITHMS We have done some experiments to evaluate the performance of our proposed models. In this section, we present our experiment results. 3.1 Experiment Design Data Set and Methods As our data set, we used the.gov test collection, which is an 18 gigabyte, 1.25 million document 22 partial crawl of the.gov domain used in TREC-22, TREC-23 and TREC-24 experiments for topic distillation [8, 9, 1]. We used two sets of queries in our experiments: (1) 5 topic distillation topics created by NIST for TREC-23 and (2) 75 topic distillation topics created by NIST for TREC-24. The topics are keyword queries for which key resources exist within the.gov collection. An important advantage of using this set of data is that it is created carefully for the purpose of evaluating Web retrieval algorithms with a significant number of judgments available for quantitatively comparing different methods. In our experiments, we used two baseline methods: Okapi and Language Modeling approach. Since our exploration is orthogonal to the use of anchor text and many other heuristics which are known to improve the performance, we preferred not to enter these heuristics in our baseline. Despite this, we already have a very strong baseline compared to the reported results in TREC23 [9] and TREC24 [1]. We expect the performance to be further improved when we use other heuristics on top of our method Neighbor Sets In our experiments, we compare the performance of using two types of neighbors: The set of documents which have links to the document(in) and the set of documents which are linked from the document(out). There also exist a universal neighbor set N which contains all the documents in the collection. Selecting this universal neighbor set to jump to is equivalent to jumping to a random page Content Relevance Probabilities The probabilistic relevance propagation framework allows us to use any content-based retrieval algorithm from which we can com-

5 pute the relevance probabilities. In our experiments, we try two baseline methods: Okapi and language modeling approach and compute the relevance probabilities from these relevance scores Neighbor Set Selection and Navigation Probabilities As mentioned earlier, α is are the parameters which indicate the probability of choosing a particular type of neighbor when leaving the current document. In our experiments, we follow one of the two approaches: either manually set α i to different values from to 1 or automatically set α i using neighbor set selection probabilities. Navigation probabilities on the other hand indicate the probability of visiting a particular page in a group. In our experiments, we use two different estimations of these probabilities: Uniform estimation ( Uni ) and relevance based estimation ( Wt ). 3.2 Result Analysis Effectiveness of Exploiting Link Information The first research question we want to answer is whether applying probabilistic relevance propagation on top of a content-based retrieval method would improve the performance. Most existing studies of link-based scoring algorithms focus on comparing different link-based algorithms without comparing link-based algorithms with scoring using only contents. The Web Track of TREC has seen some evaluation of effectiveness of exploiting link information to improve content-based scoring, but the results are not quite conclusive due to the many uncontrolled factors. To answer the first research question, we compare the performance of using two types of neighbors: in-links and out-links. (Note that we also have the universal neighbor set). We will have: p(x) = α Σ d D p(d)p (d x) + α IΣ d IN p(d)p I(d x) + α OΣ d OUT p(d)p O(d x) where α is the probability of randomly jumping to a page, α I is the probability of jumping to an in-link and α O is the probability of jumping to an out-link. Jumping probabilities can either be uniform (considering all the members to be equal) or weighted based on relevance probabilities. We also consider the combination of the two types of neighbors. This gives us eight combinations, which we compare with the content-only baseline in tables 2 and 3. The numbers in parenthesis show the percent of improvement over the baseline methods. We use precision at 1 and average precision for comparison. The shown results are the best performances achieved by these methods through tuning the parameter α manually in the probabilistic relevance propagation model; we will analyze the sensitivity later. From tables 2 and 3, we can make the following observations: 1. On both query sets, both types of neighbors can outperform the baseline significantly. 2. Weighted propagation of probabilities outperform uniform propagation. 3. The combination of different types of neighbors outperform using any single neighbor set. 4. We get significant improvement using both Okapi and LM baselines. Overall, we see that the probabilistic relevance propagation framework is reasonable and all these specific derived algorithms can help improve search results. Table 2: Combining Link and Content - Okapi Method TREC-23 Prec@1 Avg. Prec 8 1 Uni-IN 18(9.3%) 5(2%) Wt-IN 8(18.5%) 4(19%) Uni-OUT 18(9.3%) 51(24.8%) Wt-OUT 2(13%) 63(34.7%) Uni-IN Uni-OUT 6(16.7%) 68(38.8%) Uni-IN Wt-OUT 32(22.2%) 66(37.2%) Wt-IN Uni-OUT 38(27.8%) 73(43%) Wt-IN Wt-OUT 38(27.8%) 79(47.9%) TREC-24 Prec@1 Avg. Prec 9.93 Uni-IN 8(39.5%) 5(34.4%) Wt-IN 81(4.3%) 5(34.4%) Uni-OUT 56(2.9%) 12(2.4%) Wt-OUT 57(21.7%) 13(21.5%) Uni-IN Uni-OUT 79(38.8%) 4(33.3%) Uni-IN Wt-OUT 84(42.6%) 7(36.6%) Wt-IN Uni-OUT 8(39.5%) 5(34.4%) Wt-IN Wt-OUT 88(45.7%) 7(36.6%) Effectiveness of Combining Different Groups of Neighbors In tables 4 and 5, we compare the results of using only one type of neighbor with the results when we consider multiple groups of neighbors. We did a Wilcoxon signed rank test to see if the improvement on Average Precision is statistically significant. In these tables we compare the best results for each type of neighbor. Statistically significant improvements are distinguished by a star( ). As the tables show, combining different groups of neighbors improves the performance over using a single set of neighbors. Potentially, we can improve the performance by adding new types of neighbors, e.g. co-citations (documents which have at least one common parent with the document) and co-references (documents which have at least one common child with the document) Content Score Transformation The probabilistic framework suggests that we should convert the original content scores to probabilities. In our experiments, we use Okapi and LM methods as our baseline and transform the scores to probabilities using logistic regression and exponential transformation respectively. Figures 2 and 3 compare the performance of probabilistic transformation with the performance of the original raw score propagation as done in all the previous work. As can be seen, the performance is much better when we use probabilistic transformation Comparison of Estimation Methods Relevance-Based Estimate of α Improves over Uniform Estimate. In one set of experiments, we tried to set αs automatically based on the average relevance values of neighbors. Table 6 compares the results of relevance-based estimation of α with uniform estimation. As the table shows, in most of the cases relevance-based estimation gives better results. Note that these results are completely automatic; i.e. we do not have to tune any parameters. Thus these improvements are very encouraging. These results also confirm that using multiple neighbor sets improves over using just a single neighbor set.

6 Table 3: Combining Link and Content - LM Method TREC-23 Prec@1 Avg. Prec Uni-IN 18(28.3%) 35(36.4%) Wt-IN 18(28.3%) 2(43.4%) Uni-OUT 6(15.2%) 9(3.3%) Wt-OUT 1(19.6%) 35(36.3%) Uni-IN Uni-OUT 18(28.3%) 5(51.5%) Uni-IN Wt-OUT 2(32.6%) 2(43.4%) Wt-IN Uni-OUT 6(37%) 5(46.5%) Wt-IN Wt-OUT 8(39.1%) 4(45.5%) TREC-24 Prec@1 Avg. Prec 9.95 Uni-IN 65(27.9%) 13(18.9%) Wt-IN 67(29.5%) 15(21.1%) Uni-OUT 1(9.3%) 7(12.6%) Wt-OUT 4(11.6%) 1(15.8%) Uni-IN Uni-OUT 6(24%) 15(2.8%) Uni-IN Wt-OUT 63(26.4%) 16(21.1%) Wt-IN Uni-OUT 64(27.1%) 17(23.2%) Wt-IN Wt-OUT 67(29.5%) 19(25.3%) Relevance-Based Estimate of p i(d x) Improves over Uniform Estimate. In table 7, we compare the results of uniformly setting the navigation weights vs. estimating them based on relevance scores. As the table shows, relevance based estimation improves the performance in most of the cases Sensitivity Analysis We have so far only looked at the best performance using each method. We now turn to the question about how sensitive each method is to the setting of the parameter α, which controls the amount of influence from the neighbors. To answer this research question, we compute an optimal range of parameter values for each method, which is defined as the interval of parameter values for which a method outperforms the baseline. Table 8 shows the optimal ranges for four of our algorithms. We see that, in general, the optimal range is wide for most methods, indicating that exploiting these groups of neighbors for relevance propagation is useful. The uniform methods are generally more sensitive to the setting of α, which indicates that using weighted methods is more robust. In figures 2 and 3, we show the complete picture of the sensitivity of these methods with prec@1 and average precision. 4. CONCLUSIONS In this paper, we proposed a general probabilistic relevance propagation framework for combining content and link information in a principled manner to fully take advantage of query-based content scoring and link structures. The framework can unify most existing link-based ranking algorithms and can also suggest several interesting new algorithms through different propagation strategies. Following the probabilistic relevance propagation framework, we systematically compared eight specific relevance propagation models on two TREC test collections for Web retrieval. Our results show that all the eight relevance propagation models that we tested can outperform the baseline content only ranking method for a wide range of parameter values, indicating that Table 4: Using Multiple Neighbors vs. a Single Neighbor Set- TREC-23 Multi Neighbor Sets Single Neighbor Set Improvement Okapi Uni-IN Uni-OUT Uni-IN % * 68 Uni-OUT % Uni-IN Wt-OUT Uni-IN % * 66 Wt-OUT % Wt-IN Uni-OUT Wt-IN 4 2% * 73 Uni-OUT % Wt-IN Wt-OUT Wt-IN % * 79 Wt-OUT % LM Uni-IN Uni-OUT Uni-IN % * 5 Uni-OUT % Uni-IN Wt-OUT Uni-IN % * 2 Wt-OUT % Wt-IN Uni-OUT Wt-IN 2 2.1% 5 Uni-OUT % * Wt-IN Wt-OUT Wt-IN 2 1.4% 4 Wt-OUT % * Table 5: Using Multiple Neighbors vs. a Single Neighbor Set- TREC-24 Multi Neighbor Sets Single Neighbor Set Improvement Okapi Uni-IN Uni-OUT Uni-IN 5-4 Uni-OUT % * Uni-IN Wt-OUT Uni-IN 5.8% 6 Wt-OUT % Wt-IN Uni-OUT Wt-IN 5-5 Uni-OUT % * Wt-IN Wt-OUT Wt-IN 5 1.6% 7 Wt-OUT % * LM Uni-IN Uni-OUT Uni-IN % 15 Uni-OUT 7 7.5% * Uni-IN Wt-OUT Uni-IN % 16 Wt-OUT 1 5.5% * Wt-IN Uni-OUT Wt-IN % 17 Uni-OUT 7 9.3% * Wt-IN Wt-OUT Wt-IN % 19 Wt-OUT 1 8.1% * the relevance propagation framework provides a general, effective and robust way of exploiting link information to improve hypertext search accuracy. While the previous work all uses just one type of neighbor for propagation, we have shown that using multiple neighbor sets outperforms using just one type of neighbors significantly. We have also shown that taking a probabilistic view of propagation provides guidance on setting propagation parameters, that using content scores to estimate the probabilities of relevance improves the performance and that relevance based estimation of the parameters helps us improve the results. There are several interesting directions for further research: 1. Our framework naturally accommodates the use of anchor text through estimating navigation parameters based on anchor text. It is interesting to see how this estimation compares with our current estimation methods. 2. We have shown that in-links and out-links are useful for relevance propagation and can outperform the content only

7 Precision at 1 Precision at Average Precision Average Precision (a) Results using relevance probabilities (b) Results using raw content scores Figure 2: TREC-23 results with Okapi baseline Table 6: Effectiveness of Neighbor Set Selection Probability Estimation (α) Method Uniform Estimate Relevance Estimate Prec@1 Avg Prec Prec@1 Avg Prec Uni-IN Wt-IN Uni-OUT Wt-OUT Uni-IN Uni-OUT Uni-IN Wt-OUT Wt-IN Uni-OUT Wt-IN Wt-OUT Table 7: Navigation Probability Estimation - TREC-23 Okapi Neighbor Uniform Estimate Relevance Estimate (Impr.) Set Prec@1 Avg Prec Prec@1 Avg Prec IN (8.5%) 4(-) OUT (3.4%) 63(8.1%) IN & OUT (9.5%) 79(6.5%) LM Neighbor Uniform Estimate Relevance Estimate (Impr.) Set Prec@1 Avg Prec Prec@1 Avg Prec IN (-) 2(5.2%) OUT 6 9 1(3.8%) 35(4.7%) IN & OUT (8.5%) 4(-) baseline. It would be interesting to try other kinds of neighbors, e.g. co-citations and co-references to see if they can further improve the performance. 3. Other than the neighbor sets derived from the explicit link structure of the Web, we can also define other types of neighbors. In general, the framework allows us to define any set of documents with a specific characteristic as a neighbor set. As an example, we can define the set of pages with simi- lar content as a neighbor set [21]. It is interesting to see if exploiting these types of neighbors can further improve the retrieval accuracy. 4. The probabilistic relevance propagation framework is a general hypertext retrieval framework that can be applicable to any hypertext retrieval environment. For example, we may apply the algorithms we studied here to literature search where the links represent citations.

8 Precision at 1 Precision at Average Precision Average Precision (a) Results using relevance probabilities (b) Results using raw content scores Figure 3: TREC-23 results with LM baseline Table 8: Ranges of α Values for Improving s Okapi Method TREC-23 TREC-24 1 Avg. Prec 1 Avg. Prec Uni-IN [.6,.9] [.6,.9] [.3,.9] [.3,.9] Wt-IN [.3,.9] [.3,.9] [.3,.9] [.2,.9] Uni-OUT [.4,.9] [.4,.9] [.6,.9] [.4,.9] Wt-OUT [.2,.9] [.2,.9] [.3,.9] [.2,.9] Language Model Uni-IN [.5,.9] [.6,.9] [.6,.9] [.6,.9] Wt-IN [.3,.9] [.2,.9] [.3,.9] [.4,.9] Uni-OUT [.6,.9] [.3,.9] [.7,.9] [.6,.9] Wt-OUT [.5,.9] [,.9] [.6,.9] [.3,.9] 5. ACKNOWLEDGMENTS This work is in part supported by the National Science Foundation under award numbers , , and REFERENCES [1] V. Anh and A. Moffat. Melbourne university 24: Terabyte and web tracks. In Proceedings of the TREC Conference, 24. [2] K. Bharat and M. R. Henzinger. Improved algorithms for topic distillation in a hyperlinked environment. In SIGIR 98: Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, pages , New York, NY, USA, ACM Press. [3] K. Bharat and G. A. Mihaila. When experts agree: using non-affiliated experts to rank popular topics. ACM Trans. Inf. Syst., 2(1):47 58, 22. [4] A. Borodin, G. O. Roberts, J. S. Rosenthal, and P. Tsaparas. Link analysis ranking: algorithms, theory, and experiments. ACM Trans. Inter. Tech., 5(1): , 25. [5] S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, P. Raghavan, and S. Rajagopalan. Automatic resource list compilation by analyzing hyperlink structure and associated text. In Proceedings of the 7th International World Wide Web Conference, [6] S. Chakrabarti, B. Dom, D. Gibson, S. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Spectral filtering for resource discovery. In ACM SIGIR workshop on Hypertext Information Retrieval on the Web, [7] P. R. Cohen and R. Kjeldsen. Information retrieval by constrained spreading activation in semantic networks. Inf. Process. Manage., 23(4): , 1987.

9 [8] N. Craswell and D. Hawking. Overview of the trec-22 web track. In Proceedings of the TREC Conference, 22. [9] N. Craswell and D. Hawking. Overview of the trec-23 web track. In Proceedings of the TREC Conference, 23. [1] N. Craswell and D. Hawking. Overview of the trec-24 web track. In Proceedings of the TREC Conference, 24. [11] F. Crestani and P. L. Lee. Searching the web by constrained spreading activation. Inf. Process. Manage., 36(4):585 65, 2. [12] W. B. Croft, T. J. Lucia, and P. R. Cohen. Retrieving documents by plausible inference: a priliminary study. In SIGIR 88: Proceedings of the 11th annual international ACM SIGIR conference on Research and development in information retrieval, pages , New York, NY, USA, ACM Press. [13] W. B. Croft, T. J. Lucia, J. Cringean, and P. Willett. Retrieving documents by plausible inference: an experimental study. Inf. Process. Manage., 25(6): , [14] B. D. Davison. Toward a unification of text and link analysis. In SIGIR 3: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval, pages , New York, NY, USA, 23. ACM Press. [15] H. P. Frei and D. Stieger. The use of semantic links in hypertext information retrieval. Inf. Process. Manage., 31(1):1 13, [16] E. Garfield. Citation indexes for science. Science, 129, [17] G. Grimmett and D. Stirzaker. Probability and random processes. In Oxford University Press, [18] T. H. Haveliwala. Topic-sensitive pagerank: A context-sensitive ranking algorithm for web search. IEEE Transactions on Knowledge and Data Engineering, 15(4): , 23. [19] J. Kamps, G. Mishne, and M. de Rijke. Language models for searching in web corpora. In Proceedings of the TREC Conference, 24. [2] J. M. Kleinberg. Authoritative sources in a hyperlinked environment. J. ACM, 46(5):64 632, [21] O. Kurland and L. Lee. Pagerank without hyperlinks: structural re-ranking using links induced by language models. In Proceedings of ACM SIGIR 25, pages , 25. [22] R. Larson. Bibliometrics of the world wide web: An exploratory analysis of the intellectual structure of cyberspace. In Annual Meeting of the American Society of Information Science, [23] M. Marchiori. The quest for correct information on the Web: Hyper search engines. Computer Networks and ISDN Systems, 29(8 13): , [24] F. Mathieu and M. Bouklit. The effect of the back button is a random walk: Application for pagerank. In Proceedings of the 13th International World Wide Web Conference, 24. [25] D. S. Modha and W. S. Spangler. Clustering hypertext with applications to web searching. In HYPERTEXT : Proceedings of the eleventh ACM on Hypertext and hypermedia, pages , New York, NY, USA, 2. ACM Press. [26] A. Y. Ng, A. X. Zheng, and M. I. Jordan. Stable algorithms for link analysis. In SIGIR 1: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, pages , New York, NY, USA, 21. ACM Press. [27] L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library, [28] P. Pirolli, J. Pitkow, and R. Rao. Silk from a sow s ear: Extracting usable structures from the web. In Proc. ACM Conf. Human Factors in Computing Systems, CHI. ACM Press, [29] T. Qin, T.-Y. Liu, X.-D. Zhang, Z. Chen, and W.-Y. Ma. A study of relevance propagation for web search. In SIGIR 5: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pages , New York, NY, USA, 25. ACM Press. [3] M. Richardson and P. Domingos. The intelligent surfer: Probabilistic combination of link and content information in pagerank. In Advances in Neural Information Processing Systems, 22. [31] S. Robertson. Threshold setting and performance optimization in adaptive filtering. Information Retrieval, 5(2-3): , 22. [32] G. Salton. Associative document retrieval techniques using bibliographic information. J. ACM, 1(4):44 457, [33] G. Salton and C. Buckley. On the use of spreading activation methods in automatic information retrieval. Technical report, Ithaca, NY, USA, [34] J. Savoy. Bayesian inference networks and spreading activation in hypertext systems. Inf. Process. Manage., 28(3):389 46, [35] J. Savoy. Ranking schemes in hybrid boolean systems: a new approach. J. Am. Soc. Inf. Sci., 48(3): , [36] A. Shakery and C. Zhai. Relevance propagation for topic distillation uiuc trec 23 web track experiments. In Proceedings of the TREC Conference, 23. [37] H. Small. Co-citation in the scientific literature: a new measure of the relationship between two documents. The American Society of Information Science, 24, [38] R. Song, J. R. Wen, S. M. Shi, T. Y. Xin, G. M. abd Liu, T. Qin, X. Zheng, J. Y. Zhang, G. R. Xue, and W. Y. Ma. Microsoft research asia at web track and terabyte track of trec 24. In Proceedings of the TREC Conference, 24. [39] M. Sydow. Random surfer with back step. In Proceedings of the 13th International World Wide Web Conference, 24. [4] T. Tomiyama, K. Karoji, T. Kondo, Y. Kakuta, and T. Takagi. Meiji university web, novelty and genomics track experiments. In Proceedings of the TREC Conference, 24. [41] H. Zaragoza, N. Craswell, M. Taylor, S. Saria, and S. Robertson. Microsoft cambridge at trec 13: Web and hard tracks. In Proceedings of the TREC Conference, 24. [42] C. Zhai and J. Lafferty. Model-based feedback in the language modeling approach to information retrieval. In CIKM 1: Proceedings of the tenth international conference on Information and knowledge management, pages 43 41, New York, NY, USA, 21. ACM Press. [43] Z. Zhou, Y. Guo, B. Wang, X. Cheng, H. Xu, and G. Zhang. Trec 24 web track experiments at cas-ict. In Proceedings of the TREC Conference, 24.

Query Likelihood with Negative Query Generation

Query Likelihood with Negative Query Generation Yuanhua Lv Department of Computer Science University of Illinois at Urbana-Champaign Urbana, IL 61801 ylv2@uiuc.edu ChengXiang Zhai Department of Computer