Information Retrieval

1 Information Retrieval: Learning to Rank
  Ilya Markov, University of Amsterdam, i.markov@uva.nl

2 Course overview
  Offline: Data Acquisition, Data Processing, Data Storage
  Online: Query Processing, Ranking, Evaluation
  Advanced: Aggregated Search, Click Models, Present and Future of IR

3 This lecture
  Offline: Data Acquisition, Data Processing, Data Storage
  Online: Query Processing, Ranking, Evaluation
  Advanced: Aggregated Search, Click Models, Present and Future of IR

4 Outline
  1. Current trends in IR
  2. Learning to rank

5 IR conferences
  ACM Conference on Research and Development in Information Retrieval (SIGIR)
  ACM Conference on Information and Knowledge Management (CIKM)
  ACM Conference on Web Search and Data Mining (WSDM)
  European Conference on Information Retrieval (ECIR)

6 IR journals
  ACM Transactions on Information Systems (TOIS)
  Information Retrieval Journal (IRJ)
  Information Processing and Management (IPM)

7 Surveys
  Foundations and Trends in Information Retrieval (FnTIR)
  Synthesis Lectures on Information Concepts, Retrieval, and Services by Morgan & Claypool Publishers

8 SIGIR 2016
  Evaluation
  Efficiency
  Retrieval models, learning-to-rank, web search
  Users, user needs, search behavior
  Novelty and diversity
  Speech, conversation systems, question answering
  Recommendation systems
  Entities and knowledge graphs

9 WSDM 2016
  Communities, social interaction, social networks
  Search and semantics
  Observing users, leveraging users
  Big data algorithms
  Entities and structure

10 Ranking methods
  1. Content-based
     Term-based
     Semantic
  2. Link-based (web search)
  3. Learning to rank

11 Outline
  1. Current trends in IR
  2. Learning to rank
     Machine learning
     Features
     LTR approaches
     Experimental comparison
     Summary

12 Machine learning
  Traditional ML solves a prediction problem (classification or regression) on a single instance at a time.
  Input: $\{x_i\}_{i=1}^{n}$
  Output: $\{y_i\}_{i=1}^{n}$
  Learn a model $h(x)$ that optimizes a loss function $L(h(x), y)$
  For a new instance $x_{new}$, predict the output $y = h(x_{new})$

13 Learning to rank
  The aim of LTR is to come up with an optimal ordering of items, where the relative ordering among the items is more important than the exact score that each item gets.
  [Figure: Fig. 1.1 Learning-to-rank framework. Source: T.-Y. Liu, Learning to Rank for Information Retrieval]
  From the text accompanying the figure: "Figure 1.1 shows the typical learning-to-rank flow. From the figure we can see that since learning to rank is a kind of supervised learning, a training set is needed."

14 Outline
  2. Learning to rank
     Machine learning
     Features
     LTR approaches
     Experimental comparison
     Summary

15 Machine learning
  Input: $\{x_i\}_{i=1}^{n}$
  Output: $\{y_i\}_{i=1}^{n}$
  Learn a model $h(x)$ that optimizes a loss function $L(h(x), y)$
  Examples
    Linear model: $h(x_i) = w^T x_i = \sum_{k=1}^{l} w_k x_{ik}$
    Quadratic loss function: $L(h(x_i), y_i) = (h(x_i) - y_i)^2$
  How to learn the model $h(x)$, i.e., how to estimate its parameters?

16 Learning the model h(x)
  If there is a closed-form solution for the parameters of $h(x)$:
    1. Compute the derivative of the loss function $L$ with respect to some parameter $w_k$
    2. Equate this derivative to zero: $\frac{\partial L}{\partial w_k} = 0$
    3. Find the optimal value of the parameter $w_k$
  If there is no closed-form solution, use gradient descent:
    1. Compute or approximate the gradient of the loss function, $\nabla L = \left[ \frac{\partial L}{\partial w_1}, \ldots, \frac{\partial L}{\partial w_l} \right]$, using the current values of the parameters
    2. Update the model parameters by taking a small step in the opposite direction of the gradient: $w \leftarrow w - \eta \nabla L$
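The gradient descent steps above can be sketched for the linear model and quadratic loss of the previous slide. This is a minimal illustration, not the lecture's own code: the toy data, the step size `eta`, the number of steps, and all function names are assumptions chosen for the example.

```python
def predict(w, x):
    """Linear model h(x) = w^T x."""
    return sum(wk * xk for wk, xk in zip(w, x))


def gradient(w, xs, ys):
    """Gradient of L = sum_i (h(x_i) - y_i)^2 with respect to each w_k."""
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = predict(w, x) - y
        for k, xk in enumerate(x):
            grad[k] += 2.0 * err * xk
    return grad


def gradient_descent(xs, ys, eta=0.01, steps=2000):
    """Repeat w <- w - eta * grad(L), starting from w = 0."""
    w = [0.0] * len(xs[0])
    for _ in range(steps):
        g = gradient(w, xs, ys)
        w = [wk - eta * gk for wk, gk in zip(w, g)]
    return w


# Hypothetical toy data generated from y = 2 * x + 1.
# The second feature is a constant 1 so that w[1] acts as a bias term.
xs = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
ys = [1.0, 3.0, 5.0, 7.0]

w = gradient_descent(xs, ys)
print(w)  # close to [2.0, 1.0]
```

The step size has to be small enough for the updates to converge; with this data a much larger `eta` would make the iterates diverge.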

17 Gradient descent
  [Figure] Picture taken from https://en.wikipedia.

18 Outline
  2. Learning to rank
     Machine learning
     Features
     LTR approaches
     Experimental comparison
     Summary

19 [Figure: Fig. 1.1 Learning-to-rank framework. Source: T.-Y. Liu, Learning to Rank for Information Retrieval]

20 Query-document representation
  Each query-document pair $(q^{(n)}, x_i)$ is represented as a vector of features
    $x_i^{(n)} = [x_{i1}^{(n)}, x_{i2}^{(n)}, \ldots, x_{il}^{(n)}]$
  Features
    Content-based
    Link-based
    User-based

21 Content-based features
  Table 6.2 Learning features of TREC (IDs 1 to 29 shown). For the OHSUMED corpus, 40 features were extracted in total, as shown in Table 6.3.
  Source: T.-Y. Liu, Learning to Rank for Information Retrieval

  ID  Feature description
  1   Term frequency (TF) of body
  2   TF of anchor
  3   TF of title
  4   TF of URL
  5   TF of whole document
  6   Inverse document frequency (IDF) of body
  7   IDF of anchor
  8   IDF of title
  9   IDF of URL
  10  IDF of whole document
  11  TF*IDF of body
  12  TF*IDF of anchor
  13  TF*IDF of title
  14  TF*IDF of URL
  15  TF*IDF of whole document
  16  Document length (DL) of body
  17  DL of anchor
  18  DL of title
  19  DL of URL
  20  DL of whole document
  21  BM25 of body
  22  BM25 of anchor
  23  BM25 of title
  24  BM25 of URL
  25  BM25 of whole document
  26  LMIR.ABS of body
  27  LMIR.ABS of anchor
  28  LMIR.ABS of title
  29  LMIR.ABS of URL

22 Link-based features
  Table 6.2 (Continued) Learning features of TREC (IDs 40 to 64 shown).
  Source: T.-Y. Liu, Learning to Rank for Information Retrieval

  ID  Feature description
  40  LMIR.JM of whole document
  41  Sitemap based term propagation
  42  Sitemap based score propagation
  43  Hyperlink base score propagation: weighted in-link
  44  Hyperlink base score propagation: weighted out-link
  45  Hyperlink base score propagation: uniform out-link
  46  Hyperlink base feature propagation: weighted in-link
  47  Hyperlink base feature propagation: weighted out-link
  48  Hyperlink base feature propagation: uniform out-link
  49  HITS authority
  50  HITS hub
  51  PageRank
  52  HostRank
  53  Topical PageRank
  54  Topical HITS authority
  55  Topical HITS hub
  56  Inlink number
  57  Outlink number
  58  Number of slash in URL
  59  Length of URL
  60  Number of child page
  61  BM25 of extracted title
  62  LMIR.ABS of extracted title
  63  LMIR.DIR of extracted title
  64  LMIR.JM of extracted title

23 User-based features

  Type of interaction   Online metric
  Clicks                Click-through rate for $(q^{(n)}, x_i)$
                        Avg. click rank for $(q^{(n)}, x_i)$
  Time                  Avg. dwell time for $(q^{(n)}, x_i)$
                        Avg. time to first click, when this click is on $x_i$
                        Avg. time to last click, when this click is on $x_i$
  Queries               Number of reformulations before/after $q^{(n)}$
                        Number of times $q^{(n)}$ is abandoned

24 Outline
  2. Learning to rank
     Machine learning
     Features
     LTR approaches
     Experimental comparison
     Summary

25 [Figure: Fig. 1.1 Learning-to-rank framework. Source: T.-Y. Liu, Learning to Rank for Information Retrieval]

26 Learning to rank approaches
  [Figure: overview of LTR approaches, annotated with LambdaMART and LambdaRank. Source: T.-Y. Liu, Learning to Rank for Information Retrieval]

27 Pointwise LTR
  [Figure: for a query, a model h() scores each document independently, e.g. h(blue), h(yellow), h(green), h(white), and is trained on per-document relevance labels such as Rel(red), Rel(gray), Rel(orange)]

28 Pointwise LTR
  Reduces to traditional ML
  Input: query-document feature vectors $x_i^{(n)} = [x_{i1}^{(n)}, x_{i2}^{(n)}, \ldots, x_{il}^{(n)}]$
  Output: relevance labels $y_i$
  Objective: learn a model $h(x)$ that correctly predicts the labels $y$
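As a rough illustration of the reduction to traditional ML, the sketch below pools labeled query-document pairs from all queries, fits an ordinary regression model on them, and ranks a new query's documents by predicted relevance. The single feature, the toy data, and the names `fit_line` and `rank` are hypothetical and only serve the example.

```python
# Hypothetical training data: (query id, document id, feature value, relevance label).
train = [
    ("q1", "d1", 0.9, 2), ("q1", "d2", 0.4, 1), ("q1", "d3", 0.1, 0),
    ("q2", "d4", 0.8, 2), ("q2", "d5", 0.3, 0),
]


def fit_line(xs, ys):
    """Closed-form least squares for the one-feature model h(x) = a * x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    return a, mean_y - a * mean_x


# Pointwise training ignores which query a document belongs to.
a, b = fit_line([t[2] for t in train], [t[3] for t in train])


def rank(docs):
    """Order (document id, feature value) pairs by predicted relevance, highest first."""
    return [doc for doc, _ in sorted(docs, key=lambda p: a * p[1] + b, reverse=True)]


ranking = rank([("d6", 0.2), ("d7", 0.7), ("d8", 0.5)])
print(ranking)
```

The query identifier is never used during training, which is exactly why the pointwise objective differs from the ranking objective.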

29 Regression
  [Figure]

30 Classification
  [Figure]

31 Ordinal regression
  [Figure: Sum of margin strategy. Source: T.-Y. Liu, Learning to Rank for Information Retrieval]
  The explicit constraint simply takes the form of $b_{k-1} \le b_k$, while the implicit constraint uses redundant training examples to guarantee the ordinal relationship among thresholds.

32 Pointwise LTR
  Pros
    + Intuitive interpretation of relevance
    + Clear how to obtain relevance judgements
  Cons
    - Optimizes a different objective than IR does (e.g., finding the correct class rather than the best ranking)

33 Pairwise LTR
  [Figure: for a query, a model h() predicts preferences between pairs of documents, e.g. h(red>blue), and is trained on pairwise preference labels such as Pref(red>gray), Pref(gray>green), Pref(green>red)]

34 RankNet
  Pointwise scoring function $f(x_i)$ with parameters $\{w_k\}_{k=1}^{l}$
  Pairwise ground truth: $\bar{P}_{ij} = I(x_i \succ x_j)$
  The probability of $x_i \succ x_j$ is modeled using logistic regression:
    $P_{ij} = P(x_i \succ x_j) = \frac{1}{1 + e^{-\sigma (f_i - f_j)}}$
  Pairwise loss function (cross entropy):
    $C = -\bar{P}_{ij} \log P_{ij} - (1 - \bar{P}_{ij}) \log(1 - P_{ij}) = (1 - \bar{P}_{ij}) \, \sigma (f_i - f_j) + \log\left(1 + e^{-\sigma (f_i - f_j)}\right)$
  RankNet optimizes the total number of pairwise errors
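The two forms of the RankNet cost can be checked numerically. The sketch below is an illustration under the definitions on this slide; the function names and the sample scores are assumptions, and `p_bar` stands for the ground-truth value (1 if the first document should rank above the second, else 0).

```python
import math


def pair_probability(f_i, f_j, sigma=1.0):
    """P_ij = 1 / (1 + exp(-sigma * (f_i - f_j)))."""
    return 1.0 / (1.0 + math.exp(-sigma * (f_i - f_j)))


def cross_entropy(f_i, f_j, p_bar, sigma=1.0):
    """C = -p_bar * log(P_ij) - (1 - p_bar) * log(1 - P_ij)."""
    p = pair_probability(f_i, f_j, sigma)
    return -p_bar * math.log(p) - (1.0 - p_bar) * math.log(1.0 - p)


def cross_entropy_simplified(f_i, f_j, p_bar, sigma=1.0):
    """C = (1 - p_bar) * sigma * (f_i - f_j) + log(1 + exp(-sigma * (f_i - f_j)))."""
    s = sigma * (f_i - f_j)
    return (1.0 - p_bar) * s + math.log(1.0 + math.exp(-s))


# The document that should rank higher already has the higher score: low cost.
cost_correct = cross_entropy(2.0, 0.5, p_bar=1.0)
# Same preference, but the scores are the wrong way round: higher cost.
cost_wrong = cross_entropy(0.5, 2.0, p_bar=1.0)
print(cost_correct, cost_wrong)
```

When both scores are equal the cost is log 2 regardless of the label, so tied documents still contribute a gradient.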

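35 RankNet (cont'd)
  Optimize the cost $C$:
    $\frac{\partial C}{\partial f_i} = \sigma \left[ (1 - \bar{P}_{ij}) - \frac{1}{1 + e^{\sigma (f_i - f_j)}} \right] = -\frac{\partial C}{\partial f_j}$
  Update parameter $w_k$ of the function $f(x_i)$:
    $w_k \leftarrow w_k - \eta \left( \frac{\partial C}{\partial f_i} \frac{\partial f_i}{\partial w_k} + \frac{\partial C}{\partial f_j} \frac{\partial f_j}{\partial w_k} \right)$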

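36 Speeding up RankNet training
  Define $\lambda_{ij}$ as
    $\lambda_{ij} = \frac{\partial C}{\partial f_i} = \sigma \left[ (1 - \bar{P}_{ij}) - \frac{1}{1 + e^{\sigma (f_i - f_j)}} \right]$
  Let $I$ denote the set of pairs of indices $\{i, j\}$ for which $x_i$ should be ranked differently from $x_j$ for a given query $q$:
    $I = \{ \{i, j\} \mid x_i \succ x_j \}$
  Sum all contributions to update parameter $w_k$:
    $\delta w_k = -\eta \sum_{\{i,j\} \in I} \left( \lambda_{ij} \frac{\partial f_i}{\partial w_k} - \lambda_{ij} \frac{\partial f_j}{\partial w_k} \right)$
    $= -\eta \sum_i \left[ \sum_{j:\{i,j\} \in I} \lambda_{ij} \frac{\partial f_i}{\partial w_k} - \sum_{j:\{j,i\} \in I} \lambda_{ji} \frac{\partial f_i}{\partial w_k} \right]$
    $= -\eta \sum_i \lambda_i \frac{\partial f_i}{\partial w_k}$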

37 Interpreting λ's
    $\lambda_i = \sum_{j:\{i,j\} \in I} \lambda_{ij} - \sum_{j:\{j,i\} \in I} \lambda_{ji}$
  $\lambda_i$ is a sum of forces applied to document $x_i$ shown for query $q$
  All documents $x_j$ that should be ranked below $x_i$ push it up with the force $\lambda_{ij}$
  All documents $x_j$ that should be ranked above $x_i$ push it down with the force $\lambda_{ji}$
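The accumulation of pairwise forces into one value per document can be sketched as follows. This is an illustration only: preferences are derived from graded labels (a higher label means the document should rank higher), which is an assumption of the example, and the function name is hypothetical. With the update $\delta w_k = -\eta \sum_i \lambda_i \, \partial f_i / \partial w_k$, a negative $\lambda_i$ raises the score of $x_i$ and a positive one lowers it.

```python
import math


def ranknet_lambdas(scores, labels, sigma=1.0):
    """Return one lambda_i per document of a single query.

    For every pair where labels[i] > labels[j] (so the ground truth is 1):
        lambda_ij = -sigma / (1 + exp(sigma * (f_i - f_j)))
    lambda_ij is added to lambda_i and subtracted from lambda_j.
    """
    lambdas = [0.0] * len(scores)
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                lam = -sigma / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
                lambdas[i] += lam  # x_j should be below x_i: pushes x_i up
                lambdas[j] -= lam  # x_i should be above x_j: pushes x_j down
    return lambdas


# One relevant document and two non-relevant ones, all with equal scores.
lambdas = ranknet_lambdas(scores=[0.0, 0.0, 0.0], labels=[1, 0, 0])
print(lambdas)
```

Because every pair contributes equal and opposite amounts, the values for one query always sum to zero.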

38 Pairwise LTR
  Pros
    + Easy to get preference judgements
    + Comes closer to optimizing the ranking
  Cons
    - Still does not optimize the whole ranking
    - Higher computational complexity compared to pointwise LTR

39 Listwise LTR
  [Figure: a model h() produces a whole ranked list per query (q1, q2, q3, q4), e.g. R(blue), R(yellow), R(red), R(green), R(white) for q1, and each list is evaluated with a ranking metric: NDCG(q1), NDCG(q2), NDCG(q3)]

40 From RankNet to LambdaRank
  [Figure: Fig. 1 from C. Burges, From RankNet to LambdaRank to LambdaMART: An Overview. Caption: "A set of urls ordered for a given query using a binary relevance measure. The light gray bars represent urls that are not relevant to the query, while the dark blue bars represent urls that are relevant to the query. Left: the total number of pairwise errors is thirteen. Right: by moving the top url down three rank levels, and the bottom relevant url up five, the total number of pairwise errors has been reduced to eleven. However for IR measures like NDCG and ERR that emphasize the top few results, this is not what we want. The (black) arrows on the left denote the RankNet gradients (which increase with the number of pairwise errors), whereas what we'd really like are the (red) arrows on the right."]
  The black arrows denote the RankNet gradients (which increase with the number of pairwise errors)
  RankNet cost decreases from 13 on the left to 11 on the right
  The actual ranking gets worse
  The red arrows are what we would actually like to see

41 LambdaRank
  $\lambda_{ij}$ in RankNet:
    $\lambda_{ij} = \sigma \left[ (1 - \bar{P}_{ij}) - \frac{1}{1 + e^{\sigma (f_i - f_j)}} \right]$
  $\lambda_{ij}$ in LambdaRank:
    $\lambda_{ij} = \frac{-\sigma}{1 + e^{\sigma (f_i - f_j)}} \, |\Delta NDCG|$
    $|\Delta NDCG| = |NDCG(\text{orig. ranking}) - NDCG(x_i \text{ and } x_j \text{ are swapped})|$
  LambdaRank directly uses the ranking to compute gradients (i.e., the $\lambda_{ij}$'s) instead of computing and optimizing a cost function
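A small sketch of the LambdaRank weight for a single pair follows. It assumes a common NDCG definition with gain $2^{rel} - 1$ and discount $\log_2(\text{rank} + 1)$, which the slides do not spell out, and all function names are hypothetical. The pair $(i, j)$ is taken to be one where $x_i$ should rank above $x_j$.

```python
import math


def dcg(labels):
    """DCG of relevance labels listed in rank order (assumed gain 2^rel - 1)."""
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(labels))


def ndcg(labels):
    """DCG normalized by the DCG of the ideal ordering."""
    ideal = dcg(sorted(labels, reverse=True))
    return dcg(labels) / ideal if ideal > 0 else 0.0


def delta_ndcg(ranked_labels, pos_a, pos_b):
    """|NDCG(original ranking) - NDCG(ranking with two positions swapped)|."""
    swapped = list(ranked_labels)
    swapped[pos_a], swapped[pos_b] = swapped[pos_b], swapped[pos_a]
    return abs(ndcg(ranked_labels) - ndcg(swapped))


def lambdarank_lambda(scores, labels, i, j, sigma=1.0):
    """lambda_ij = -sigma / (1 + exp(sigma * (f_i - f_j))) * |delta NDCG|."""
    # The current ranking is induced by sorting documents on their scores.
    order = sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)
    ranked_labels = [labels[d] for d in order]
    weight = delta_ndcg(ranked_labels, order.index(i), order.index(j))
    return -sigma / (1.0 + math.exp(sigma * (scores[i] - scores[j]))) * weight


# The only relevant document (index 3) is currently ranked last.
scores = [3.0, 2.0, 1.0, 0.0]
labels = [0, 0, 0, 1]
lam_top = lambdarank_lambda(scores, labels, 3, 0)   # swap with the top result
lam_near = lambdarank_lambda(scores, labels, 3, 2)  # swap with its neighbour
print(lam_top, lam_near)
```

In this toy case the pair involving the top position receives a much larger force than the pair involving the neighbouring position, which matches the motivation of emphasizing the top of the ranking.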

42 LambdaRank (cont'd)
  Proceed similarly to RankNet:
  Sum all $\lambda_{ij}$'s for document $x_i$ and query $q$:
    $\lambda_i = \sum_{j:\{i,j\} \in I} \lambda_{ij} - \sum_{j:\{j,i\} \in I} \lambda_{ji}$
  Update parameter $w_k$ of the function $f(x_i)$:
    $w_k \leftarrow w_k - \eta \sum_i \lambda_i \frac{\partial f_i}{\partial w_k}$

43 LambdaMART
  MART: Multiple Additive Regression Trees
  MART does not need a cost function, only its gradients
  LambdaMART adopts the gradients (the $\lambda_{ij}$'s) from LambdaRank
  Hence the name: Lambda + MART

44 Listwise LTR
  Pros
    + Directly optimizes the whole ranking
  Cons
    - Needs many judgements
    - High computational complexity

45 Outline
  2. Learning to rank
     Machine learning
     Features
     LTR approaches
     Experimental comparison
     Summary

46 LEarning TO Rank datasets (LETOR)
  Query-document pairs with precomputed feature vectors
  Relevance judgements

47 Historical LTR datasets
  TREC 2003, 2004 Web IR track
    Gov corpus with 1,053,110 pages
    Tasks
      TD: topic distillation
      HP: homepage finding
      NP: named page finding
  Figure: Number of queries (Table 6.1 Number of queries in TREC web track; columns Task, TREC2003, TREC2004; rows Topic distillation, Homepage finding, Named page finding; the counts are not preserved)
  The OHSUMED corpus [64] is a subset of MEDLINE, a database on medical publications. It consists of 348,566 records (out of over 7 million) from 270 medical journals.
  Source: T.-Y. Liu, Learning to Rank for Information Retrieval

48 Results on NP2003
  Table 6.7 Results on NP2003 (MAP per algorithm; the values are not preserved)
  Algorithms: Regression, RankSVM, RankBoost, FRank, ListNet, AdaRank, SVMmap
  Source: T.-Y. Liu, Learning to Rank for Information Retrieval

49 Results on HP2004
  Table 6.10 Results on HP2004 (MAP per algorithm; the values are not preserved)
  Algorithms: Regression, RankSVM, RankBoost, FRank, ListNet, AdaRank, SVMmap
  Table 6.11 Results on OHSUMED (partially visible; the values are not preserved)
  Source: T.-Y. Liu, Learning to Rank for Information Retrieval

50 Experimental comparison
  Listwise ranking algorithms perform very well on most datasets
    ListNet seems to be better than the others
  Pairwise ranking algorithms obtain good ranking accuracy on some (although not all) datasets
  Linear regression performs worse than the pairwise and listwise ranking algorithms
  Source: T.-Y. Liu, Learning to Rank for Information Retrieval

51 Outline
  2. Learning to rank
     Machine learning
     Features
     LTR approaches
     Experimental comparison
     Summary

52 Learning to rank summary
  Features
    Content-based
    Link-based
    User-based
  Approaches
    Pointwise (regression, classification, ordinal regression)
    Pairwise (RankNet)
    Listwise (LambdaRank, LambdaMART)

53 Materials
  Tie-Yan Liu. Learning to Rank for Information Retrieval. Foundations and Trends in Information Retrieval, 2009.
  Christopher J.C. Burges. From RankNet to LambdaRank to LambdaMART: An Overview. Microsoft Research Technical Report, 2010.

54 Course overview
  Offline: Data Acquisition, Data Processing, Data Storage
  Online: Query Processing, Ranking, Evaluation
  Advanced: Aggregated Search, Click Models, Present and Future of IR

55 See you tomorrow!
  Offline: Data Acquisition, Data Processing, Data Storage
  Online: Query Processing, Ranking, Evaluation
  Advanced: Aggregated Search, Click Models, Present and Future of IR
