Arama Motoru Gelistirme Dongusu: Siralamayi Ogrenme ve Bilgiye Erisimin Degerlendirilmesi. Retrieval Effectiveness and Learning to Rank

Size: px

Start display at page:

Download "Arama Motoru Gelistirme Dongusu: Siralamayi Ogrenme ve Bilgiye Erisimin Degerlendirilmesi. Retrieval Effectiveness and Learning to Rank"

Brook Williamson
5 years ago
Views:

1 Arama Motoru Gelistirme Dongusu: Siralamayi Ogrenme ve Bilgiye Erisimin Degerlendirilmesi etrieval Effectiveness and Learning to ank EMIE YILMAZ Professor and Turing Fellow University College London esearch Consultant Microsoft esearch

2 LEAIG TO AK (LT)... the task to automatically construct a ranking model using training data, such that the model can sort new objects according to their degrees of relevance, preference, or importance. - Liu [2009] LT models represent a rankable item e.g., a document given some context e.g., a user-issued query as a numerical vector Ԧx n The ranking model f: Ԧx is trained to map the vector to a real-valued score such that relevant items are scored higher. Tie-Yan Liu. Learning to rank for information retrieval. Foundation and Trends in Information etrieval, 2009.

3 APPLICATIOS OF LEAIG TO AK Search (Document Search, Entity Search, etc.) ecommender Systems (Collaborative Filtering) Question Answering Document Summarization Opinion Mining Machine Translation

4 APPLICATIOS OF LEAIG TO AK Search (Document Search, Entity Search, etc.) ecommender Systems (Collaborative Filtering) Question Answering Document Summarization Opinion Mining Machine Translation

5 LEAIG TO AK Search Engine BM25,tf*idf, Pageank, esults Evaluate Judges

6 FEATUES Traditional learning to rank models employ hand-crafted features that encode insights about the problem They can often be categorized as: Query-independent or static features e.g., incoming link count, page-rank score Query-dependent or dynamic features e.g., BM25

7 FEATUES Traditional learning to rank models employ hand-crafted features that encode insights about the problem They can often be categorized as: Query-independent or static features e.g., incoming link count, page-rank score More semantic representations in the recent years Query-dependent or dynamic features e.g., BM25

8 TEM EMBEDDIGS FO SEACH estimate relevance estimate relevance Compare query and document directly in the embedding space Use embeddings to generate suitable query expansions

9 APPOACHES Pointwise approach elevance label y q,d is a number derived from binary or graded human judgments or implicit user feedback (e.g., CT). Typically, a regression or Liu [2009] categorizes different LT approaches based on training objectives: classification model is trained to predict y q,d given Ԧx q,d. Pairwise approach Pairwise preference between documents for a query (d i d j w.r.t. q) as label. educes to binary classification to predict more relevant document. Listwise approach (Modern Systems) Directly optimize for rank-based metrics evaluating user satisfaction (more to be discussed later) Tie-Yan Liu. Learning to rank for information retrieval. Foundation and Trends in Information etrieval, 2009.

10 EVALUATIO METICS: PECISIO VS. ECALL etrieved list Ṛ.

11 VISUALIZIG ETIEVAL PEFOMACE: PECISIO-ECALL CUVES List:

12 EVALUATIO METICS: AVEAGE PECISIO List:

13 EVALUATIO METICS: DCG Some documents more relevant than others User receives some gain from each document Discount gain based on rank DCG r 1 G( r) D( r) ormalized discounted cumulative gain DCG DCG OptDCG G( r) rel( r),2 D( r) rel( r), ,,... log ( r) r b

14 EVALUATIO METICS Two categories User-oriented metrics PC(k), DCG(k) System-oriented metrics AP, DCG

15 APPOACHES Pointwise approach elevance label y q,d is a number derived from binary or graded human judgments or implicit user feedback (e.g., CT). Typically, a regression or classification model is trained to predict y q,d given Ԧx q,d. Pairwise approach Pairwise preference between documents for a query (d i d j w.r.t. q) as label. educes to binary classification to predict more relevant document. Listwise approach (Modern Systems) Directly optimize for rank-based metric, such as DCG difficult because these metrics are often not differentiable w.r.t. model parameters. Tie-Yan Liu. Learning to rank for information retrieval. Foundation and Trends in Information etrieval, 2009.

s i s j score Computing cross-entropy between p and pҧ L anket = pҧ ij. log p ij p ji ҧ.

16 PAIWISE OBJECTIVES 0 1 anket loss Pairwise loss function proposed by Burges et al. [2005] an industry favourite [Burges, 2015] pairwise preference Predicted probabilities: p ij = p s i > s j Desired probabilities: pҧ ij = 1 and p ji ҧ = e γ. s i s j score Computing cross-entropy between p and pҧ L anket = pҧ ij. log p ij p ji ҧ. log p ji Use neural network as the model, and gradient descent as the algorithm, to optimize the cross-entropy loss. Chris Burges, Tal Shaked, Erin enshaw, Ari Lazier, Matt Deeds, icole Hamilton, and Greg Hullender. Learning to rank using gradient descent. In ICML, Chris Burges. anket: A ranking retrospective

17 LISTWISE OBJECTIVES Blue: relevant Gray: non-relevant DCG higher for left but pairwise errors less for right Due to strong position-based discounting in I measures, errors at higher ranks are much more problematic than at lower ranks But listwise metrics are non-continuous and non-differentiable [Burges, 2010] Christopher JC Burges. From ranknet to lambdarank to lambdamart: An overview. Learning, 2010.

18 LISTWISE OBJECTIVES: LAMBDAAK Burges et al. [2010] make two observations: 1. To train a model we don t need the costs themselves, only the gradients (of the costs w.r.t model scores) Lambdaank loss Multiply actual gradients with the change in DCG by swapping the rank positions of the two documents 2. It is desired that the gradient be bigger for pairs of documents that produces a bigger impact in DCG by swapping positions Christopher JC Burges. From ranknet to lambdarank to lambdamart: An overview. Learning, 2010.

19 LISTWISE OBJECTIVES: LAMBDAAK Empirically shown to optimize the objective metric Most current learning to rank models based on different variations of the idea Winner of the Yahoo! Learning to ank Challenge Christopher JC Burges. From ranknet to lambdarank to lambdamart: An overview. Learning, 2010.

20 LEAIG TO AK Search Engine BM25,tf*idf, Pageank, esults Many different evaluation metrics Which metric? Evaluate Judges

21 OPTIMIZIG FO A METIC Empirical isk Minimization X evaluates user satisfaction Optimize for X A common misconception! Informative vs. uninformative metrics

22 TAIIG WITH MOE IFOMATIO Train for a more informative metric X : user satisfaction

23 TAIIG WITH MOE IFOMATIO Train for a more informative metric X : user satisfaction Y : more informative than X

24 TAIIG WITH MOE IFOMATIO Train for a more informative metric X : user satisfaction Y : more informative than X Train for Y! Better test set X than training for X!

25 WHAT MAKES A METIC MOE IFOMATIVE? List 1 List 2.. Metrics respond to flips in the same way Some metrics may ignore some flips Metrics sensitive to many flips more informative Metrics weight flips differently Some metrics give too much weight to some flips

26 IFOMATIVEESS AD LEAIG TO AK Evaluation metrics summarize the training data Query Learning Algorithm list d1 d2 d3. d4

27 IFOMATIVEESS AD LEAIG TO AK Evaluation metrics summarize the training data Query Learning Algorithm list d1 d2 d3. d4.

28 IFOMATIVEESS AD LEAIG TO AK Evaluation metrics summarize the training data Query Learning Algorithm list d1 d2 d3. d4. Evaluate

29 IFOMATIVEESS AD LEAIG TO AK Evaluation metrics summarize the training data Query Learning Algorithm list d1 d2 d3. d4???.? Evaluate

30 EVALUATIO METICS AD IFOMATIVEESS For the same part of the ranking Some metrics insensitive to some flips at this part Two lists with identical PC(10) values

31 EVALUATIO METICS AD IFOMATIVEESS Some metrics ignore some parts of ranking Two lists with identical PC(5) and DCG(5) values

32 IFOMATIVEESS OF METICS [YILMAZ AD OBETSO, IJ 10] Quality of a ranked list elevance of documents in the list How much does a metric reduce one s uncertainty in the underlying list? Informative metrics: large reduction in uncertainty on-informative metrics: little or no reduction in uncertainty

33 IFOMATIVEESS OF METICS Evaluation Metric Maximum Entropy Framework elevance Precision-recall Curve Other Metrics Pr(rel rank) 1 Pr() 2 Pr() 3 Pr()

34 SETUP FO AP METIC Goal: Given the average precision value (ap) of a list, infer probability of relevance of document at rank i Maximum entropy setup: Maximize Subject to

35 IFOMATIVEESS OF METICS

36 IFOMATIVEESS OF METICS

37 WHICH EVALUATIO METIC? Which evaluation metric? Informative Metrics: AP, DCG Less Informative Metrics: DCG(10), PC(10) DCG(10) more informative than PC(10) AP vs. DCG AP more informative than DCG regarding binary relevance of documents

38 IFOMATIVEESS AD LEAIG TO AK Hypothesis: Optimizing for a more informative metric Y gives better test set X than optimizing for X directly Learning algorithms Lambdaank Softank (Optimize for smooth versions of metrics) Evaluation Metrics Informative Metrics: AP, DCG Less Informative Metrics: DCG(10), PC(10)

39 TAIIG O DIFFEET METICS: LAMBDAAK Optim. Metric TestMetric AP PC(10) DCG(10) AP * PC(10) DCG * 62.90* DCG(10)

40 TAIIG O DIFFEET METICS: SOFTAK Optim. Metric TestMetric AP PC(10) DCG(10) AP * 63.03* PC(10) DCG * 62.98* DCG(10) * 62.41

41 SUMMAY Be careful about which metric you use! Optimize for informative metrics Similar conclusions in classification Informative metric design Graded Average Precision What is the ultimate metric for learning to rank?

42 CUET ESEACH: TASK BASED I Task: Invest in Bitcoins Find cryptocurrencies exchanges to buy/sell Find trading fee details Taxation rules for crypto Task: Buy a paintbrush ecommend: painting courses Buy painting guides Find combo offers Users use online systems to achieve some real world tasks

43 CUET ESEACH: TASK BASED I Task: Invest in Bitcoins Find cryptocurrencies 25 Page Views per task exchanges to buy/sell Find trading fee details Taxation rules for crypto 44 Minutes per task Task: Buy a paintbrush ecommend: painting courses Buy painting guides Find combo offers 7 Queries per task Users use online systems to achieve some real world tasks Significant effort required using existing systems

44 CUET ESEACH: TASK BASED I Task: Invest in Bitcoins Find cryptocurrencies 25 Page Views per task exchanges to buy/sell Find trading fee details Taxation rules for crypto 44 Minutes per task Task: Buy a paintbrush ecommend: painting courses Buy painting guides Find combo offers 7 Queries per task Devise next generation intelligent online services than can Go beyond the input from the user Automatically detect the task the user trying to achieve Provide the user with contextual task completion assistance

45 ESEACH CHALLEGES FO TASK BASED I SYSTEMS Task extraction/representation Design of task based retrieval interfaces Task based personalization Task based evaluation of retrieval systems

46 Usage logs (questions asked, queries issued, pages viewed, etc.) contain information about tasks users use the online systems for Usage Logs Mine information from usage logs using machine learning techniques to infer the representations of tasks Subtasks Extracting Hierarchies of Search Tasks & Subtasks via a Bayesian onparametric Approach.. Mehrotra and E. Yilmaz. In Proceedings of ACM SIGI Deep Sequential Models for Task Satisfaction Prediction. Mehrotra, E. Yilmaz et al. In Proceedings of ACM CIKM Task Embeddings: Learning Query Embeddings using Task Context.. Mehrotra and E. Yilmaz. In Proceedings of ACM CIKM 2017.

47 MOE O CUET/FUTUE WOK Conversational I system design and evaluation Stance detection (fake news detection) Understanding user behaviour across different devices AI for education Predicting cryptocurrency price change using resources from the web

48 Thank You! Dr. Manisha Verma Dr. Jiyin He Bhaskar Mitra Sahan Bulathwela Dr. ishabh Mehrotra Dr. Shangsong Liang Qiang Zhang Andrew Burnie

WebSci and Learning to Rank for IR

WebSci and Learning to Rank for IR Ernesto Diaz-Aviles L3S Research Center. Hannover, Germany diaz@l3s.de Ernesto Diaz-Aviles www.l3s.de 1/16 Motivation: Information Explosion Ernesto Diaz-Aviles