Ranking in Statistics and Machine Learning. Lizhe Sun, Florida State University. November 17, 2017.


1 Ranking in Statistics and Machine Learning. Florida State University, November 17, 2017

2 Framework
1. Ranking in our life
2. Early work: Bradley-Terry model
   Model
   Examples
3. Ranking webpages
   Web page search modeling
   Data structure
   Data analysis with machine learning algorithms

3 Part I: Ranking in our life

4 Rankings in our life: Wine
If I have K brands of wine of some type, such as sauvignon blanc, how should I rank them?

5 Ranking in sports: Tennis
Who is the best tennis player this year? ATP rankings, WTA rankings.

6 Ranking of tennis players

7 Web page search

8 Ranking models: overview
Learning and statistical models for ranking.
Bradley-Terry model: ranking based on pairwise evaluations.
Web search ranking model: ranking the webpages/documents according to their relevance to a query.

9 Part II: Bradley-Terry model

10 Bradley-Terry model
Let $P_{ab}$ denote the probability that a is preferred to b. Suppose $P_{ab} + P_{ba} = 1$ for all pairs; that is, we assume a tie cannot occur.
Bradley-Terry model:
$$\log\frac{P_{ab}}{P_{ba}} = \log\frac{P_{ab}}{1 - P_{ab}} = \beta_a - \beta_b$$
Alternatively,
$$P_{ab} = \frac{\exp(\beta_a)}{\exp(\beta_a) + \exp(\beta_b)}$$
Thus $P_{ab} = 1/2$ when $\beta_a = \beta_b$, and $P_{ab} > 1/2$ when $\beta_a > \beta_b$.
Residual df $= \binom{I}{2} - (I - 1)$, where I is the number of items compared.
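The preference probability on this slide can be checked numerically. The following is a minimal sketch; the function name `bt_prob` and the example strengths are illustrative assumptions, not part of the slides.

```python
import math

def bt_prob(beta_a: float, beta_b: float) -> float:
    """P(a preferred to b) = exp(beta_a) / (exp(beta_a) + exp(beta_b)).

    Written in the equivalent logistic form of beta_a - beta_b,
    which avoids overflow for large strengths.
    """
    return 1.0 / (1.0 + math.exp(-(beta_a - beta_b)))

# Illustrative (made-up) strengths.
p_equal = bt_prob(0.3, 0.3)   # equal strengths
p_ab = bt_prob(1.0, 0.0)      # a stronger than b
p_ba = bt_prob(0.0, 1.0)      # the reverse comparison
log_odds = math.log(p_ab / p_ba)  # should recover beta_a - beta_b
```

With equal strengths the probability is 1/2, the two directions sum to 1 (no ties), and the log odds recover the difference in strengths.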

11 Bradley-Terry model
Assumption: the evaluations of a given pair are independent, and the evaluations of different pairs are also independent.
We can use logistic regression methods to fit the model.
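Under the independence assumption, the log-likelihood is a sum over pairs and can be maximized directly. The sketch below uses plain gradient ascent rather than a packaged logistic regression routine; the function name, step size, and the toy win counts are assumptions made for illustration and are not the baseball or tennis data.

```python
import math

def fit_bradley_terry(wins, n_iter=2000, lr=0.01):
    """Maximum likelihood for Bradley-Terry strengths by gradient ascent.

    wins[a][b] is the number of times item a was preferred to item b.
    The last item is the baseline (its beta is fixed at 0).
    """
    k = len(wins)
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        for a in range(k):
            for b in range(k):
                if a == b:
                    continue
                n_ab = wins[a][b] + wins[b][a]          # comparisons of a and b
                p_ab = 1.0 / (1.0 + math.exp(-(beta[a] - beta[b])))
                grad[a] += wins[a][b] - n_ab * p_ab     # observed minus expected wins
        beta = [b_ + lr * g for b_, g in zip(beta, grad)]
        beta = [b_ - beta[-1] for b_ in beta]           # identifiability constraint
    return beta

# Hypothetical win counts for three items (toy data only).
toy_wins = [[0, 7, 9],
            [3, 0, 6],
            [1, 4, 0]]
beta_hat = fit_bradley_terry(toy_wins)

# At the maximum, expected wins match observed wins for each item.
expected_wins_0 = sum(
    (toy_wins[0][b] + toy_wins[b][0])
    / (1.0 + math.exp(-(beta_hat[0] - beta_hat[b])))
    for b in (1, 2)
)
```

Fixing one strength at zero mirrors the usual baseline-category constraint; only differences between strengths are identified.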

12 Example 1: Major League Baseball rankings
Table: Results of 2011 Season for American League (Eastern Division) Baseball Teams
Rows: winning team. Columns: losing team. Teams: Boston, New York, Tampa Bay, Toronto, Baltimore.
Data source: Agresti, Alan, and Maria Kateri. Categorical data analysis.

13 Bradley-Terry model
Table: Results of Fitting Bradley-Terry Model to Baseball Data
Columns: Team, Winning Percentage, $\hat{\beta}_a$, SE. Teams: Boston, New York, Tampa Bay, Toronto, Baltimore.

14 R output

15 Example 2: Tennis player rankings
Table: Head-to-head records of players in the ATP top 20 (updated to 10/30/2017)
Rows: winner. Columns: loser. Players: Rafael Nadal, Roger Federer, Andy Murray, Novak Djokovic, Stanislas Wawrinka.
Data source:

16 R output for tennis players

17 Model extension: home team advantage
Let $P_{ab}$ now denote the probability that team a beats team b when a is the home team. Consider the logistic model
$$\log\frac{P_{ab}}{1 - P_{ab}} = \alpha + \beta_a - \beta_b.$$
When $\alpha > 0$, a home field advantage exists.
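A small numerical illustration of the home-advantage term, assuming made-up values for the strengths and for alpha (the function name `home_win_prob` is also hypothetical):

```python
import math

def home_win_prob(alpha: float, beta_home: float, beta_away: float) -> float:
    """P(home team wins) when logit(P) = alpha + beta_home - beta_away."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta_home - beta_away)))

# Two equally strong teams (illustrative values).
p_neutral = home_win_prob(0.0, 0.5, 0.5)    # alpha = 0: no home advantage
p_home = home_win_prob(0.2, 0.5, 0.5)       # alpha > 0: home team favored

# With alpha != 0 the two "home" probabilities of a pair no longer sum to 1,
# because each refers to a different venue.
p_a_home = home_win_prob(0.2, 1.0, 0.0)
p_b_home = home_win_prob(0.2, 0.0, 1.0)
```

For equally strong teams the home team wins with probability exp(alpha) / (1 + exp(alpha)), which exceeds 1/2 exactly when alpha is positive.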

18 Part III: webpage search ranking model
Outline
Problem description and model introduction
   Global vs. subset ranking
Data structure
   Query-webpage pair and feature vector
   Example: learning to rank competition
Data analysis
   GBRT
   iGBRT

19 Problem description
In general, we want to rank a set of documents/webpages according to their relevance to a given query.
In the machine learning community, learning to rank is treated as a supervised learning problem.

20 Ranking: machine learning framework
Ranking function h(x).
Extract a feature vector x for each query-document pair, and train a function h(x).
Rank the documents/web pages by the value of h(x): a document whose x has a larger h(x) is ranked higher.
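Ranking by the value of h(x) amounts to scoring each document and sorting. The sketch below assumes a linear scoring function with hypothetical weights and features; the slides do not specify the form of h.

```python
def h(x, w):
    """Hypothetical linear scoring function: inner product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def rank_documents(features, w):
    """Return document ids ordered from highest to lowest score h(x)."""
    scores = {doc: h(x, w) for doc, x in features.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Made-up feature vectors for three documents of one query.
features = {"p1": [0.2, 0.1], "p2": [0.9, 0.4], "p3": [0.5, 0.8]}
w = [1.0, 0.5]
ranking = rank_documents(features, w)
```

Only the ordering induced by h matters for the final ranking, so any monotone transformation of the scores gives the same result.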

21 Global and subset ranking
In the aforementioned model, given a query, we rank all the documents in the training dataset.
In applications, however, we do not need to rank all documents for a given query. Thus, the subset ranking model may be more popular in applications.

22 Subset ranking model: web search example
Filtering procedure: when the search engine takes a query, it uses a simple ranking algorithm for initial filtering, which limits the rankings to an initial pool $\{p_j\}$ of size m (e.g., m = ). Here, $p_j$ is a returned web page, j = 1, 2, ..., m.
After this initial ranking, the system uses a more complicated algorithm to reorder the rankings in the pool.
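The two-stage procedure can be sketched as a cheap filter followed by a more expensive reordering of the surviving pool. All names and scores below are hypothetical; real systems use far larger pools and learned scoring functions.

```python
import heapq

def subset_rank(candidates, cheap_score, expensive_score, m):
    """Keep the top-m candidates under a cheap score, then reorder them."""
    pool = heapq.nlargest(m, candidates, key=cheap_score)      # initial filtering
    return sorted(pool, key=expensive_score, reverse=True)     # second-stage ranking

# Toy candidates: (page id, cheap score, expensive score), all made up.
pages = [("a", 0.9, 0.2), ("b", 0.8, 0.7), ("c", 0.1, 0.99), ("d", 0.7, 0.9)]
reranked = subset_rank(
    pages,
    cheap_score=lambda p: p[1],
    expensive_score=lambda p: p[2],
    m=3,
)
reranked_ids = [p[0] for p in reranked]
```

Page "c" has the best second-stage score but never reaches the pool, which shows how the quality of the final ranking depends on the initial filter.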

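23 Data structure
Table: An example of training data structure

Query | Document  | Feature vector | Score
q1    | p_1       | $x_1^1$        | $y_1^1$
      | p_2       | $x_2^1$        | $y_2^1$
      | ...       | ...            | ...
      | p_{m_1}   | $x_{m_1}^1$    | $y_{m_1}^1$
q2    | p_1       | $x_1^2$        | $y_1^2$
      | p_2       | $x_2^2$        | $y_2^2$
      | ...       | ...            | ...
      | p_{m_2}   | $x_{m_2}^2$    | $y_{m_2}^2$
q3    | ...       | ...            | ...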

24 Data description
The training data can be formally represented as $\{(x_j^q, y_j^q)\}$, where q goes from 1 to n, the number of queries, and j goes from 1 to $m_q$, the number of documents for query q.
$x_j^q \in \mathbb{R}^p$ is a p-dimensional feature vector for the pair of query q and the j-th document for this query, and $y_j^q$ is the relevance label for $x_j^q$.
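One possible in-memory layout for such training data groups the (feature vector, label) pairs by query. The type and field names below are assumptions chosen for illustration, and the rows are toy values.

```python
from collections import defaultdict
from typing import NamedTuple

class Example(NamedTuple):
    query_id: int        # q
    features: tuple      # x_j^q, a p-dimensional feature vector
    label: int           # y_j^q, the relevance label

def group_by_query(examples):
    """Map each query id to the list of its examples (length m_q)."""
    groups = defaultdict(list)
    for ex in examples:
        groups[ex.query_id].append(ex)
    return dict(groups)

# Toy data: n = 2 queries, p = 2 features, m_1 = 3 and m_2 = 2 documents.
examples = [
    Example(1, (0.2, 0.1), 0),
    Example(1, (0.9, 0.4), 3),
    Example(1, (0.5, 0.8), 2),
    Example(2, (0.3, 0.3), 1),
    Example(2, (0.7, 0.6), 4),
]
groups = group_by_query(examples)
docs_per_query = {q: len(rows) for q, rows in groups.items()}
```

Keeping the grouping explicit matters because pairwise and listwise losses, and metrics such as DCG, are computed within each query.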

25 Data description: features
The main categories of features x: Web graph, Document statistics, Document classifier, Query, Text match, Topical matching, Click, External references, Time.
The grade y indicates the degree of relevance of the document to its corresponding query. For example, each grade can be one element of the ordinal set {perfect, excellent, good, fair, bad}, labeled by human editors.

26 Example: datasets in learning to rank competition
Table: Datasets released for the challenge
Columns: dataset1 (Train, Valid, Test), dataset2 (Train, Valid, Test). Rows: Queries, Documents, Features.

Table: Distribution of relevance labels

Grade     | Label | dataset1 | dataset2
Perfect   |       |          | 1.89%
Excellent |       |          | 7.67%
Good      |       |          | 28.55%
Fair      |       |          | 35.80%
Bad       |       |          | 26.09%

27 Example: datasets in learning to rank competition
Figure: The number of documents associated with each query.

28 Data analysis: overview
There are many ranking algorithms. Current algorithms can be divided into three categories, according to the objective function they optimize:
Pointwise: a regression loss or a classification loss.
Pairwise: a pairwise loss function,
$$\sum_{q=1}^{n} \sum_{\substack{i,j=1 \\ y_i^q > y_j^q}}^{m_q} \ell\bigl(h(x_i^q) - h(x_j^q)\bigr)$$
Listwise: the loss function is defined over all the documents associated with a query.
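The pairwise objective sums a loss over all within-query pairs where one document is labeled more relevant than the other. The slide leaves the loss ℓ unspecified; the sketch below assumes the logistic loss ℓ(t) = log(1 + exp(-t)) as one common choice, and uses made-up scores and labels.

```python
import math

def logistic_loss(t):
    """Assumed pairwise loss: small when the preferred document scores higher."""
    return math.log1p(math.exp(-t))

def pairwise_loss(scores_by_query, labels_by_query, loss=logistic_loss):
    """Sum loss(h(x_i) - h(x_j)) over pairs in the same query with y_i > y_j."""
    total = 0.0
    for q, scores in scores_by_query.items():
        labels = labels_by_query[q]
        for i in range(len(scores)):
            for j in range(len(scores)):
                if labels[i] > labels[j]:
                    total += loss(scores[i] - scores[j])
    return total

labels = {1: [2, 1, 0]}
good = pairwise_loss({1: [3.0, 2.0, 1.0]}, labels)   # scores agree with labels
bad = pairwise_loss({1: [1.0, 2.0, 3.0]}, labels)    # scores reversed
```

Scores that respect the label order give a smaller loss than reversed scores; pairs from different queries are never compared.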

29 Model evaluation criteria
The Discounted Cumulative Gain (DCG) has been widely used to assess relevance in the context of search engines.
A simple variation of DCG:
$$\mathrm{DCG}_m = \sum_{j=1}^{m} \frac{G_j}{\log(j + 1)}$$
where $G_j$ represents the weight assigned to the label of the document at position j.
We also have expected reciprocal rank (ERR) and NDCG as model evaluation criteria.
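A direct transcription of the DCG variation on this slide, plus NDCG computed as DCG divided by the DCG of the ideal ordering. The natural logarithm and the example gains are assumptions; the slide does not state the log base or how gains are derived from labels.

```python
import math

def dcg(gains, m=None):
    """DCG_m = sum_{j=1}^{m} G_j / log(j + 1), with gains listed in ranked order."""
    m = len(gains) if m is None else min(m, len(gains))
    return sum(gains[j - 1] / math.log(j + 1) for j in range(1, m + 1))

def ndcg(gains, m=None):
    """DCG normalized by the DCG of the best possible ordering of the same gains."""
    ideal = dcg(sorted(gains, reverse=True), m)
    return dcg(gains, m) / ideal if ideal > 0 else 0.0

# Made-up gains for the documents at positions 1, 2, 3 of a ranking.
ideal_order = [3.0, 1.0, 0.0]
swapped_order = [1.0, 3.0, 0.0]
ndcg_ideal = ndcg(ideal_order)
ndcg_swapped = ndcg(swapped_order)
```

Because the discount grows with position, placing a high-gain document lower in the list reduces DCG, and NDCG rescales the result to at most 1.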

30 Data analysis: learning algorithms
In this presentation, we focus on pointwise methods.
Gradient Boosted Regression Trees (GBRT) is a very powerful tool for search ranking.
iGBRT: the initial residual of GBRT is $r_i = y_i - F(x_i)$, where $F(x_i)$ is the Random Forests estimate.
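The iGBRT idea of starting boosting from an existing predictor, so that the first residuals are r_i = y_i - F(x_i), can be sketched in a few lines. This is a toy illustration only: it boosts one-split stumps on a single feature with squared loss, and a hand-written function `init_predict` stands in for the Random Forests estimate. None of the names or values come from the slides.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump on 1-D inputs under squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:            # split: x <= t versus x > t
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, init_predict, n_rounds=50, lr=0.1):
    """Gradient boosting for squared loss, initialized at init_predict."""
    preds = [init_predict(x) for x in xs]     # F(x_i): the initial estimate
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]   # first round: y_i - F(x_i)
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: init_predict(x) + lr * sum(s(x) for s in stumps)

# Toy data with relevance-like labels (made up).
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 1, 2, 4, 4]
init_predict = lambda x: 0.5 * x              # hypothetical stand-in for the forest
model = boost(xs, ys, init_predict)

def mse(predict):
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

On the training data the boosted model has lower squared error than the initial predictor alone, since each round fits a stump to the current residuals.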

31 Regression vs. Classification
In practice, classification tends to work better than regression.
In classification, instead of training a function $h(x_i) \approx y_i$, we generate binary classification problems, indexed by c = 1, 2, 3, 4 for $y \in \{0, 1, 2, 3, 4\}$.
The c-th classification problem predicts whether the document is less relevant than c, i.e., $y_i < c$.
For each of these binary classification problems, we train a classifier $h_c(\cdot)$: $h_c(x_i) = P(\mathrm{rel}(x_i) < c)$.
In this example, we also define $h_0(\cdot) = 0$ and $h_5(\cdot) = 1$. Thus, we can combine all classifiers $h_0, h_1, \ldots, h_5$ to compute the probability of each class.

32 Regression vs. Classification
In our example, we compute the probability that a document $x_i$ has a relevance of $r \in \{0, 1, 2, 3, 4\}$:
$$P(\mathrm{rel}(x_i) = r) = P(\mathrm{rel}(x_i) < r + 1) - P(\mathrm{rel}(x_i) < r) = h_{r+1}(x_i) - h_r(x_i)$$
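The step from cumulative classifiers to class probabilities is a simple differencing. In the sketch below the classifier outputs are made-up numbers, and using the expected relevance as the final ranking score is an assumption (one common way to turn class probabilities into a single score), not something stated on the slide.

```python
def class_probabilities(h):
    """h = [h_1, ..., h_4] with h_c = P(rel < c); returns [P(rel = 0), ..., P(rel = 4)]."""
    cum = [0.0] + list(h) + [1.0]             # prepend h_0 = 0, append h_5 = 1
    return [cum[r + 1] - cum[r] for r in range(len(cum) - 1)]

def expected_relevance(h):
    """Assumed scoring rule: sum over r of r * P(rel = r)."""
    return sum(r * p for r, p in enumerate(class_probabilities(h)))

# Hypothetical outputs of the four binary classifiers for one document.
h_values = [0.1, 0.3, 0.6, 0.9]
probs = class_probabilities(h_values)
score = expected_relevance(h_values)
```

The differences are valid probabilities only if the classifier outputs are non-decreasing in c; independently trained classifiers may need to be checked or adjusted for this.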

33 Results for data analysis
Table: Performance of GBRT, RF, and iGBRT. All results are evaluated in ERR and NDCG.
Columns: method, Regr./Class., dataset1, dataset2.
Rows, for each of ERR and NDCG: GBRT, RF, iGBRT with regression (R); GBRT, RF, iGBRT with classification (C).

34 Future study
Many important topics are not covered in this presentation:
Pairwise and listwise ranking algorithms.
Other models, such as recommendation systems.
Theory of learning to rank.
Online learning, etc.

35 Thank you
