Ranking in Statistics and Machine Learning


Lizhe Sun, Florida State University. November 17, 2017.

Framework: 1. Rankings in our life. 2. Early work: the Bradley-Terry model and examples. 3. Web page search: ranking model, data structure, and data analysis with machine learning algorithms.

Part I: Rankings in our life

Rankings in our life. Ranking wine: if I have K brands of wine of some type, like sauvignon blanc, how do I rank them?

Rankings in sports. Ranking in tennis: who is the best tennis player this year? The ATP and WTA rankings.

Ranking tennis players (figure).

Web page search (figure).

Ranking models: overview. Learning and statistical models for ranking. The Bradley-Terry model: based on pairwise evaluations. The web search ranking model: rank the webpages/documents according to their relevance to a query.

Part II: the Bradley-Terry model

The Bradley-Terry model. Let $P_{ab}$ denote the probability that $a$ is preferred to $b$. Suppose $P_{ab} + P_{ba} = 1$ for all pairs; that is, we assume a tie cannot occur. The Bradley-Terry model:

$$\log\left(\frac{P_{ab}}{P_{ba}}\right) = \log\left(\frac{P_{ab}}{1 - P_{ab}}\right) = \beta_a - \beta_b,$$

or equivalently $P_{ab} = \exp(\beta_a)/(\exp(\beta_a) + \exp(\beta_b))$. Thus $P_{ab} = \tfrac{1}{2}$ when $\beta_a = \beta_b$, and $P_{ab} > \tfrac{1}{2}$ when $\beta_a > \beta_b$. With $I$ items, the residual df is $\binom{I}{2} - (I - 1)$.

The Bradley-Terry model. Assumption: the evaluation samples are independent, and the evaluations for different pairs are also independent. We can therefore use logistic regression methods to fit the model.

Example 1: Major League Baseball rankings.

Table: Results of the 2011 season for American League (Eastern Division) baseball teams. Each entry is the number of wins of the row (winning) team over the column (losing) team.

Winning Team   Boston   New York   Tampa Bay   Toronto   Baltimore
Boston           -        12          6          10        10
New York         6         -          9          11        13
Tampa Bay       12         9          -          12         9
Toronto          8         7          6           -        12
Baltimore        8         5          9           6         -

Data source: Agresti, Alan, and Maria Kateri. Categorical Data Analysis.

The Bradley-Terry model.

Table: Results of fitting the Bradley-Terry model to the baseball data

Team        Winning Percentage   $\hat{\beta}_a$   SE
Boston      52.8                 0.454             0.304
New York    54.2                 0.499             0.305
Tampa Bay   58.3                 0.635             0.307
Toronto     45.8                 0.229             0.303
Baltimore   38.9                 0.000             -

R output for the baseball data (figure).
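Since the R output here survives only as a screenshot, the following is a minimal sketch of how such a fit could be reproduced with a plain binomial GLM; the variable names are mine, and with Baltimore as the reference team the coefficients should correspond to the $\hat{\beta}_a$ values in the table above.

```r
# Hedged sketch: fit the Bradley-Terry model to the 2011 AL East results
# via logistic regression (no intercept). Baltimore is the reference
# team, so its beta is fixed at 0.
teams <- c("Boston", "NewYork", "TampaBay", "Toronto", "Baltimore")

# wins[i, j] = number of times team i beat team j
wins <- matrix(c( 0, 12,  6, 10, 10,
                  6,  0,  9, 11, 13,
                 12,  9,  0, 12,  9,
                  8,  7,  6,  0, 12,
                  8,  5,  9,  6,  0),
               nrow = 5, byrow = TRUE, dimnames = list(teams, teams))

# One row per unordered pair: wins for each side, plus a +1/-1 design
# column for every non-reference team.
pairs <- t(combn(5, 2))
y1 <- wins[pairs]             # wins of team i over team j
y2 <- wins[pairs[, 2:1]]      # wins of team j over team i
X <- matrix(0, nrow(pairs), 4, dimnames = list(NULL, teams[1:4]))
for (k in seq_len(nrow(pairs))) {
  i <- pairs[k, 1]; j <- pairs[k, 2]
  X[k, i] <- 1                # team i enters with coefficient +1
  if (j < 5) X[k, j] <- -1    # team j enters with coefficient -1
}

fit <- glm(cbind(y1, y2) ~ X - 1, family = binomial)
summary(fit)                  # estimates beta_Boston, ..., beta_Toronto
```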

Example 2: Ranking tennis players.

Table: Head-to-head records of players in the ATP top 20 (updated to 10/30/2017). Each entry is the number of wins of the row (winning) player over the column (losing) player.

Winner               Nadal   Federer   Murray   Djokovic   Wawrinka
Rafael Nadal           -       23        17        24         16
Roger Federer         15        -        14        22         20
Andy Murray            7       11         -        11         10
Novak Djokovic        26       23        25         -         20
Stanislas Wawrinka     3        3         8         5          -

Data source: http://www.tennisabstract.com/

R output for the tennis players (figure).

Model extension: home-team advantage. Let $P^*_{ab}$ denote the probability that team $a$ beats team $b$ when $a$ is the home team. Consider the logistic model $\log\left(\frac{P^*_{ab}}{1 - P^*_{ab}}\right) = \alpha + \beta_a - \beta_b$; when $\alpha > 0$, a home-field advantage exists.
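A hedged sketch of how this extension could be fit, reusing the GLM setup above: orient each pairwise record so the home team's wins come first, and keep the intercept, which then estimates $\alpha$. The baseball table above does not record home/away splits, so home_wins and away_wins here are hypothetical.

```r
# Hedged sketch: rows of (home_wins, away_wins) per pair, oriented
# home-team-first, with the design matrix X built as before (+1 for the
# home team's column, -1 for the away team's). Keeping the intercept
# makes it the estimate of the home-advantage term alpha.
fit_home <- glm(cbind(home_wins, away_wins) ~ X, family = binomial)
coef(fit_home)["(Intercept)"]   # estimated home-field advantage alpha
```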

Part III: the web page search ranking model. Outline: problem description and model introduction; global vs. subset ranking; data structure: query-webpage pairs and feature vectors; example: a learning to rank competition; data analysis: GBRT and iGBRT.

Problem description. In general, we want to rank a set of documents/webpages according to their relevance to a given query. In the machine learning community, learning to rank is a supervised learning problem.

The machine learning framework. Extract a feature vector $x$ for each query-document pair, and train a scoring function $h(x)$. Rank the documents/web pages by the value of $h(x)$: a pair $x$ with a larger value of $h(x)$ is ranked higher.
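A tiny illustration of this framework: given learned scores $h(x)$ for the documents of one query (the numbers below are made up), the ranking is just the documents sorted by score.

```r
# Rank the documents of one query by a learned score h(x);
# the scores here are illustrative, not from a trained model.
h_scores <- c(p1 = 0.42, p2 = 1.31, p3 = -0.07)
names(sort(h_scores, decreasing = TRUE))   # "p2" "p1" "p3"
```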

Global and subset ranking. In the aforementioned model, given a query, we rank all the documents in the training dataset. In applications, however, we do not need to rank all documents for a given query, so the subset ranking model is more common in practice.

Subset ranking model: web search example. Filtering procedure: when the search engine receives a query, it first uses a simple algorithm for initial filtering, which limits the candidates to an initial pool $\{p_j\}$, $j = 1, 2, \ldots, m$, of returned web pages (e.g., $m = 100{,}000$). After this initial filtering, the system uses a more sophisticated algorithm to reorder the pages in the pool.

Data structure.

Table: An example of the training data structure

Query   Document    Feature Vector   Score
q_1     p_1         x_1^1            y_1^1
        p_2         x_2^1            y_2^1
        ...         ...              ...
        p_{m_1}     x_{m_1}^1        y_{m_1}^1
q_2     p_1         x_1^2            y_1^2
        p_2         x_2^2            y_2^2
        ...         ...              ...
        p_{m_2}     x_{m_2}^2        y_{m_2}^2
q_3     ...         ...              ...

Data description. The training data can be formally represented as $\{(x_j^q, y_j^q)\}$, where $q = 1, \ldots, n$ indexes the queries and $j = 1, \ldots, m_q$ indexes the documents for query $q$. Here $x_j^q \in \mathbb{R}^p$ is a $p$-dimensional feature vector for the pair of query $q$ and the $j$-th document for this query, and $y_j^q$ is the relevance label for $x_j^q$.
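A hypothetical sketch of this layout as an R data frame, one row per query-document pair; all values below are made up.

```r
# Toy training set: one row per query-document pair, a p-dimensional
# feature vector (x1, ..., xp), and a graded relevance label y in 0..4.
set.seed(1)
p <- 3
train <- data.frame(
  query = rep(c("q1", "q2"), times = c(3, 2)),
  doc   = c("p1", "p2", "p3", "p1", "p2"),
  matrix(rnorm(5 * p), nrow = 5,
         dimnames = list(NULL, paste0("x", 1:p))),
  y     = c(2, 0, 4, 1, 3)
)
train
```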

Data description: features. The main categories for the features $x$: web graph, document statistics, document classifier, query, text match, topical matching, click, external references, and time. The grade $y$ indicates the degree of relevance of the document to its corresponding query. For example, each grade can be an element of the ordinal set {perfect, excellent, good, fair, bad} and is labeled by human editors.

Example: datasets in the learning to rank competition.

Table: Datasets released for the challenge

                dataset1                      dataset2
            Train     Valid    Test       Train    Valid    Test
Queries     19,944    2,994    6,983      1,266    1,266    3,798
Documents   473,134   71,083   165,660    34,815   34,881   103,174
Features    519                           596

Table: Distribution of relevance labels

Grade       Label   dataset1   dataset2
Perfect     4       1.67%      1.89%
Excellent   3       3.88%      7.67%
Good        2       22.30%     28.55%
Fair        1       50.22%     35.80%
Bad         0       21.92%     26.09%

Example: datasets in the learning to rank competition. Figure: the number of documents associated with each query.

Data analysis: overview. There are many ranking algorithms. Generally, current algorithms can be divided into three categories, according to the objective function they optimize (a small pairwise sketch follows this list):
- Pointwise: a regression loss or a classification loss.
- Pairwise: a pairwise loss function, $\sum_{q=1}^{n} \sum_{i,j:\, y_i^q > y_j^q} \ell\big(h(x_i^q) - h(x_j^q)\big)$.
- Listwise: the loss function is defined over all the documents associated with a query.
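To make the pairwise objective concrete, here is a minimal sketch for a single query, using the logistic surrogate $\ell(t) = \log(1 + e^{-t})$ as an illustrative choice of $\ell$ (the slide leaves $\ell$ generic).

```r
# Pairwise loss for one query: sum l(h(x_i) - h(x_j)) over all pairs
# with y_i > y_j, with l the logistic surrogate (an assumed choice).
pairwise_loss <- function(scores, y) {
  pairs <- which(outer(y, y, ">"), arr.ind = TRUE)  # (i, j) with y_i > y_j
  d <- scores[pairs[, 1]] - scores[pairs[, 2]]      # h(x_i) - h(x_j)
  sum(log1p(exp(-d)))
}

pairwise_loss(scores = c(2.1, 0.3, 1.5), y = c(4, 0, 2))
```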

Model evaluation criteria. The Discounted Cumulative Gain (DCG) has been widely used to assess relevance in the context of search engines. A simple variant of DCG is
$$\mathrm{DCG}_m = \sum_{j=1}^{m} G_j / \log(j + 1),$$
where $G_j$ represents the weight assigned to the label of the document at position $j$. Expected reciprocal rank (ERR) and NDCG are also used as model evaluation criteria.
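A minimal sketch of DCG and NDCG under common conventions: gains $G_j = 2^{y_j} - 1$ and a base-2 logarithm, both of which are assumptions beyond the slide's generic $G_j / \log(j+1)$; ERR is omitted here.

```r
# DCG at cutoff m for documents listed in ranked order with graded
# labels y; gains 2^y - 1 and log2 discount are assumed conventions.
dcg <- function(y, m = length(y)) {
  y <- y[seq_len(m)]
  sum((2^y - 1) / log2(seq_along(y) + 1))
}

# NDCG: DCG normalized by the DCG of the ideal (sorted) ordering.
ndcg <- function(y, m = length(y)) {
  ideal <- dcg(sort(y, decreasing = TRUE), m)
  if (ideal == 0) return(0)
  dcg(y, m) / ideal
}

# Usage: five documents in ranked order, labels in 0..4.
ndcg(c(3, 2, 4, 0, 1), m = 5)
```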

Data analysis: learning algorithms. In this presentation, we focus on pointwise methods. Gradient Boosted Regression Trees (GBRT) are a very powerful tool for web search ranking. iGBRT (initialized GBRT): run GBRT starting from the residuals $r_i = y_i - F(x_i)$, where $F(x_i)$ is a Random Forests estimator.
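A hedged sketch of iGBRT as described above: fit a random forest $F$, then boost regression trees on the residuals $r_i = y_i - F(x_i)$. The package choices (randomForest, gbm) and all tuning values are illustrative assumptions, not the talk's exact setup.

```r
library(randomForest)
library(gbm)

igbrt_fit <- function(X, y, ntree = 500, n.trees = 1000) {
  # X: data frame of features; y: numeric relevance labels
  rf <- randomForest(x = X, y = y, ntree = ntree)    # stage 1: F(x)
  r  <- y - predict(rf, X)                           # residuals r_i
  gb <- gbm(r ~ ., data = cbind(r = r, X),
            distribution = "gaussian", n.trees = n.trees,
            interaction.depth = 4, shrinkage = 0.05) # stage 2: boost residuals
  list(rf = rf, gb = gb, n.trees = n.trees)
}

igbrt_predict <- function(model, Xnew) {
  predict(model$rf, Xnew) +
    predict(model$gb, newdata = Xnew, n.trees = model$n.trees)
}
```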

Regression vs. classification. In practice, classification performs better than regression. In classification, instead of training a function $h(x_i) \approx y_i$, we generate binary classification problems, indexed by $c = 1, 2, 3, 4$ for $y \in \{0, 1, 2, 3, 4\}$. The $c$-th problem predicts whether the document is less relevant than $c$, i.e., whether $y_i < c$. For each of these binary problems, we train a classifier $h_c(\cdot)$ estimating $h_c(x_i) = P(\mathrm{rel}(x_i) < c)$. We also define $h_0(\cdot) = 0$ and $h_5(\cdot) = 1$. Thus, we can combine the classifiers $h_0, h_1, \ldots, h_5$ to compute the probability of each class.

Regression vs. classification. In our example, we compute the probability that a document $x_i$ has relevance $r \in \{0, 1, 2, 3, 4\}$:
$$P(\mathrm{rel}(x_i) = r) = P(\mathrm{rel}(x_i) < r + 1) - P(\mathrm{rel}(x_i) < r) = h_{r+1}(x_i) - h_r(x_i).$$
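A minimal sketch of this combination step. H is an $n \times 6$ matrix whose columns hold $h_0, \ldots, h_5$ (so $h_0 = 0$, $h_5 = 1$, and we assume the columns are monotone in $c$); ranking by the resulting expected relevance is one natural use of the class probabilities, an assumption beyond the slide itself.

```r
# P(rel = r) = h_{r+1} - h_r for r = 0, ..., 4, given cumulative
# classifier outputs h_0, ..., h_5 in the columns of H.
class_probs <- function(H) {
  H[, 2:6, drop = FALSE] - H[, 1:5, drop = FALSE]
}

# Rank documents by expected relevance E[rel] = sum_r r * P(rel = r).
expected_rel <- function(H) as.vector(class_probs(H) %*% 0:4)

# Toy usage: two documents with hypothetical cumulative probabilities.
H <- rbind(c(0, 0.05, 0.20, 0.60, 0.90, 1),
           c(0, 0.40, 0.70, 0.90, 0.98, 1))
expected_rel(H)   # the first document scores higher
```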

Results of the data analysis.

Table: Performance of GBRT, RF, and iGBRT. All results are evaluated in ERR and NDCG.

ERR
method   Regr./Class.   dataset1   dataset2
GBRT     R              0.45304    0.45669
RF       R              0.46349    0.46212
iGBRT    R              0.46301    0.46303
GBRT     C              0.45448    0.46008
RF       C              0.46308    0.46200
iGBRT    C              0.46360    0.46246

NDCG
method   Regr./Class.   dataset1   dataset2
GBRT     R              0.76991    0.76587
RF       R              0.79575    0.77552
iGBRT    R              0.79575    0.77725
GBRT     C              0.77246    0.77132
RF       C              0.79544    0.77373
iGBRT    C              0.79672    0.77591

Future study. There are many important topics not covered in this presentation: pairwise and listwise ranking algorithms; other ranking models, such as recommender systems; the theory of learning to rank; online learning; etc.

Thank you. References:

Chapelle, Olivier, and Yi Chang. "Yahoo! Learning to Rank Challenge overview." Proceedings of the Learning to Rank Challenge, 2011.

Cossock, David, and Tong Zhang. "Statistical analysis of Bayes optimal subset ranking." IEEE Transactions on Information Theory 54.11 (2008): 5140-5154.

Chapelle, Olivier, Yi Chang, and Tie-Yan Liu. "Future directions in learning to rank." Proceedings of the Learning to Rank Challenge, 2011.

Li, Ping, Qiang Wu, and Christopher J. Burges. "McRank: Learning to rank using multiple classification and gradient boosting." Advances in Neural Information Processing Systems, 2008.

Mohan, Ananth, Zheng Chen, and Kilian Weinberger. "Web-search ranking with initialized gradient boosted regression trees." Proceedings of the Learning to Rank Challenge, 2011.

Zheng, Zhaohui, et al. "A general boosting method and its application to learning ranking functions for web search." Advances in Neural Information Processing Systems, 2008.