Learning to Rank: A New Technology for Text Processing

Similar documents
Machine Learning Approaches to Information Retrieval

A Few Things to Know about Machine Learning for Web Search

WebSci and Learning to Rank for IR

Learning to Rank for Information Retrieval. Tie-Yan Liu Lead Researcher Microsoft Research Asia

Learning to Rank. Tie-Yan Liu. Microsoft Research Asia CCIR 2011, Jinan,

Learning to Rank. from heuristics to theoretic approaches. Hongning Wang

Lizhe Sun. November 17, Florida State University. Ranking in Statistics and Machine Learning. Lizhe Sun. Introduction

Learning to Rank for Information Retrieval

LETOR: Benchmark Dataset for Research on Learning to Rank for Information Retrieval Tie-Yan Liu 1, Jun Xu 1, Tao Qin 2, Wenying Xiong 3, and Hang Li 1

Information Retrieval

Learning to Rank for Faceted Search Bridging the gap between theory and practice

Active Evaluation of Ranking Functions based on Graded Relevance (Extended Abstract)

CSCI 599: Applications of Natural Language Processing Information Retrieval Evaluation"

Improving Recommendations Through. Re-Ranking Of Results

Context based Re-ranking of Web Documents (CReWD)

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Search Evaluation. Tao Yang CS293S Slides partially based on text book [CMS] [MRS]

Optimizing Search Engines using Click-through Data

Comment Extraction from Blog Posts and Its Applications to Opinion Mining

Chapter 8. Evaluating Search Engine

Performance Measures for Multi-Graded Relevance

A General Approximation Framework for Direct Optimization of Information Retrieval Measures

Ranking with Query-Dependent Loss for Web Search

Search Engines Chapter 8 Evaluating Search Engines Felix Naumann

Learning to Rank for Information Retrieval and Natural Language Processing Second Edition

Learning Dense Models of Query Similarity from User Click Logs

A Stochastic Learning-To-Rank Algorithm and its Application to Contextual Advertising

Automatic Discovery of Association Orders between Name and Aliases from the Web using Anchor Texts-based Co-occurrences

Exploiting Label Dependency and Feature Similarity for Multi-Label Classification

Author: Yunqing Xia, Zhongda Xie, Qiuge Zhang, Huiyuan Zhao, Huan Zhao Presenter: Zhongda Xie

Advanced Search Techniques for Large Scale Data Analytics Pavel Zezula and Jan Sedmidubsky Masaryk University

Hierarchical Location and Topic Based Query Expansion

Advanced Topics in Information Retrieval. Learning to Rank. ATIR July 14, 2016

Adapting Ranking Functions to User Preference

Enabling multi-level relevance feedback on PubMed by integrating rank learning into DBMS

Representation Learning using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval

Web-based Question Answering: Revisiting AskMSR

Overview of the NTCIR-13 OpenLiveQ Task

Evaluating search engines CE-324: Modern Information Retrieval Sharif University of Technology

AN ENHANCED ATTRIBUTE RERANKING DESIGN FOR WEB IMAGE SEARCH

Structured Ranking Learning using Cumulative Distribution Networks

NUS-I2R: Learning a Combined System for Entity Linking

Deep Character-Level Click-Through Rate Prediction for Sponsored Search

arxiv: v1 [cs.ir] 19 Sep 2016

Information Retrieval

Karami, A., Zhou, B. (2015). Online Review Spam Detection by New Linguistic Features. In iconference 2015 Proceedings.

Subject. Dataset. Copy paste feature of the diagram. Importing the dataset. Copy paste feature into the diagram.

A Deep Relevance Matching Model for Ad-hoc Retrieval

Mining the Search Trails of Surfing Crowds: Identifying Relevant Websites from User Activity Data

Learning-to-rank with Prior Knowledge as Global Constraints

Information Retrieval

cocobean A Socially Ranked Question Answering System ethan deyoung jerry ye jimmy chen Srinivasan Ramaswamy advisor: marti hearst

Feature and Search Space Reduction for Label-Dependent Multi-label Classification

Automatic Discovery of Association Orders between Name and Aliases from The Web using Anchor Texts-Based Co-Occurrences

Learning to Rank Only Using Training Data from Related Domain

Search Engines. Informa1on Retrieval in Prac1ce. Annota1ons by Michael L. Nelson

Search Engines and Learning to Rank

A PSO Optimized Layered Approach for Parametric Clustering on Weather Dataset

Retrieval Evaluation. Hongning Wang

Relevance Feature Discovery for Text Mining

Classification and Comparative Analysis of Passage Retrieval Methods

A (very brief) overview of Information Retrieval. NLP ML Web! Fall 2013! Andrew Rosenberg! TA/Grader: David Guy Brizan

by the customer who is going to purchase the product.

Estimating Credibility of User Clicks with Mouse Movement and Eye-tracking Information

IMAGE RETRIEVAL SYSTEM: BASED ON USER REQUIREMENT AND INFERRING ANALYSIS TROUGH FEEDBACK

THIS LECTURE. How do we know if our results are any good? Results summaries: Evaluating a search engine. Making our good results usable to a user

Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating

Learning Polynomial Functions. by Feature Construction

Beyond Content-Based Retrieval:

Fall Lecture 16: Learning-to-rank

Session Based Click Features for Recency Ranking

Multiple-Instance Ranking: Learning to Rank Images for Image Retrieval

CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets

Inferring User Search for Feedback Sessions

Improving Web Search Ranking by Incorporating User Behavior Information

A Framework for Web Host Quality Detection

Learning to Align Sequences: A Maximum-Margin Approach

Information Retrieval

Personalized Web Search

A New Technique to Optimize User s Browsing Session using Data Mining

WCL2R: A Benchmark Collection for Learning to Rank Research with Clickthrough Data

arxiv: v2 [cs.ir] 27 Feb 2019

Northeastern University in TREC 2009 Million Query Track

Automated Online News Classification with Personalization

Learning Locality-Preserving Discriminative Features

Part I: Data Mining Foundations

Overview of the NTCIR-13 OpenLiveQ Task

Keywords APSE: Advanced Preferred Search Engine, Google Android Platform, Search Engine, Click-through data, Location and Content Concepts.

Adjacency Matrix Based Full-Text Indexing Models

Handling missing values in kernel methods with application to microbiology data

Minimally Invasive Randomization for Collecting Unbiased Preferences from Clickthrough Logs

Visualization and text mining of patent and non-patent data

CWS: : A Comparative Web Search System

Adaptive Search Engines Learning Ranking Functions with SVMs

Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm

Multi-Instance Multi-Label Learning with Application to Scene Classification

arxiv: v1 [stat.ap] 14 Mar 2018

A Metric for Inferring User Search Goals in Search Engines

SQUINT SVM for Identification of Relevant Sections in Web Pages for Web Search

Extracting Search-Focused Key N-Grams for Relevance Ranking in Web Search

Transcription:

TFANT 07 Tokyo Univ. March 2, 2007 Learning to Rank: A New Technology for Text Processing Hang Li Microsoft Research Asia

Talk Outline What is Learning to Rank? Ranking SVM Definition Search Ranking SVM for IR Summary

WHAT IS LEARNING TO RANK?

Ranking Problem: Example = Document Retrieval ocuments D,, 2, l ranking of ocuments query q Ranking System q, q,2 q, n q

Ranking in Information Retrieval Document Retrieval Collaborative Filtering Key Term Extraction Expert Fining

Means for Information Access Information Extraction Ranking Multi-ocument Summarization

NLP an TM Problems Can Be Formalize as Ranking Machine Translation Paraphrasing Sentiment Analysis

Learning to Rank q q m,,2 m, m,2 Learning System, n m, n m q m Ranking System m, m,2 m, n m

Score-base Ranking,,2, n q n m m m m m q,,2, Learning System Ranking System q m ), ( ), ( ), (,,,2,2,, m n m m m n m m m m m m m q f q f q f ), ( q f

Data Labeling Methos Listwise Pointwise Score Rank (e.g., relevant, partially relevant, irrelevant) Pairwise

Pairwise Data Labeling Metho Joachims (2002) ranking of ocuments A B click on ocument parwise ata C B > A

Evaluation Measures MRR (Mean Reciprocal Rank) MAP (Mean Average Precision) NDCG (Normalize Discounte Cumulative Gain) Kenall s Tau

Kenall s Tau 2 2P n( n Number of pairs Number of concorant pairs Example A B C C A B ) 2 3 3 2 n( n ) P

NDCG query: DCG at position m: NDCG at position m: average over queries Example q i (3, 3, 2, 2,,, ) rank r gain r( j) Ni n (2 ) / log( i j (7, 7, 3, 3,,, ) r 2 ( j ) (, 0.63, 0.5, 0.43, 0.39, 0.36, 0.33) iscount (7,.4, 2.9, 4.2, 4.59, 4.95, 5.28) m j (2 r( j) ) / log( j) m j) / log( j)

RANKING SVM

Ranking SVM Herbrich et al (2000) Input space: X Ranking function Ranking: x x i f : X R Linear ranking function: w, x () x (2) j 0 f f ( x ; w) f ( x ; w) Transforming to binary classification: ( x () x (2), z), z x x () (2) i ( x () x f ( x; w) w, x ; w) x (2) () j f ( x (2) ; w)

Ranking SVM (cont ) Ranking Moel: f(x;w) f ( x; w)

Ranking SVM (cont ) 0, 2 min (2) () 2, i i i i i i w x x w z C w 2 (2) (), min w x x w z l i i i i w

DEFINITION SEARCH

Definition Search (Xu 2005) What is Linux? Linux is an excellent prouct. Linux is a Unicoe platform. Linux is a free Unix-type operating system originally create by Linus Torvals with the assistance of evelopers aroun the worl. Linux is an open source operating system that was erive from Unix in 99. Linux is the platform for the communication application for the leaer network 20

Definitions Can Be Ranke Goo efinition Contain general notion an important properties of term E.g., Linux is a Unix-base operating system that was evelope in 99 by Linus Torvals, then a stuent in Finlan. Inifferent efinition Between goo an ba E.g., Linux is the best-known prouct istribute uner the GPL. Ba efinition Option, impression, or feeling of people about term E.g., Linux is an excellent prouct. 2

Extracting an Ranking Definition Caniates Ientifying term of each paragraph First Base NP of first sentence Two Base NPs separate by of or for Extracting efinition caniates <term> is a an the * <term>, *, a an the * <term> is one of * Ranking efinition caniates using Ranking SVM 22

Features in Ranking SVM. <term> occurs at the beginning of the paragraph. 2. <term> begins with the, a, or an. 3. All the wors in <term> begin with uppercase letters. 4. Paragraph contains preefine negative wors, e.g. he, she, sai 5. <term> contains pronouns. 6. <term> contains of, for, an, or or,. 7. <term> re-occurs in the paragraph. 8. <term> is followe by is a, is an or is the. 9. Number of sentences in the paragraph. 0. Number of wors in the paragraph.. Number of the ajectives in the paragraph. 2. Bag of wors: wors frequently occurring within a winow after <term> 23

Experimental Results on Ranking Definitional Paragraphs Error Rate R-precision Top Precision Top 3 Precision BM25 0.53 0.289 0.22 0.642 Ranom Ranking Ranking SVM 0.436 0.322 0.347 0.632 0.27 0.58 0.550 0.887 24

RANKING SVM FOR IR

Direct Application of Ranking SVM to Document Retrieval Query an ocument feature vector Combining instance pairs from all queries

Problems with Direct Application Top sensitiveness : efinitely relevant, p: partially relevant, n: not relevant ranking : p p n n n n ranking 2: p n p n n n Query normalization q: p p n n n n q2: p p p n n n n n q pairs: 2*(, p) + 4*(, n) + 8*(p, n) = 4 q2 pairs: 6*(, p) + 0*(, n) + 5*(p, n) = 3

New Loss function l () (2) min L( w) k ( i) q( i) zi w, xi x i w w i 2

Ranking SVM for IR (Cao et al 2006) () (2) 2 min L k ( i) q( i) zi w, xi x i w, w i 2 min M ( w) w Cii w 2 i () (2) subject to i 0, zi w, xi xi i i,, where C i k ( i) q( i) 2

Experimental Results (OHSUMED)

SUMMARY

Summary What is Learning to Rank? Ranking SVM Definition Search Ranking SVM for IR Summary

Future Directions Inventing new algorithms e.g., irectly optimizing evaluation measure Developing new labeling methos Fining more appropriate evaluation measures Applying to new applications

References Herbrich, R., Graepel, T., & Obermayer, K. (2000). Large Margin Rank Bounaries for Orinal Regression.. Avances in Large Margin Classifiers (pp. 5-32). Joachims T. (2002), Optimizing Search Engines Using Clickthrough Data, Proceeings of the ACM Conference on Knowlege Discovery an Data Mining. Jun Xu, Yunbo Cao, Hang Li, an Min Zhao, Ranking Definitions with Supervise Learning Metho, Proc. of WWW 2005 inustry track, 8-89. Yunbo Cao, Jun Xu, Tie-Yan Liu, Hang Li, Yalou Huang, an Hsiao- Wuen Hon, Aapting Ranking SVM to Document Retrieval, Proc. of SIGIR 2006, 86-93. Yu-Ting Liu, Tie-Yan Liu, Tao Qin, Zhi-Ming Ma, Hang Li, Supervise Rank Aggregation, Proc. of WWW-2007, to appear, 2007.

THANK YOU