TFANT 07 Tokyo Univ. March 2, 2007 Learning to Rank: A New Technology for Text Processing Hang Li Microsoft Research Asia
Talk Outline What is Learning to Rank? Ranking SVM Definition Search Ranking SVM for IR Summary
WHAT IS LEARNING TO RANK?
Ranking Problem: Example = Document Retrieval ocuments D,, 2, l ranking of ocuments query q Ranking System q, q,2 q, n q
Ranking in Information Retrieval Document Retrieval Collaborative Filtering Key Term Extraction Expert Fining
Means for Information Access Information Extraction Ranking Multi-ocument Summarization
NLP an TM Problems Can Be Formalize as Ranking Machine Translation Paraphrasing Sentiment Analysis
Learning to Rank q q m,,2 m, m,2 Learning System, n m, n m q m Ranking System m, m,2 m, n m
Score-base Ranking,,2, n q n m m m m m q,,2, Learning System Ranking System q m ), ( ), ( ), (,,,2,2,, m n m m m n m m m m m m m q f q f q f ), ( q f
Data Labeling Methos Listwise Pointwise Score Rank (e.g., relevant, partially relevant, irrelevant) Pairwise
Pairwise Data Labeling Metho Joachims (2002) ranking of ocuments A B click on ocument parwise ata C B > A
Evaluation Measures MRR (Mean Reciprocal Rank) MAP (Mean Average Precision) NDCG (Normalize Discounte Cumulative Gain) Kenall s Tau
Kenall s Tau 2 2P n( n Number of pairs Number of concorant pairs Example A B C C A B ) 2 3 3 2 n( n ) P
NDCG query: DCG at position m: NDCG at position m: average over queries Example q i (3, 3, 2, 2,,, ) rank r gain r( j) Ni n (2 ) / log( i j (7, 7, 3, 3,,, ) r 2 ( j ) (, 0.63, 0.5, 0.43, 0.39, 0.36, 0.33) iscount (7,.4, 2.9, 4.2, 4.59, 4.95, 5.28) m j (2 r( j) ) / log( j) m j) / log( j)
RANKING SVM
Ranking SVM Herbrich et al (2000) Input space: X Ranking function Ranking: x x i f : X R Linear ranking function: w, x () x (2) j 0 f f ( x ; w) f ( x ; w) Transforming to binary classification: ( x () x (2), z), z x x () (2) i ( x () x f ( x; w) w, x ; w) x (2) () j f ( x (2) ; w)
Ranking SVM (cont ) Ranking Moel: f(x;w) f ( x; w)
Ranking SVM (cont ) 0, 2 min (2) () 2, i i i i i i w x x w z C w 2 (2) (), min w x x w z l i i i i w
DEFINITION SEARCH
Definition Search (Xu 2005) What is Linux? Linux is an excellent prouct. Linux is a Unicoe platform. Linux is a free Unix-type operating system originally create by Linus Torvals with the assistance of evelopers aroun the worl. Linux is an open source operating system that was erive from Unix in 99. Linux is the platform for the communication application for the leaer network 20
Definitions Can Be Ranke Goo efinition Contain general notion an important properties of term E.g., Linux is a Unix-base operating system that was evelope in 99 by Linus Torvals, then a stuent in Finlan. Inifferent efinition Between goo an ba E.g., Linux is the best-known prouct istribute uner the GPL. Ba efinition Option, impression, or feeling of people about term E.g., Linux is an excellent prouct. 2
Extracting an Ranking Definition Caniates Ientifying term of each paragraph First Base NP of first sentence Two Base NPs separate by of or for Extracting efinition caniates <term> is a an the * <term>, *, a an the * <term> is one of * Ranking efinition caniates using Ranking SVM 22
Features in Ranking SVM. <term> occurs at the beginning of the paragraph. 2. <term> begins with the, a, or an. 3. All the wors in <term> begin with uppercase letters. 4. Paragraph contains preefine negative wors, e.g. he, she, sai 5. <term> contains pronouns. 6. <term> contains of, for, an, or or,. 7. <term> re-occurs in the paragraph. 8. <term> is followe by is a, is an or is the. 9. Number of sentences in the paragraph. 0. Number of wors in the paragraph.. Number of the ajectives in the paragraph. 2. Bag of wors: wors frequently occurring within a winow after <term> 23
Experimental Results on Ranking Definitional Paragraphs Error Rate R-precision Top Precision Top 3 Precision BM25 0.53 0.289 0.22 0.642 Ranom Ranking Ranking SVM 0.436 0.322 0.347 0.632 0.27 0.58 0.550 0.887 24
RANKING SVM FOR IR
Direct Application of Ranking SVM to Document Retrieval Query an ocument feature vector Combining instance pairs from all queries
Problems with Direct Application Top sensitiveness : efinitely relevant, p: partially relevant, n: not relevant ranking : p p n n n n ranking 2: p n p n n n Query normalization q: p p n n n n q2: p p p n n n n n q pairs: 2*(, p) + 4*(, n) + 8*(p, n) = 4 q2 pairs: 6*(, p) + 0*(, n) + 5*(p, n) = 3
New Loss function l () (2) min L( w) k ( i) q( i) zi w, xi x i w w i 2
Ranking SVM for IR (Cao et al 2006) () (2) 2 min L k ( i) q( i) zi w, xi x i w, w i 2 min M ( w) w Cii w 2 i () (2) subject to i 0, zi w, xi xi i i,, where C i k ( i) q( i) 2
Experimental Results (OHSUMED)
SUMMARY
Summary What is Learning to Rank? Ranking SVM Definition Search Ranking SVM for IR Summary
Future Directions Inventing new algorithms e.g., irectly optimizing evaluation measure Developing new labeling methos Fining more appropriate evaluation measures Applying to new applications
References Herbrich, R., Graepel, T., & Obermayer, K. (2000). Large Margin Rank Bounaries for Orinal Regression.. Avances in Large Margin Classifiers (pp. 5-32). Joachims T. (2002), Optimizing Search Engines Using Clickthrough Data, Proceeings of the ACM Conference on Knowlege Discovery an Data Mining. Jun Xu, Yunbo Cao, Hang Li, an Min Zhao, Ranking Definitions with Supervise Learning Metho, Proc. of WWW 2005 inustry track, 8-89. Yunbo Cao, Jun Xu, Tie-Yan Liu, Hang Li, Yalou Huang, an Hsiao- Wuen Hon, Aapting Ranking SVM to Document Retrieval, Proc. of SIGIR 2006, 86-93. Yu-Ting Liu, Tie-Yan Liu, Tao Qin, Zhi-Ming Ma, Hang Li, Supervise Rank Aggregation, Proc. of WWW-2007, to appear, 2007.
THANK YOU