Learning to Rank. from heuristics to theoretic approaches. Hongning Wang

Size: px
Start display at page:

Download "Learning to Rank. from heuristics to theoretic approaches. Hongning Wang"

Transcription

1 Learning to Rank from heuristics to theoretic approaches Hongning Wang

2 Congratulations Job Offer from Bing Core Ranking team Design the ranking module for Bing.com CS 6501: Information Retrieval 2

3 How should I rank documents? Answer: Rank by relevance! CS 6501: Information Retrieval 3

4 Relevance?! CS 6501: Information Retrieval 4

5 The Notion of Relevance (Rep(q), Rep(d)) Similarity Relevance P(r=1 q,d) r {0,1} Probability of Relevance Relevance constraints [Fang et al. 04] Div. from Randomness (Amati & Rijsbergen 02) P(d q) or P(q d) Probabilistic inference Different rep & similarity Vector space model (Salton et al., 75) Prob. distr. model (Wong & Yao, 89) Regression Model (Fuhr 89) Learn. To Rank (Joachims 02, Berges et al. 05) Doc generation Generative Model Classical prob. Model (Robertson & Sparck Jones, 76) Query generation LM approach (Ponte & Croft, 98) (Lafferty & Zhai, 01a) Different inference system Prob. concept space model (Wong & Yao, 95) Inference network model (Turtle & Croft, 91) CS 6501: Information Retrieval 5

6 Relevance Estimation Query matching Language model BM25 Vector space cosine similarity Document importance PageRank HITS CS 6501: Information Retrieval 6

7 Did I do a good job of ranking PageRank BM25 documents? CS 6501: Information Retrieval 7

8 Did I do a good job of ranking documents? IR evaluations metrics Precision@K MAP NDCG CS 6501: Information Retrieval 8

9 Take advantage of different relevance estimator? Ensemble the cues Linear? aa 1 BBBBBB + αα 2 LLLL + αα 3 PPPPPPPPPPPPPPPP + αα 4 HHHHHHHH Non-linear? {αα 1 = 0.4, αα 2 = 0.2, αα 3 = 0.1, αα 4 = 0.1} {MMMMMM = 0.20, NNNNNNNN = 0.6} {αα 1 = 0.2, αα 2 = 0.2, αα 3 = 0.3, αα 4 = 0.1} {MMMMMM = 0.12, NNNNNNNN = 0.5 } Decision tree-like {αα 1 = 0.1, αα 2 = 0.1, αα 3 = 0.5, αα 4 = 0.2} {MMMMMM = 0.18, NNNNNNNN = 0.7} BM25>0.5 True False True LM > 0.1 False True PageRank > 0.3 False r= 1.0 r= 0.7 r= 0.4 r= 0.1 CS 6501: Information Retrieval 9

10 What if we have thousands of features? Is there any way I can do better? Optimizing the metrics automatically! Where to find those tree structures? How to determine those ααs? CS 6501: Information Retrieval 10

11 Rethink the task Given: (query, document) pairs represented by a set of relevance estimators, a.k.a., features DocID BM25 LM PageRank Label Needed: a way of combining the estimators DD ff qq, dd ii=1 DD ordered dd ii=1 Criterion: optimize IR metrics P@k, MAP, NDCG, etc. Key! CS 6501: Information Retrieval 11

12 Machine Learning Input: { XX 1, YY 1, XX 2, YY 2,, (XX nn, YY nn )}, where XX ii RR NN, YY ii RR MM Objective function : OO(YY, YY) Output: ff XX YY, such that ff = argmax ff FF OO(ff (XX), YY) NOTE: We will only talk about supervised learning. Classification OO YY, YY = δδ(yy = YY) istical_classification Regression OO YY, YY = YY YY ion_analysis CS 6501: Information Retrieval 12

13 Learning to Rank General solution in optimization framework Input: (qq ii, dd 1 ), yy 1, (qq ii, dd 2 ), yy 2,, (qq ii, dd nn ), yy nn, where d n RR NN, yy ii {0,, LL} Objective: O = {P@k, MAP, NDCG} Output: ff qq, dd YY, s.t., ff = argmax ff FF OO(ff (qq, dd), YY) DocID BM25 LM PageRank Label CS 6501: Information Retrieval 13

14 Challenge: how to optimize? Evaluation metric recap Average Precision DCG Order is essential! ff order metric Not continuous with respect to f(x)! CS 6501: Information Retrieval 14

15 Approximating the objective function! Pointwise Fit the relevance labels individually Pairwise Fit the relative orders Listwise Fit the whole order CS 6501: Information Retrieval 15

16 Pointwise Learning to Rank Ideally perfect relevance prediction leads to perfect ranking ff score order metric Reducing ranking problem to Regression OO ff QQ, DD, YY = ii ff qq ii, dd ii yy ii Subset Ranking using Regression, D.Cossock and T.Zhang, COLT 2006 (multi-)classification OO ff QQ, DD, YY = ii δδ(ff qq ii, dd ii = yy ii ) Ranking with Large Margin Principles, A. Shashua and A. Levin, NIPS 2002 CS 6501: Information Retrieval 16

17 Subset Ranking using Regression D.Cossock and T.Zhang, COLT 2006 Fit relevance labels via regression Emphasize more on relevant documents Weights on each document Most positive document CS 6501: Information Retrieval 17

18 Ranking with Large Margin Principles A. Shashua and A. Levin, NIPS 2002 Goal: correctly placing the documents in the corresponding category and maximize the margin Y=1 Y=0 Y=2 Reduce the violations αααα CS 6501: Information Retrieval 18

19 Ranking with Large Margin Principles A. Shashua and A. Levin, NIPS 2002 Ranking lost is consistently decreasing with more training data CS 6501: Information Retrieval 19

20 What did we learn Machine learning helps! Derive something optimizable More efficient and guided CS 6501: Information Retrieval 20

21 Deficiency Cannot directly optimize IR metrics (0 1, 2 0) worse than (0-2, 2 4) Position of documents are ignored Penalty on documents 1 at higher positions should dd 0 1 be larger -2 Favor the queries with more documents dd 1-1 dd 1 dd 2 2 CS 6501: Information Retrieval 21

22 Pairwise Learning to Rank Ideally perfect partial order leads to perfect ranking ff partial order order metric Ordinal regression OO ff QQ, DD, YY = ii jj δδ(yy ii > yy jj )δδ(ff qq ii, dd ii < ff(qq ii, dd ii )) Relative ordering between different documents is significant E.g., (0-2, 2 4) is better than (0 1, 2 0) Large body of research dd 1 1 dd 2 2 dd 1 CS 6501: Information Retrieval dd 1 22

23 Optimizing Search Engines using Clickthrough Data Thorsten Joachims, KDD 02 Minimizing the number of mis-ordered pairs yy 1 > yy 2, yy 2 > yy 3, yy 1 > yy linear combination of features yy 1 > yy 2 ff qq, dd = ww TT XX qq,dd Keep the relative orders RankSVM CS 6501: Information Retrieval 23

24 Optimizing Search Engines using Clickthrough Data Thorsten Joachims, KDD 02 How to use it? ff scccccccc order yy 1 > yy 2 > yy 3 > yy 4 CS 6501: Information Retrieval 24

25 Optimizing Search Engines using Clickthrough Data Thorsten Joachims, KDD 02 What did it learn from the data? Linear correlations Positive correlated features Negative correlated features CS 6501: Information Retrieval 25

26 Optimizing Search Engines using Clickthrough Data Thorsten Joachims, KDD 02 How good is it? Test on real system CS 6501: Information Retrieval 26

27 An Efficient Boosting Algorithm for Combining Preferences Y. Freund, R. Iyer, et al. JMLR 2003 Smooth the loss on mis-ordered pairs yyii >yy jj PPPP dd ii, dd jj eeeeee[ff qq, dd jj ff qq, dd ii ] square loss hinge loss (RankSVM) 0/1 loss exponential loss CS 6501: Information Retrieval from Pattern Recognition 27 and Machine Learning, P337

28 An Efficient Boosting Algorithm for Combining Preferences Y. Freund, R. Iyer, et al. JMLR 2003 RankBoost: optimize via boosting Vote by a committee BM25 PageRank Cosine Updating Pr(dd ii, dd jj ) Credibility of each committee member (ranking feature) from Pattern Recognition and Machine Learning, P658 CS 6501: Information Retrieval 28

29 An Efficient Boosting Algorithm for Combining Preferences Y. Freund, R. Iyer, et al. JMLR 2003 How good is it? CS 6501: Information Retrieval 29

30 A Regression Framework for Learning Ranking Functions Using Relative Relevance Judgments Zheng et al. SIRIG 07 Non-linear ensemble of features Object: yyii >yy jj max 0, ff qq, dd jj ff qq, dd ii 2 Gradient descent boosting tree Boosting tree Using regression tree to minimize the residuals rr tt (qq, dd, yy) = OO tt (qq, dd, yy) ff tt 1 (qq, dd, yy) BM25>0.5 True False PageRank LM > 0.1 > 0.3 True False True False r= 1.0 r= 0.7 CS 6501: Information Retrieval 30 r= 0.4 r= 0.1

31 A Regression Framework for Learning Ranking Functions Using Relative Relevance Judgments Zheng et al. SIRIG 07 Non-linear v.s. linear Comparing with RankSVM CS 6501: Information Retrieval 31

32 Where do we get the relative orders Human annotations Small scale, expensive to acquire Clickthroughs Large amount, easy to acquire CS 6501: Information Retrieval 32

33 What did we learn Predicting relative order Getting closer to the nature of ranking Promising performance in practice Pairwise preferences from click-throughs CS 6501: Information Retrieval 33

34 Listwise Learning to Rank Can we directly optimize the ranking? ff order metric Tackle the challenge Optimization without gradient CS 6501: Information Retrieval 34

35 From RankNet to LambdaRank to LambdaMART: An Overview Christopher J.C. Burges, 2010 Minimizing mis-ordered pair => maximizing IR metrics? Mis-ordered pairs: 6 Mis-ordered pairs: 4 AP: 5 8 AP: 5 12 DCG: DCG: Position is crucial! CS 6501: Information Retrieval 35

36 From RankNet to LambdaRank to Weight the mis-ordered pairs? LambdaMART: An Overview Christopher J.C. Burges, 2010 Some pairs are more important to be placed in the right order Inject into object function yyii >yy jj Ω dd ii, dd jj Inject into gradient λλ iiii = OO aaaaaaaaaa ΔOO Depend on the ranking of document i, j in the whole list exp wwδφ qq nn, dd ii, dd jj Change in original object, e.g., NDCG, if we switch the documents i and j, leaving the other documents unchanged Gradient with respect to approximated objective, i.e., exponential loss on mis-ordered pairs CS 6501: Information Retrieval 36

37 Lambda functions Gradient? Yes, it meets the sufficient and necessary condition of being partial derivative Lead to optimal solution of original problem? Empirically From RankNet to LambdaRank to LambdaMART: An Overview Christopher J.C. Burges, 2010 CS 6501: Information Retrieval 37

38 Evolution From RankNet to LambdaRank to LambdaMART: An Overview Christopher J.C. Burges, 2010 RankNet LambdaRank LambdaMART Objective Cross entropy over the pairs Unknown Unknown Gradient (λλ function) Gradient of cross entropy Gradient of cross entropy times pairwise change in target metric Gradient of cross entropy times pairwise change in target metric Optimization method neural network stochastic gradient descent Multiple Additive Regression Trees (MART) As we discussed in RankBoost Optimize solely by gradient Non-linear combination CS 6501: Information Retrieval 38

39 A Lambda tree From RankNet to LambdaRank to LambdaMART: An Overview Christopher J.C. Burges, 2010 splitting Combination of features CS 6501: Information Retrieval 39

40 A Support Vector Machine for Optimizing Average Precision Yisong Yue, et al., SIGIR 07 RankSVM Minimizing the pairwise loss SVM-MAP Minimizing the structural loss MAP difference Loss defined on the number of mis-ordered document pairs Loss defined on the quality of the whole list of ordered documents CS 6501: Information Retrieval 40

41 A Support Vector Machine for Optimizing Max margin principle Average Precision Yisong Yue, et al., SIGIR 07 Push the ground-truth far away from any mistakes one might make Finding the most likely violated constraints CS 6501: Information Retrieval 41

42 A Support Vector Machine for Optimizing Average Precision Yisong Yue, et al., SIGIR 07 Finding the most violated constraints MAP is invariant to permutation of (ir)relevant documents Maximize MAP over a series of swaps between relevant and irrelevant documents Right-hand side of constraints Start from the reverse order of ideal ranking Greedy solution CS 6501: Information Retrieval 42

43 A Support Vector Machine for Optimizing Experiment results Average Precision Yisong Yue, et al., SIGIR 07 CS 6501: Information Retrieval 43

44 Other listwise solutions Soften the metrics to make them differentiable Michael Taylor et al., SoftRank: optimizing nonsmooth rank metrics, WSDM'08 Minimize a loss function defined on permutations Zhe Cao et al., Learning to rank: from pairwise approach to listwise approach, ICML'07 CS 6501: Information Retrieval 44

45 What did we learn Taking a list of documents as a whole Positions are visible for the learning algorithm Directly optimizing the target metric Limitation The search space is huge! CS 6501: Information Retrieval 45

46 Summary Learning to rank Automatic combination of ranking features for optimizing IR evaluation metrics Approaches Pointwise Fit the relevance labels individually Pairwise Fit the relative orders Listwise Fit the whole order CS 6501: Information Retrieval 46

47 Experimental Comparisons My experiments 1.2k queries, 45.5K documents with 1890 features 800 queries for training, 400 queries for testing MAP ERR MRR ListNET LambdaMART RankNET RankBoost RankSVM AdaRank plogistic Logistic CS 6501: Information Retrieval 47

48 Connection with Traditional IR People have foreseen this topic long time ago Recall: probabilistic ranking principle CS 6501: Information Retrieval 48

49 Conditional models for P(R=1 Q,D) Basic idea: relevance depends on how well a query matches a document P(R=1 Q,D)=g(Rep(Q,D) θ) Rep(Q,D): feature representation of query-doc pair E.g., #matched terms, highest IDF of a matched term, doclen Using training data (with known relevance judgments) to estimate parameter θ Apply the model to rank new documents Special case: logistic regression a functional form CS 6501: Information Retrieval 49

50 Broader Notion of Relevance Social network Language Model Query Cosine BM25 Query relation Language Model Query Cosine BM25 Query relation Language Model Query Cosine BM25 Documents Documents Documents Click/View Linkage structure Visual Structure Linkage structure likes CS 6501: Information Retrieval 50

51 Future Tighter bounds Faster solution Larger scale Wider application scenario CS 6501: Information Retrieval 51

52 Resources Books Liu, Tie-Yan. Learning to rank for information retrieval. Vol. 13. Springer, Li, Hang. "Learning to rank for information retrieval and natural language processing." Synthesis Lectures on Human Language Technologies 4.1 (2011): Helpful pages Packages RankingSVM: RankLib: Data sets LETOR Yahoo! Learning to rank challenge CS 6501: Information Retrieval 52

53 References Liu, Tie-Yan. "Learning to rank for information retrieval." Foundations and Trends in Information Retrieval 3.3 (2009): Cossock, David, and Tong Zhang. "Subset ranking using regression." Learning theory (2006): Shashua, Amnon, and Anat Levin. "Ranking with large margin principle: Two approaches." Advances in neural information processing systems 15 (2003): Joachims, Thorsten. "Optimizing search engines using clickthrough data." Proceedings of the eighth ACM SIGKDD. ACM, Freund, Yoav, et al. "An efficient boosting algorithm for combining preferences." The Journal of Machine Learning Research 4 (2003): Zheng, Zhaohui, et al. "A regression framework for learning ranking functions using relative relevance judgments." Proceedings of the 30th annual international ACM SIGIR. ACM, CS 6501: Information Retrieval 53

54 References Joachims, Thorsten, et al. "Accurately interpreting clickthrough data as implicit feedback." Proceedings of the 28th annual international ACM SIGIR. ACM, Burges, C. "From ranknet to lambdarank to lambdamart: An overview." Learning 11 (2010): Xu, Jun, and Hang Li. "AdaRank: a boosting algorithm for information retrieval." Proceedings of the 30th annual international ACM SIGIR. ACM, Yue, Yisong, et al. "A support vector method for optimizing average precision." Proceedings of the 30th annual international ACM SIGIR. ACM, Taylor, Michael, et al. "Softrank: optimizing non-smooth rank metrics." Proceedings of the international conference WSDM. ACM, Cao, Zhe, et al. "Learning to rank: from pairwise approach to listwise approach." Proceedings of the 24th ICML. ACM, CS 6501: Information Retrieval 54

55 Ranking with Large Margin Principles A. Shashua and A. Levin, NIPS 2002 Maximizing the sum of margins Y=0 Y=1 Y=2 αααα CS 6501: Information Retrieval 55

56 AdaRank: a boosting algorithm for Loss defined by IR metrics qq QQ PPPP qq eeeeee[ OO(qq)] Optimizing by boosting information retrieval Jun Xu & Hang Li, SIGIR 07 BM25 PageRank Cosine Target metrics: MAP, NDCG, MRR Updating Pr(qq) Credibility of each committee member (ranking feature) from Pattern Recognition and Machine Learning, P658 CS 6501: Information Retrieval 56

57 Analysis of the Approaches What are they really optimizing? Relation with IR metrics g a p CS 6501: Information Retrieval 57

58 Pointwise Approaches Regression based Discount coefficients in DCG Classification based Regression loss Discount coefficients in DCG Classification loss CS 6501: Information Retrieval 58

59 From CS 6501: Information Retrieval 59

60 Discount coefficients in DCG From CS 6501: Information Retrieval 60

61 Listwise Approaches No general analysis Method dependent Directness and consistency CS 6501: Information Retrieval 61

WebSci and Learning to Rank for IR

WebSci and Learning to Rank for IR WebSci and Learning to Rank for IR Ernesto Diaz-Aviles L3S Research Center. Hannover, Germany diaz@l3s.de Ernesto Diaz-Aviles www.l3s.de 1/16 Motivation: Information Explosion Ernesto Diaz-Aviles

More information

Fall Lecture 16: Learning-to-rank

Fall Lecture 16: Learning-to-rank Fall 2016 CS646: Information Retrieval Lecture 16: Learning-to-rank Jiepu Jiang University of Massachusetts Amherst 2016/11/2 Credit: some materials are from Christopher D. Manning, James Allan, and Honglin

More information

Learning to Rank for Information Retrieval. Tie-Yan Liu Lead Researcher Microsoft Research Asia

Learning to Rank for Information Retrieval. Tie-Yan Liu Lead Researcher Microsoft Research Asia Learning to Rank for Information Retrieval Tie-Yan Liu Lead Researcher Microsoft Research Asia 4/20/2008 Tie-Yan Liu @ Tutorial at WWW 2008 1 The Speaker Tie-Yan Liu Lead Researcher, Microsoft Research

More information

Information Retrieval

Information Retrieval Information Retrieval Learning to Rank Ilya Markov i.markov@uva.nl University of Amsterdam Ilya Markov i.markov@uva.nl Information Retrieval 1 Course overview Offline Data Acquisition Data Processing Data

More information

Learning to Rank. Tie-Yan Liu. Microsoft Research Asia CCIR 2011, Jinan,

Learning to Rank. Tie-Yan Liu. Microsoft Research Asia CCIR 2011, Jinan, Learning to Rank Tie-Yan Liu Microsoft Research Asia CCIR 2011, Jinan, 2011.10 History of Web Search Search engines powered by link analysis Traditional text retrieval engines 2011/10/22 Tie-Yan Liu @

More information

Boolean Model. Hongning Wang

Boolean Model. Hongning Wang Boolean Model Hongning Wang CS@UVa Abstraction of search engine architecture Indexed corpus Crawler Ranking procedure Doc Analyzer Doc Representation Query Rep Feedback (Query) Evaluation User Indexer

More information

A Few Things to Know about Machine Learning for Web Search

A Few Things to Know about Machine Learning for Web Search AIRS 2012 Tianjin, China Dec. 19, 2012 A Few Things to Know about Machine Learning for Web Search Hang Li Noah s Ark Lab Huawei Technologies Talk Outline My projects at MSRA Some conclusions from our research

More information

Search Engines and Learning to Rank

Search Engines and Learning to Rank Search Engines and Learning to Rank Joseph (Yossi) Keshet Query processor Ranker Cache Forward index Inverted index Link analyzer Indexer Parser Web graph Crawler Representations TF-IDF To get an effective

More information

One-Pass Ranking Models for Low-Latency Product Recommendations

One-Pass Ranking Models for Low-Latency Product Recommendations One-Pass Ranking Models for Low-Latency Product Recommendations Martin Saveski @msaveski MIT (Amazon Berlin) One-Pass Ranking Models for Low-Latency Product Recommendations Amazon Machine Learning Team,

More information

Retrieval Evaluation. Hongning Wang

Retrieval Evaluation. Hongning Wang Retrieval Evaluation Hongning Wang CS@UVa What we have learned so far Indexed corpus Crawler Ranking procedure Research attention Doc Analyzer Doc Rep (Index) Query Rep Feedback (Query) Evaluation User

More information

Structured Ranking Learning using Cumulative Distribution Networks

Structured Ranking Learning using Cumulative Distribution Networks Structured Ranking Learning using Cumulative Distribution Networks Jim C. Huang Probabilistic and Statistical Inference Group University of Toronto Toronto, ON, Canada M5S 3G4 jim@psi.toronto.edu Brendan

More information

Learning to Rank for Faceted Search Bridging the gap between theory and practice

Learning to Rank for Faceted Search Bridging the gap between theory and practice Learning to Rank for Faceted Search Bridging the gap between theory and practice Agnes van Belle @ Berlin Buzzwords 2017 Job-to-person search system Generated query Match indicator Faceted search Multiple

More information

Learning to rank, a supervised approach for ranking of documents Master Thesis in Computer Science - Algorithms, Languages and Logic KRISTOFER TAPPER

Learning to rank, a supervised approach for ranking of documents Master Thesis in Computer Science - Algorithms, Languages and Logic KRISTOFER TAPPER Learning to rank, a supervised approach for ranking of documents Master Thesis in Computer Science - Algorithms, Languages and Logic KRISTOFER TAPPER Chalmers University of Technology University of Gothenburg

More information

Lizhe Sun. November 17, Florida State University. Ranking in Statistics and Machine Learning. Lizhe Sun. Introduction

Lizhe Sun. November 17, Florida State University. Ranking in Statistics and Machine Learning. Lizhe Sun. Introduction in in Florida State University November 17, 2017 Framework in 1. our life 2. Early work: Model Examples 3. webpage Web page search modeling Data structure Data analysis with machine learning algorithms

More information

Query Likelihood with Negative Query Generation

Query Likelihood with Negative Query Generation Query Likelihood with Negative Query Generation Yuanhua Lv Department of Computer Science University of Illinois at Urbana-Champaign Urbana, IL 61801 ylv2@uiuc.edu ChengXiang Zhai Department of Computer

More information

Arama Motoru Gelistirme Dongusu: Siralamayi Ogrenme ve Bilgiye Erisimin Degerlendirilmesi. Retrieval Effectiveness and Learning to Rank

Arama Motoru Gelistirme Dongusu: Siralamayi Ogrenme ve Bilgiye Erisimin Degerlendirilmesi. Retrieval Effectiveness and Learning to Rank Arama Motoru Gelistirme Dongusu: Siralamayi Ogrenme ve Bilgiye Erisimin Degerlendirilmesi etrieval Effectiveness and Learning to ank EMIE YILMAZ Professor and Turing Fellow University College London esearch

More information

A General Approximation Framework for Direct Optimization of Information Retrieval Measures

A General Approximation Framework for Direct Optimization of Information Retrieval Measures A General Approximation Framework for Direct Optimization of Information Retrieval Measures Tao Qin, Tie-Yan Liu, Hang Li October, 2008 Abstract Recently direct optimization of information retrieval (IR)

More information

Advanced Topics in Information Retrieval. Learning to Rank. ATIR July 14, 2016

Advanced Topics in Information Retrieval. Learning to Rank. ATIR July 14, 2016 Advanced Topics in Information Retrieval Learning to Rank Vinay Setty vsetty@mpi-inf.mpg.de Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ATIR July 14, 2016 Before we start oral exams July 28, the full

More information

Active Evaluation of Ranking Functions based on Graded Relevance (Extended Abstract)

Active Evaluation of Ranking Functions based on Graded Relevance (Extended Abstract) Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence Active Evaluation of Ranking Functions based on Graded Relevance (Extended Abstract) Christoph Sawade sawade@cs.uni-potsdam.de

More information

Modern Retrieval Evaluations. Hongning Wang

Modern Retrieval Evaluations. Hongning Wang Modern Retrieval Evaluations Hongning Wang CS@UVa What we have known about IR evaluations Three key elements for IR evaluation A document collection A test suite of information needs A set of relevance

More information

Document indexing, similarities and retrieval in large scale text collections

Document indexing, similarities and retrieval in large scale text collections Document indexing, similarities and retrieval in large scale text collections Eric Gaussier Univ. Grenoble Alpes - LIG Eric.Gaussier@imag.fr Eric Gaussier Document indexing, similarities & retrieval 1

More information

Learning to Rank for Information Retrieval

Learning to Rank for Information Retrieval Learning to Rank for Information Retrieval Tie-Yan Liu Learning to Rank for Information Retrieval Tie-Yan Liu Microsoft Research Asia Bldg #2, No. 5, Dan Ling Street Haidian District Beijing 100080 People

More information

A Machine Learning Approach for Information Retrieval Applications. Luo Si. Department of Computer Science Purdue University

A Machine Learning Approach for Information Retrieval Applications. Luo Si. Department of Computer Science Purdue University A Machine Learning Approach for Information Retrieval Applications Luo Si Department of Computer Science Purdue University Why Information Retrieval: Information Overload: Since the introduction of digital

More information

arxiv: v1 [cs.ir] 19 Sep 2016

arxiv: v1 [cs.ir] 19 Sep 2016 Enhancing LambdaMART Using Oblivious Trees Marek Modrý 1 and Michal Ferov 2 arxiv:1609.05610v1 [cs.ir] 19 Sep 2016 1 Seznam.cz, Radlická 3294/10, 150 00 Praha 5, Czech Republic marek.modry@firma.seznam.cz

More information

UMass at TREC 2017 Common Core Track

UMass at TREC 2017 Common Core Track UMass at TREC 2017 Common Core Track Qingyao Ai, Hamed Zamani, Stephen Harding, Shahrzad Naseri, James Allan and W. Bruce Croft Center for Intelligent Information Retrieval College of Information and Computer

More information

Learning to Rank: A New Technology for Text Processing

Learning to Rank: A New Technology for Text Processing TFANT 07 Tokyo Univ. March 2, 2007 Learning to Rank: A New Technology for Text Processing Hang Li Microsoft Research Asia Talk Outline What is Learning to Rank? Ranking SVM Definition Search Ranking SVM

More information

Diversification of Query Interpretations and Search Results

Diversification of Query Interpretations and Search Results Diversification of Query Interpretations and Search Results Advanced Methods of IR Elena Demidova Materials used in the slides: Charles L.A. Clarke, Maheedhar Kolla, Gordon V. Cormack, Olga Vechtomova,

More information

Consistency Theory in Machine Learning

Consistency Theory in Machine Learning Consistency Theory in Machine Learning Wei Gao ( 高尉 ) Learning And Mining from DatA (LAMDA) National Key Laboratory for Novel Software Technology Nanjing University Machine learning Training data learn

More information

Fall CS646: Information Retrieval. Lecture 2 - Introduction to Search Result Ranking. Jiepu Jiang University of Massachusetts Amherst 2016/09/12

Fall CS646: Information Retrieval. Lecture 2 - Introduction to Search Result Ranking. Jiepu Jiang University of Massachusetts Amherst 2016/09/12 Fall 2016 CS646: Information Retrieval Lecture 2 - Introduction to Search Result Ranking Jiepu Jiang University of Massachusetts Amherst 2016/09/12 More course information Programming Prerequisites Proficiency

More information

A Stochastic Learning-To-Rank Algorithm and its Application to Contextual Advertising

A Stochastic Learning-To-Rank Algorithm and its Application to Contextual Advertising A Stochastic Learning-To-Rank Algorithm and its Application to Contextual Advertising ABSTRACT Maryam Karimzadehgan Department of Computer Science University of Illinois at Urbana-Champaign Urbana, IL

More information

Learning to Rank Only Using Training Data from Related Domain

Learning to Rank Only Using Training Data from Related Domain Learning to Rank Only Using Training Data from Related Domain Wei Gao, Peng Cai 2, Kam-Fai Wong, and Aoying Zhou 2 The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China {wgao, kfwong}@se.cuhk.edu.hk

More information

Chapter 8. Evaluating Search Engine

Chapter 8. Evaluating Search Engine Chapter 8 Evaluating Search Engine Evaluation Evaluation is key to building effective and efficient search engines Measurement usually carried out in controlled laboratory experiments Online testing can

More information

High Accuracy Retrieval with Multiple Nested Ranker

High Accuracy Retrieval with Multiple Nested Ranker High Accuracy Retrieval with Multiple Nested Ranker Irina Matveeva University of Chicago 5801 S. Ellis Ave Chicago, IL 60637 matveeva@uchicago.edu Chris Burges Microsoft Research One Microsoft Way Redmond,

More information

Risk Minimization and Language Modeling in Text Retrieval Thesis Summary

Risk Minimization and Language Modeling in Text Retrieval Thesis Summary Risk Minimization and Language Modeling in Text Retrieval Thesis Summary ChengXiang Zhai Language Technologies Institute School of Computer Science Carnegie Mellon University July 21, 2002 Abstract This

More information

Improving Recommendations Through. Re-Ranking Of Results

Improving Recommendations Through. Re-Ranking Of Results Improving Recommendations Through Re-Ranking Of Results S.Ashwini M.Tech, Computer Science Engineering, MLRIT, Hyderabad, Andhra Pradesh, India Abstract World Wide Web has become a good source for any

More information

arxiv: v2 [cs.ir] 4 Dec 2018

arxiv: v2 [cs.ir] 4 Dec 2018 Qingyao Ai CICS, UMass Amherst Amherst, MA, USA aiqy@cs.umass.edu Xuanhui Wang xuanhui@google.com Nadav Golbandi nadavg@google.com arxiv:1811.04415v2 [cs.ir] 4 Dec 2018 ABSTRACT Michael Bendersky bemike@google.com

More information

Adapting Ranking Functions to User Preference

Adapting Ranking Functions to User Preference Adapting Ranking Functions to User Preference Keke Chen, Ya Zhang, Zhaohui Zheng, Hongyuan Zha, Gordon Sun Yahoo! {kchen,yazhang,zhaohui,zha,gzsun}@yahoo-inc.edu Abstract Learning to rank has become a

More information

A Comparing Pointwise and Listwise Objective Functions for Random Forest based Learning-to-Rank

A Comparing Pointwise and Listwise Objective Functions for Random Forest based Learning-to-Rank A Comparing Pointwise and Listwise Objective Functions for Random Forest based Learning-to-Rank MUHAMMAD IBRAHIM, Monash University, Australia MARK CARMAN, Monash University, Australia Current random forest

More information

CPSC 340: Machine Learning and Data Mining. More Linear Classifiers Fall 2017

CPSC 340: Machine Learning and Data Mining. More Linear Classifiers Fall 2017 CPSC 340: Machine Learning and Data Mining More Linear Classifiers Fall 2017 Admin Assignment 3: Due Friday of next week. Midterm: Can view your exam during instructor office hours next week, or after

More information

Linking Entities in Tweets to Wikipedia Knowledge Base

Linking Entities in Tweets to Wikipedia Knowledge Base Linking Entities in Tweets to Wikipedia Knowledge Base Xianqi Zou, Chengjie Sun, Yaming Sun, Bingquan Liu, and Lei Lin School of Computer Science and Technology Harbin Institute of Technology, China {xqzou,cjsun,ymsun,liubq,linl}@insun.hit.edu.cn

More information

Lecture 3: Improving Ranking with

Lecture 3: Improving Ranking with Modeling User Behavior and Interactions Lecture 3: Improving Ranking with Behavior Data 1 * &!" ) & $ & 6+ $ & 6+ + "& & 8 > " + & 2 Lecture 3 Plan $ ( ' # $ #! & $ #% #"! #! -( #", # + 4 / 0 3 21 0 /

More information

Learning-to-rank with Prior Knowledge as Global Constraints

Learning-to-rank with Prior Knowledge as Global Constraints Learning-to-rank with Prior Knowledge as Global Constraints Tiziano Papini and Michelangelo Diligenti 1 Abstract. A good ranking function is the core of any Information Retrieval system. The ranking function

More information

arxiv: v2 [cs.ir] 27 Feb 2019

arxiv: v2 [cs.ir] 27 Feb 2019 Unbiased LambdaMART: An Unbiased Pairwise Learning-to-Rank Algorithm arxiv:1809.05818v2 [cs.ir] 27 Feb 2019 ABSTRACT Ziniu Hu University of California, Los Angeles, USA bull@cs.ucla.edu Recently a number

More information

CPSC 340: Machine Learning and Data Mining. Logistic Regression Fall 2016

CPSC 340: Machine Learning and Data Mining. Logistic Regression Fall 2016 CPSC 340: Machine Learning and Data Mining Logistic Regression Fall 2016 Admin Assignment 1: Marks visible on UBC Connect. Assignment 2: Solution posted after class. Assignment 3: Due Wednesday (at any

More information

Text Categorization (I)

Text Categorization (I) CS473 CS-473 Text Categorization (I) Luo Si Department of Computer Science Purdue University Text Categorization (I) Outline Introduction to the task of text categorization Manual v.s. automatic text categorization

More information

Learning Dense Models of Query Similarity from User Click Logs

Learning Dense Models of Query Similarity from User Click Logs Learning Dense Models of Query Similarity from User Click Logs Fabio De Bona, Stefan Riezler*, Keith Hall, Massi Ciaramita, Amac Herdagdelen, Maria Holmqvist Google Research, Zürich *Dept. of Computational

More information

Ranking with Query-Dependent Loss for Web Search

Ranking with Query-Dependent Loss for Web Search Ranking with Query-Dependent Loss for Web Search Jiang Bian 1, Tie-Yan Liu 2, Tao Qin 2, Hongyuan Zha 1 Georgia Institute of Technology 1 Microsoft Research Asia 2 Outline Motivation Incorporating Query

More information

Optimizing Search Engines using Click-through Data

Optimizing Search Engines using Click-through Data Optimizing Search Engines using Click-through Data By Sameep - 100050003 Rahee - 100050028 Anil - 100050082 1 Overview Web Search Engines : Creating a good information retrieval system Previous Approaches

More information

arxiv: v1 [cs.ir] 16 Sep 2018

arxiv: v1 [cs.ir] 16 Sep 2018 A Novel Algorithm for Unbiased Learning to Rank arxiv:1809.05818v1 [cs.ir] 16 Sep 2018 ABSTRACT Ziniu Hu University of California, Los Angeles acbull@g.ucla.edu Qu Peng ByteDance Technology pengqu@bytedance.com

More information

Introduction to Deep Learning

Introduction to Deep Learning ENEE698A : Machine Learning Seminar Introduction to Deep Learning Raviteja Vemulapalli Image credit: [LeCun 1998] Resources Unsupervised feature learning and deep learning (UFLDL) tutorial (http://ufldl.stanford.edu/wiki/index.php/ufldl_tutorial)

More information

Learning Socially Optimal Information Systems from Egoistic Users

Learning Socially Optimal Information Systems from Egoistic Users Learning Socially Optimal Information Systems from Egoistic Users Karthik Raman Thorsten Joachims Department of Computer Science Cornell University, Ithaca NY karthik@cs.cornell.edu www.cs.cornell.edu/

More information

LETOR: Benchmark Dataset for Research on Learning to Rank for Information Retrieval Tie-Yan Liu 1, Jun Xu 1, Tao Qin 2, Wenying Xiong 3, and Hang Li 1

LETOR: Benchmark Dataset for Research on Learning to Rank for Information Retrieval Tie-Yan Liu 1, Jun Xu 1, Tao Qin 2, Wenying Xiong 3, and Hang Li 1 LETOR: Benchmark Dataset for Research on Learning to Rank for Information Retrieval Tie-Yan Liu 1, Jun Xu 1, Tao Qin 2, Wenying Xiong 3, and Hang Li 1 1 Microsoft Research Asia, No.49 Zhichun Road, Haidian

More information

CS249: ADVANCED DATA MINING

CS249: ADVANCED DATA MINING CS249: ADVANCED DATA MINING Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu April 24, 2017 Homework 2 out Announcements Due May 3 rd (11:59pm) Course project proposal

More information

arxiv: v1 [stat.ap] 14 Mar 2018

arxiv: v1 [stat.ap] 14 Mar 2018 arxiv:1803.05127v1 [stat.ap] 14 Mar 2018 Feature Selection and Model Comparison on Microsoft Learning-to-Rank Data Sets Sen LEI, Xinzhi HAN Submitted for the PSTAT 231 (Fall 2017) Final Project ONLY University

More information

Adaptive Search Engines Learning Ranking Functions with SVMs

Adaptive Search Engines Learning Ranking Functions with SVMs Adaptive Search Engines Learning Ranking Functions with SVMs CS478/578 Machine Learning Fall 24 Thorsten Joachims Cornell University T. Joachims, Optimizing Search Engines Using Clickthrough Data, Proceedings

More information

Learning Temporal-Dependent Ranking Models

Learning Temporal-Dependent Ranking Models Learning Temporal-Dependent Ranking Models Miguel Costa, Francisco Couto, Mário Silva LaSIGE @ Faculty of Sciences, University of Lisbon IST/INESC-ID, University of Lisbon 37th Annual ACM SIGIR Conference,

More information

Personalized Web Search

Personalized Web Search Personalized Web Search Dhanraj Mavilodan (dhanrajm@stanford.edu), Kapil Jaisinghani (kjaising@stanford.edu), Radhika Bansal (radhika3@stanford.edu) Abstract: With the increase in the diversity of contents

More information

Large Scale Data Analysis Using Deep Learning

Large Scale Data Analysis Using Deep Learning Large Scale Data Analysis Using Deep Learning Machine Learning Basics - 1 U Kang Seoul National University U Kang 1 In This Lecture Overview of Machine Learning Capacity, overfitting, and underfitting

More information

Part 5: Structured Support Vector Machines

Part 5: Structured Support Vector Machines Part 5: Structured Support Vector Machines Sebastian Nowozin and Christoph H. Lampert Providence, 21st June 2012 1 / 34 Problem (Loss-Minimizing Parameter Learning) Let d(x, y) be the (unknown) true data

More information

Learning Ranking Functions with SVMs

Learning Ranking Functions with SVMs Learning Ranking Functions with SVMs CS4780/5780 Machine Learning Fall 2014 Thorsten Joachims Cornell University T. Joachims, Optimizing Search Engines Using Clickthrough Data, Proceedings of the ACM Conference

More information

Mining the Search Trails of Surfing Crowds: Identifying Relevant Websites from User Activity Data

Mining the Search Trails of Surfing Crowds: Identifying Relevant Websites from User Activity Data Mining the Search Trails of Surfing Crowds: Identifying Relevant Websites from User Activity Data Misha Bilenko and Ryen White presented by Matt Richardson Microsoft Research Search = Modeling User Behavior

More information

Session Based Click Features for Recency Ranking

Session Based Click Features for Recency Ranking Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10) Session Based Click Features for Recency Ranking Yoshiyuki Inagaki and Narayanan Sadagopan and Georges Dupret and Ciya

More information

Learning Ranking Functions with Implicit Feedback

Learning Ranking Functions with Implicit Feedback Learning Ranking Functions with Implicit Feedback CS4780 Machine Learning Fall 2011 Pannaga Shivaswamy Cornell University These slides are built on an earlier set of slides by Prof. Joachims. Current Search

More information

Estimating Credibility of User Clicks with Mouse Movement and Eye-tracking Information

Estimating Credibility of User Clicks with Mouse Movement and Eye-tracking Information Estimating Credibility of User Clicks with Mouse Movement and Eye-tracking Information Jiaxin Mao, Yiqun Liu, Min Zhang, Shaoping Ma Department of Computer Science and Technology, Tsinghua University Background

More information

S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking

S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking Yi Yang * and Ming-Wei Chang # * Georgia Institute of Technology, Atlanta # Microsoft Research, Redmond Traditional

More information

Concept of Curve Fitting Difference with Interpolation

Concept of Curve Fitting Difference with Interpolation Curve Fitting Content Concept of Curve Fitting Difference with Interpolation Estimation of Linear Parameters by Least Squares Curve Fitting by Polynomial Least Squares Estimation of Non-linear Parameters

More information

University of Delaware at Diversity Task of Web Track 2010

University of Delaware at Diversity Task of Web Track 2010 University of Delaware at Diversity Task of Web Track 2010 Wei Zheng 1, Xuanhui Wang 2, and Hui Fang 1 1 Department of ECE, University of Delaware 2 Yahoo! Abstract We report our systems and experiments

More information

arxiv: v1 [cs.ir] 16 Oct 2017

arxiv: v1 [cs.ir] 16 Oct 2017 DeepRank: A New Deep Architecture for Relevance Ranking in Information Retrieval Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, Jingfang Xu, Xueqi Cheng pl8787@gmail.com,{lanyanyan,guojiafeng,junxu,cxq}@ict.ac.cn,xujingfang@sogou-inc.com

More information

Large-Scale Validation and Analysis of Interleaved Search Evaluation

Large-Scale Validation and Analysis of Interleaved Search Evaluation Large-Scale Validation and Analysis of Interleaved Search Evaluation Olivier Chapelle, Thorsten Joachims Filip Radlinski, Yisong Yue Department of Computer Science Cornell University Decide between two

More information

Predicting Query Performance on the Web

Predicting Query Performance on the Web Predicting Query Performance on the Web No Author Given Abstract. Predicting performance of queries has many useful applications like automatic query reformulation and automatic spell correction. However,

More information

Structured Learning. Jun Zhu

Structured Learning. Jun Zhu Structured Learning Jun Zhu Supervised learning Given a set of I.I.D. training samples Learn a prediction function b r a c e Supervised learning (cont d) Many different choices Logistic Regression Maximum

More information

Learning to Rank (part 2)

Learning to Rank (part 2) Learning to Rank (part 2) NESCAI 2008 Tutorial Filip Radlinski Cornell University Recap of Part 1 1. Learning to rank is widely used for information retrieval, and by web search engines. 2. There are many

More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval Evaluation Rank-Based Measures Binary relevance Precision@K (P@K) Mean Average Precision (MAP) Mean Reciprocal Rank (MRR) Multiple levels of relevance Normalized Discounted

More information

Multi-label Classification. Jingzhou Liu Dec

Multi-label Classification. Jingzhou Liu Dec Multi-label Classification Jingzhou Liu Dec. 6 2016 Introduction Multi-class problem, Training data (x $, y $ ) ( ), x $ X R., y $ Y = 1,2,, L Learn a mapping f: X Y Each instance x $ is associated with

More information

Clickthrough Log Analysis by Collaborative Ranking

Clickthrough Log Analysis by Collaborative Ranking Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10) Clickthrough Log Analysis by Collaborative Ranking Bin Cao 1, Dou Shen 2, Kuansan Wang 3, Qiang Yang 1 1 Hong Kong

More information

Learning Ranking Functions with SVMs

Learning Ranking Functions with SVMs Learning Ranking Functions with SVMs CS4780/5780 Machine Learning Fall 2012 Thorsten Joachims Cornell University T. Joachims, Optimizing Search Engines Using Clickthrough Data, Proceedings of the ACM Conference

More information

Apache Solr Learning to Rank FTW!

Apache Solr Learning to Rank FTW! Apache Solr Learning to Rank FTW! Berlin Buzzwords 2017 June 12, 2017 Diego Ceccarelli Software Engineer, News Search dceccarelli4@bloomberg.net Michael Nilsson Software Engineer, Unified Search mnilsson23@bloomberg.net

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:

More information

A Machine Learning Approach for Improved BM25 Retrieval

A Machine Learning Approach for Improved BM25 Retrieval A Machine Learning Approach for Improved BM25 Retrieval Krysta M. Svore and Christopher J. C. Burges Microsoft Research One Microsoft Way Redmond, WA 98052 {ksvore,cburges}@microsoft.com Microsoft Research

More information

An Investigation of Basic Retrieval Models for the Dynamic Domain Task

An Investigation of Basic Retrieval Models for the Dynamic Domain Task An Investigation of Basic Retrieval Models for the Dynamic Domain Task Razieh Rahimi and Grace Hui Yang Department of Computer Science, Georgetown University rr1042@georgetown.edu, huiyang@cs.georgetown.edu

More information

Linear Regression and K-Nearest Neighbors 3/28/18

Linear Regression and K-Nearest Neighbors 3/28/18 Linear Regression and K-Nearest Neighbors 3/28/18 Linear Regression Hypothesis Space Supervised learning For every input in the data set, we know the output Regression Outputs are continuous A number,

More information

Support Vector Machines 290N, 2015

Support Vector Machines 290N, 2015 Support Vector Machines 290N, 2015 Two Class Problem: Linear Separable Case with a Hyperplane Class 1 Class 2 Many decision boundaries can separate these two classes using a hyperplane. Which one should

More information

Scaling Up Decision Tree Ensembles

Scaling Up Decision Tree Ensembles Scaling Up Decision Tree Ensembles Misha Bilenko (Microsoft) with Ron Bekkerman (LinkedIn) and John Langford (Yahoo!) http://hunch.net/~large_scale_survey Tree Ensembles: Basics Rule-based prediction is

More information

A probabilistic description-oriented approach for categorising Web documents

A probabilistic description-oriented approach for categorising Web documents A probabilistic description-oriented approach for categorising Web documents Norbert Gövert Mounia Lalmas Norbert Fuhr University of Dortmund {goevert,mounia,fuhr}@ls6.cs.uni-dortmund.de Abstract The automatic

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

RMIT University at TREC 2006: Terabyte Track

RMIT University at TREC 2006: Terabyte Track RMIT University at TREC 2006: Terabyte Track Steven Garcia Falk Scholer Nicholas Lester Milad Shokouhi School of Computer Science and IT RMIT University, GPO Box 2476V Melbourne 3001, Australia 1 Introduction

More information

Northeastern University in TREC 2009 Million Query Track

Northeastern University in TREC 2009 Million Query Track Northeastern University in TREC 2009 Million Query Track Evangelos Kanoulas, Keshi Dai, Virgil Pavlu, Stefan Savev, Javed Aslam Information Studies Department, University of Sheffield, Sheffield, UK College

More information

Fast Learning of Document Ranking Functions with the Committee Perceptron

Fast Learning of Document Ranking Functions with the Committee Perceptron Carnegie Mellon University Research Showcase @ CMU Computer Science Department School of Computer Science -008 Fast Learning of Document Ranking Functions with the Committee Perceptron Jonathan L. Elsas

More information

A new click model for relevance prediction in web search

A new click model for relevance prediction in web search A new click model for relevance prediction in web search Alexander Fishkov 1 and Sergey Nikolenko 2,3 1 St. Petersburg State Polytechnical University jetsnguns@gmail.com 2 Steklov Mathematical Institute,

More information

Minimally Invasive Randomization for Collecting Unbiased Preferences from Clickthrough Logs

Minimally Invasive Randomization for Collecting Unbiased Preferences from Clickthrough Logs Minimally Invasive Randomization for Collecting Unbiased Preferences from Clickthrough Logs Filip Radlinski and Thorsten Joachims Department of Computer Science Cornell University, Ithaca, NY {filip,tj}@cs.cornell.edu

More information

Instance-based Learning

Instance-based Learning Instance-based Learning Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 15 th, 2007 2005-2007 Carlos Guestrin 1 1-Nearest Neighbor Four things make a memory based learner:

More information

Link Analysis. Hongning Wang

Link Analysis. Hongning Wang Link Analysis Hongning Wang CS@UVa Structured v.s. unstructured data Our claim before IR v.s. DB = unstructured data v.s. structured data As a result, we have assumed Document = a sequence of words Query

More information

IE598 Big Data Optimization Summary Nonconvex Optimization

IE598 Big Data Optimization Summary Nonconvex Optimization IE598 Big Data Optimization Summary Nonconvex Optimization Instructor: Niao He April 16, 2018 1 This Course Big Data Optimization Explore modern optimization theories, algorithms, and big data applications

More information

Information Retrieval. CS630 Representing and Accessing Digital Information. What is a Retrieval Model? Basic IR Processes

Information Retrieval. CS630 Representing and Accessing Digital Information. What is a Retrieval Model? Basic IR Processes CS630 Representing and Accessing Digital Information Information Retrieval: Retrieval Models Information Retrieval Basics Data Structures and Access Indexing and Preprocessing Retrieval Models Thorsten

More information

A Deep Relevance Matching Model for Ad-hoc Retrieval

A Deep Relevance Matching Model for Ad-hoc Retrieval A Deep Relevance Matching Model for Ad-hoc Retrieval Jiafeng Guo 1, Yixing Fan 1, Qingyao Ai 2, W. Bruce Croft 2 1 CAS Key Lab of Web Data Science and Technology, Institute of Computing Technology, Chinese

More information

Search Engines Chapter 8 Evaluating Search Engines Felix Naumann

Search Engines Chapter 8 Evaluating Search Engines Felix Naumann Search Engines Chapter 8 Evaluating Search Engines 9.7.2009 Felix Naumann Evaluation 2 Evaluation is key to building effective and efficient search engines. Drives advancement of search engines When intuition

More information

Lecture 9. Support Vector Machines

Lecture 9. Support Vector Machines Lecture 9. Support Vector Machines COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Andrey Kan Copyright: University of Melbourne This lecture Support vector machines (SVMs) as maximum

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval http://informationretrieval.org IIR 6: Flat Clustering Wiltrud Kessler & Hinrich Schütze Institute for Natural Language Processing, University of Stuttgart 0-- / 83

More information

Learning with Humans in the Loop

Learning with Humans in the Loop Learning with Humans in the Loop CS4780/5780 Machine Learning Fall 2014 Thorsten Joachims Cornell University Optional Reading: - Yisong Yue, J. Broder, R. Kleinberg, T. Joachims, The K-armed Dueling Bandits

More information

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate

More information