Information Retrieval

1 Information Retrieval Learning to Rank Ilya Markov University of Amsterdam Ilya Markov Information Retrieval 1

2 Course overview
Offline: Data Acquisition, Data Processing, Data Storage
Online: Query Processing, Ranking, Evaluation
Advanced: Aggregated Search, Click Models, Present and Future of IR

3 This lecture
Offline: Data Acquisition, Data Processing, Data Storage
Online: Query Processing, Ranking, Evaluation
Advanced: Aggregated Search, Click Models, Present and Future of IR

4 Outline
1. Current trends in IR
2. Learning to rank

5 IR conferences
ACM Conference on Research and Development in Information Retrieval (SIGIR)
ACM Conference on Information and Knowledge Management (CIKM)
ACM Conference on Web Search and Data Mining (WSDM)
European Conference on Information Retrieval (ECIR)

6 IR journals
ACM Transactions on Information Systems (TOIS)
Information Retrieval Journal (IRJ)
Information Processing and Management (IPM)

7 Surveys
Foundations and Trends in Information Retrieval (FnTIR)
Synthesis Lectures on Information Concepts, Retrieval, and Services (Morgan & Claypool Publishers)

8 SIGIR 2016
Evaluation
Efficiency
Retrieval models, learning-to-rank, web search
Users, user needs, search behavior
Novelty and diversity
Speech, conversation systems, question answering
Recommendation systems
Entities and knowledge graphs

9 WSDM 2016
Communities, social interaction, social networks
Search and semantics
Observing users, leveraging users
Big data algorithms
Entities and structure

10 Ranking methods
1. Content-based
  Term-based
  Semantic
2. Link-based (web search)
3. Learning to rank

11 Outline
1. Current trends in IR
2. Learning to rank
  Machine learning
  Features
  LTR approaches
  Experimental comparison
  Summary

12 Machine learning
Traditional ML solves a prediction problem (classification or regression) on a single instance at a time.
Input: $\{x_i\}_{i=1}^{n}$
Output: $\{y_i\}_{i=1}^{n}$
Learn a model $h(x)$ that optimizes a loss function $L(h(x), y)$
For a new instance $x_{new}$, predict the output $y = h(x_{new})$

13 The aim of LTR is to come up with an optimal ordering of items, where the relative ordering among the items is more important than the exact score that each item gets.
[Figure: Fig. 1.1 Learning-to-rank framework. Source: T.-Y. Liu, Learning to Rank for Information Retrieval]


15 Machine learning
Input: $\{x_i\}_{i=1}^{n}$
Output: $\{y_i\}_{i=1}^{n}$
Learn a model $h(x)$ that optimizes a loss function $L(h(x), y)$
Examples
  Linear model: $h(x_i) = w^T x_i = \sum_{k=1}^{l} w_k x_{ik}$
  Quadratic loss function: $L(h(x_i), y_i) = \| h(x_i) - y_i \|^2$
How to learn the model $h(x)$, i.e., how to estimate its parameters?

16 Learning the model h(x)
If there is a closed-form solution for the parameters of $h(x)$:
1. Compute the derivative of the loss function $L$ with respect to some parameter $w_k$
2. Equate this derivative to zero: $\frac{\partial L}{\partial w_k} = 0$
3. Find the optimal value of the parameter $w_k$
If there is no closed-form solution, use gradient descent:
1. Compute or approximate the gradient of the loss function, $\nabla L = \left[ \frac{\partial L}{\partial w_1}, \ldots, \frac{\partial L}{\partial w_l} \right]$, using the current values of the parameters
2. Update the model parameters by taking a small step in the opposite direction of the gradient: $w \leftarrow w - \eta \nabla L$
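
The gradient descent steps above can be sketched for the linear model with a quadratic loss. This is a minimal illustration, not the lecture's own code: the toy data, the step size eta = 0.1, the number of steps, and the use of the mean loss over all training instances are assumptions made here.

```python
def predict(w, x):
    """Linear model h(x) = w^T x."""
    return sum(wk * xk for wk, xk in zip(w, x))


def gradient(w, xs, ys):
    """Gradient of the mean quadratic loss (1/n) * sum_i (h(x_i) - y_i)^2."""
    n = len(xs)
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = predict(w, x) - y
        for k, xk in enumerate(x):
            grad[k] += 2.0 * err * xk / n
    return grad


def gradient_descent(xs, ys, eta=0.1, steps=2000):
    """Repeat w <- w - eta * grad(L), starting from w = 0."""
    w = [0.0] * len(xs[0])
    for _ in range(steps):
        g = gradient(w, xs, ys)
        w = [wk - eta * gk for wk, gk in zip(w, g)]
    return w


# Toy data generated from y = 2 * x1 + 1; the second feature is a constant 1
# that plays the role of a bias term (an assumption of this sketch).
xs = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
ys = [1.0, 3.0, 5.0, 7.0]

w = gradient_descent(xs, ys)
print([round(v, 3) for v in w])  # should be close to [2.0, 1.0]
```

With a step size that is too large the updates diverge, which is why the slide asks for a small step.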

17 Gradient descent
[Picture: gradient descent; source not preserved]


19 [Figure: Fig. 1.1 Learning-to-rank framework. Source: T.-Y. Liu, Learning to Rank for Information Retrieval]

20 Query-document representation
Each query-document pair $(q^{(n)}, x_i)$ is represented as a vector of features
$x_i^{(n)} = [x_{i1}^{(n)}, x_{i2}^{(n)}, \ldots, x_{il}^{(n)}]$
Features
  Content-based
  Link-based
  User-based

21 Content-based features
Table 6.2 Learning features of TREC.
ID  Feature description
1   Term frequency (TF) of body
2   TF of anchor
3   TF of title
4   TF of URL
5   TF of whole document
6   Inverse document frequency (IDF) of body
7   IDF of anchor
8   IDF of title
9   IDF of URL
10  IDF of whole document
11  TF*IDF of body
12  TF*IDF of anchor
13  TF*IDF of title
14  TF*IDF of URL
15  TF*IDF of whole document
16  Document length (DL) of body
17  DL of anchor
18  DL of title
19  DL of URL
20  DL of whole document
21  BM25 of body
22  BM25 of anchor
23  BM25 of title
24  BM25 of URL
25  BM25 of whole document
26  LMIR.ABS of body
27  LMIR.ABS of anchor
28  LMIR.ABS of title
29  LMIR.ABS of URL
Source: T.-Y. Liu, Learning to Rank for Information Retrieval. The excerpt also notes that for the OHSUMED corpus, 40 features were extracted in total, as shown in Table 6.3.

22 Link-based features
Table 6.2 (Continued)
ID  Feature description
40  LMIR.JM of whole document
41  Sitemap based term propagation
42  Sitemap based score propagation
43  Hyperlink base score propagation: weighted in-link
44  Hyperlink base score propagation: weighted out-link
45  Hyperlink base score propagation: uniform out-link
46  Hyperlink base feature propagation: weighted in-link
47  Hyperlink base feature propagation: weighted out-link
48  Hyperlink base feature propagation: uniform out-link
49  HITS authority
50  HITS hub
51  PageRank
52  HostRank
53  Topical PageRank
54  Topical HITS authority
55  Topical HITS hub
56  Inlink number
57  Outlink number
58  Number of slash in URL
59  Length of URL
60  Number of child page
61  BM25 of extracted title
62  LMIR.ABS of extracted title
63  LMIR.DIR of extracted title
64  LMIR.JM of extracted title
Source: T.-Y. Liu, Learning to Rank for Information Retrieval

23 User-based features
Type of interaction   Online metric
Clicks                Click-through rate for $(q^{(n)}, x_i)$
                      Avg. click rank for $(q^{(n)}, x_i)$
Time                  Avg. dwell time for $(q^{(n)}, x_i)$
                      Avg. time to first click, when this click is on $x_i$
                      Avg. time to last click, when this click is on $x_i$
Queries               Number of reformulations before/after $q^{(n)}$
                      Number of times $q^{(n)}$ is abandoned
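
Two of the click and time metrics above can be sketched as follows. The log format (query, document, clicked, dwell time in seconds), the toy entries, and the function name `user_features` are hypothetical and only serve the illustration; real interaction logs look different.

```python
from collections import defaultdict

# Hypothetical impression log: (query, document, clicked, dwell_seconds).
log = [
    ("ltr", "d1", True, 30.0),
    ("ltr", "d1", False, 0.0),
    ("ltr", "d1", True, 10.0),
    ("ltr", "d2", False, 0.0),
]


def user_features(log):
    """Click-through rate and average dwell time per (query, document) pair."""
    shown = defaultdict(int)
    clicks = defaultdict(int)
    dwell = defaultdict(float)
    for query, doc, clicked, dwell_seconds in log:
        key = (query, doc)
        shown[key] += 1
        if clicked:
            clicks[key] += 1
            dwell[key] += dwell_seconds
    features = {}
    for key in shown:
        ctr = clicks[key] / shown[key]
        # Dwell time is averaged over clicks only; 0.0 if never clicked.
        avg_dwell = dwell[key] / clicks[key] if clicks[key] else 0.0
        features[key] = {"ctr": ctr, "avg_dwell": avg_dwell}
    return features


features = user_features(log)
print(features[("ltr", "d1")])
```

Each resulting value could then become one entry of the feature vector of the corresponding query-document pair.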


25 [Figure: Fig. 1.1 Learning-to-rank framework. Source: T.-Y. Liu, Learning to Rank for Information Retrieval]

26 Learning to rank approaches
[Figure: learning-to-rank approaches, including LambdaRank and LambdaMART. Source: T.-Y. Liu, Learning to Rank for Information Retrieval]

27 Pointwise LTR
[Figure: pointwise LTR. For a query, the model h() handles each document on its own: relevance labels Rel(red), Rel(gray), Rel(orange); predicted scores h(blue), h(yellow), h(green), h(white).]

28 Pointwise LTR
Reduces to traditional ML
Input: query-document feature vectors $x_i^{(n)} = [x_{i1}^{(n)}, x_{i2}^{(n)}, \ldots, x_{il}^{(n)}]$
Output: relevance labels $y_i$
Objective: learn a model $h(x)$ that correctly predicts labels $y$

29 Regression
[Picture: regression; source not preserved]

30 Classification
[Picture: classification; source not preserved]

31 Ordinal regression
[Figure: sum of margin strategy. Source: T.-Y. Liu, Learning to Rank for Information Retrieval]

32 Pointwise LTR
Pros
+ Intuitive interpretation of relevance
+ Clear how to obtain relevance judgements
Cons
- Optimizes a different objective than IR does (e.g., finding the correct class rather than the correct ordering)
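
The drawback above can be illustrated with a small, made-up example in which a lower pointwise loss comes with a worse ranking. The labels and both sets of predictions are invented for this sketch, and squared error stands in for whatever pointwise loss is actually used.

```python
def squared_error(pred, labels):
    """Sum of squared differences between predictions and relevance labels."""
    return sum((p - y) ** 2 for p, y in zip(pred, labels))


def ranking(pred):
    """Document indices sorted by decreasing predicted score."""
    return sorted(range(len(pred)), key=lambda i: -pred[i])


labels = [2, 1, 0]        # document 0 is the most relevant one

pred_a = [1.4, 1.5, 0.0]  # close to the labels, but documents 0 and 1 are swapped
pred_b = [5.0, 4.0, 3.0]  # far from the labels, but in the correct order

print(squared_error(pred_a, labels), ranking(pred_a))
print(squared_error(pred_b, labels), ranking(pred_b))
```

Model A wins on the pointwise loss, yet only model B returns the documents in the ideal order, which is what a ranking measure would reward.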

33 Pairwise LTR
[Figure: pairwise LTR. For a query, the model h() handles pairs of documents: preference labels Pref(red>gray), Pref(gray>green), Pref(green>red); predicted preference h(red>blue).]

34 RankNet
Pointwise scoring function $f(x_i)$ with parameters $\{w_k\}_{k=1}^{l}$; below, $f_i = f(x_i)$
Pairwise ground truth: $\bar{P}_{ij} = I(x_i > x_j)$
Probability of $x_i > x_j$ is modeled using logistic regression:
$P_{ij} = P(x_i > x_j) = \frac{1}{1 + e^{-\sigma (f_i - f_j)}}$
Pairwise loss function (cross entropy):
$C = -\bar{P}_{ij} \log P_{ij} - (1 - \bar{P}_{ij}) \log(1 - P_{ij}) = (1 - \bar{P}_{ij}) \sigma (f_i - f_j) + \log(1 + e^{-\sigma (f_i - f_j)})$
RankNet optimizes the total number of pairwise errors

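35 RankNet (cont'd)
Optimize the cost $C$ (where $\bar{P}_{ij}$ is the pairwise ground truth):
$\frac{\partial C}{\partial f_i} = \sigma \left[ (1 - \bar{P}_{ij}) - \frac{1}{1 + e^{\sigma (f_i - f_j)}} \right] = -\frac{\partial C}{\partial f_j}$
Update parameter $w_k$ of the function $f(x_i)$:
$w_k \leftarrow w_k - \eta \left( \frac{\partial C}{\partial f_i} \frac{\partial f_i}{\partial w_k} + \frac{\partial C}{\partial f_j} \frac{\partial f_j}{\partial w_k} \right)$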
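
The RankNet cost and its derivative can be checked numerically. The sketch below assumes a logistic pair probability P_ij = 1 / (1 + exp(-sigma (f_i - f_j))) and a ground-truth value p_bar in [0, 1]; the function names, the scores 0.3 and 1.2, and sigma = 1 are choices made here for illustration.

```python
import math


def ranknet_prob(f_i, f_j, sigma=1.0):
    """Modeled probability that x_i should be ranked above x_j."""
    return 1.0 / (1.0 + math.exp(-sigma * (f_i - f_j)))


def ranknet_cost(f_i, f_j, p_bar, sigma=1.0):
    """Cross entropy between the ground truth p_bar and the modeled probability."""
    p = ranknet_prob(f_i, f_j, sigma)
    return -p_bar * math.log(p) - (1.0 - p_bar) * math.log(1.0 - p)


def ranknet_cost_simplified(f_i, f_j, p_bar, sigma=1.0):
    """The same cost after algebraic simplification."""
    d = f_i - f_j
    return (1.0 - p_bar) * sigma * d + math.log(1.0 + math.exp(-sigma * d))


def ranknet_grad_fi(f_i, f_j, p_bar, sigma=1.0):
    """Analytic derivative dC/df_i; dC/df_j is its negative."""
    d = f_i - f_j
    return sigma * ((1.0 - p_bar) - 1.0 / (1.0 + math.exp(sigma * d)))


# x_i should be ranked above x_j (p_bar = 1), but currently scores lower.
f_i, f_j, p_bar = 0.3, 1.2, 1.0
eps = 1e-6

cost = ranknet_cost(f_i, f_j, p_bar)
cost_simplified = ranknet_cost_simplified(f_i, f_j, p_bar)
grad = ranknet_grad_fi(f_i, f_j, p_bar)

# Central finite differences with respect to f_i and f_j.
numeric_fi = (ranknet_cost(f_i + eps, f_j, p_bar)
              - ranknet_cost(f_i - eps, f_j, p_bar)) / (2 * eps)
numeric_fj = (ranknet_cost(f_i, f_j + eps, p_bar)
              - ranknet_cost(f_i, f_j - eps, p_bar)) / (2 * eps)

print(cost, cost_simplified, grad, numeric_fi, numeric_fj)
```

The derivative with respect to f_i is negative here, so a gradient step raises the score of the document that should be ranked higher and lowers the other one by the same amount.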

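36 Speeding up RankNet training
Define $\lambda_{ij}$ as (where $\bar{P}_{ij}$ is the pairwise ground truth)
$\lambda_{ij} = \frac{\partial C}{\partial f_i} = \sigma \left[ (1 - \bar{P}_{ij}) - \frac{1}{1 + e^{\sigma (f_i - f_j)}} \right]$
Let $I$ denote the set of pairs of indices $\{i, j\}$ for which $x_i$ should be ranked differently from $x_j$ for a given query $q$:
$I = \{ \{i, j\} \mid x_i > x_j \}$
Sum all contributions to update parameter $w_k$:
$\delta w_k = -\eta \sum_{\{i,j\} \in I} \left( \lambda_{ij} \frac{\partial f_i}{\partial w_k} - \lambda_{ij} \frac{\partial f_j}{\partial w_k} \right) = -\eta \sum_i \left( \sum_{j:\{i,j\} \in I} \lambda_{ij} \frac{\partial f_i}{\partial w_k} - \sum_{j:\{j,i\} \in I} \lambda_{ij} \frac{\partial f_i}{\partial w_k} \right) = -\eta \sum_i \lambda_i \frac{\partial f_i}{\partial w_k}$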

37 Interpreting $\lambda$s
$\lambda_i = \sum_{j:\{i,j\} \in I} \lambda_{ij} - \sum_{j:\{j,i\} \in I} \lambda_{ij}$
$\lambda_i$ is a sum of forces applied to document $x_i$ shown for query $q$
All documents $x_j$ that should be ranked below $x_i$ push it up with the force $\lambda_{ij}$
All documents $x_j$ that should be ranked above $x_i$ push it down with the force $\lambda_{ij}$
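
The accumulation of forces can be sketched for a single query. The sketch assumes that every pair in I has ground truth 1, so that each pair contributes -sigma / (1 + exp(sigma (f_i - f_j))); that contribution is added to the document that should rank higher and subtracted from the one that should rank lower. The scores, labels, and the function name `ranknet_lambdas` are invented for the example.

```python
import math


def ranknet_lambdas(scores, labels, sigma=1.0):
    """Accumulate lambda_i for one query from all pairs where doc i should rank above doc j."""
    lambdas = [0.0] * len(scores)
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:  # the pair {i, j} is in I
                lam_ij = -sigma / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
                lambdas[i] += lam_ij   # contribution for the document that should rank higher
                lambdas[j] -= lam_ij   # opposite contribution for the other document
    return lambdas


# Document 0 is the most relevant but currently has the lowest score.
scores = [0.5, 2.0, 1.0]
labels = [2, 0, 1]

lambdas = ranknet_lambdas(scores, labels)
print(lambdas)
```

Because the update subtracts eta times lambda_i times the derivative of f_i, a negative lambda_i raises the score of document i and a positive one lowers it. The forces within a query cancel out, so they sum to zero.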

38 Pairwise LTR
Pros
+ Easy to get preference judgements
+ Comes closer to optimizing the ranking
Cons
- Still does not optimize the whole ranking
- Higher computational complexity compared to pointwise LTR

39 Listwise LTR
[Figure: listwise LTR. For each query (q1, q2, q3, q4), the model h() produces a whole ranking of documents, e.g., R(blue), R(yellow), R(red), R(green), R(white) for q1, and each ranking is evaluated as a whole: NDCG(q1), NDCG(q2), NDCG(q3).]

40 From RankNet to LambdaRank
[Figure: Fig. 1 from C. Burges, From RankNet to LambdaRank to LambdaMART: An Overview. A set of urls ordered for a given query using a binary relevance measure. The light gray bars represent urls that are not relevant to the query, while the dark blue bars represent urls that are relevant to the query. Left: the total number of pairwise errors is thirteen. Right: by moving the top url down three rank levels, and the bottom relevant url up five, the total number of pairwise errors has been reduced to eleven. However, for IR measures like NDCG and ERR that emphasize the top few results, this is not what we want.]
The black arrows denote the RankNet gradients (which increase with the number of pairwise errors)
RankNet cost decreases from 13 on the left to 11 on the right
The actual ranking gets worse
The red arrows are what we would actually like to see

41 LambdaRank
$\lambda_{ij}$ in RankNet (where $\bar{P}_{ij}$ is the pairwise ground truth):
$\lambda_{ij} = \sigma \left[ (1 - \bar{P}_{ij}) - \frac{1}{1 + e^{\sigma (f_i - f_j)}} \right]$
$\lambda_{ij}$ in LambdaRank:
$\lambda_{ij} = -\frac{\sigma}{1 + e^{\sigma (f_i - f_j)}} \, |\Delta NDCG|$
$\Delta NDCG = NDCG(\text{orig. ranking}) - NDCG(x_i \text{ and } x_j \text{ are swapped})$
LambdaRank directly uses the ranking to compute gradients (i.e., $\lambda_{ij}$s) instead of computing and optimizing a cost function

42 LambdaRank (cont'd)
Proceed similarly to RankNet:
Sum all $\lambda_{ij}$s for document $x_i$ and query $q$:
$\lambda_i = \sum_{j:\{i,j\} \in I} \lambda_{ij} - \sum_{j:\{j,i\} \in I} \lambda_{ij}$
Update parameter $w_k$ of the function $f(x_i)$:
$w_k \leftarrow w_k - \eta \sum_i \lambda_i \frac{\partial f_i}{\partial w_k}$
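
The LambdaRank forces can be sketched in the same way as for RankNet, with each pairwise term scaled by the absolute NDCG change obtained by swapping the two documents in the current ranking. The slides do not define NDCG, so the gain 2^rel - 1 and the log2 position discount used below are assumptions of this sketch, as are the toy scores, labels, and function names.

```python
import math


def dcg(labels_in_rank_order):
    """DCG with gain 2^rel - 1 and a log2 position discount (assumed definition)."""
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(labels_in_rank_order))


def ndcg(labels_in_rank_order):
    """DCG normalized by the DCG of the ideal ordering."""
    ideal = dcg(sorted(labels_in_rank_order, reverse=True))
    return dcg(labels_in_rank_order) / ideal if ideal > 0 else 0.0


def lambdarank_lambdas(scores, labels, sigma=1.0):
    """Accumulate lambda_i for one query, weighting each pair by |delta NDCG|."""
    n = len(scores)
    order = sorted(range(n), key=lambda d: -scores[d])  # document ids, best first
    ranked = [labels[d] for d in order]
    pos = {d: p for p, d in enumerate(order)}
    base = ndcg(ranked)
    lambdas = [0.0] * n
    for i in range(n):
        for j in range(n):
            if labels[i] > labels[j]:  # document i should rank above document j
                swapped = list(ranked)
                swapped[pos[i]], swapped[pos[j]] = swapped[pos[j]], swapped[pos[i]]
                delta = abs(base - ndcg(swapped))
                lam_ij = -sigma / (1.0 + math.exp(sigma * (scores[i] - scores[j]))) * delta
                lambdas[i] += lam_ij
                lambdas[j] -= lam_ij
    return lambdas


# Current ranking by score: a non-relevant document is on top.
scores = [3.0, 2.0, 1.0, 0.0]
labels = [0, 1, 0, 1]

lambdas = lambdarank_lambdas(scores, labels)
print(lambdas)
```

No cost function is evaluated anywhere in this sketch; only the forces are computed, and these would then drive the parameter update.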

43 LambdaMART
Multiple Additive Regression Trees (MART)
MART does not need a cost function, only gradients
Adopts gradients ($\lambda_{ij}$s) from LambdaRank
Hence the name: Lambda + MART

44 Listwise LTR
Pros
+ Directly optimizes the whole ranking
Cons
- Needs many judgements
- High computational complexity


46 LEarning TO Rank datasets (LETOR)
Query-document pairs precomputed feature vectors
Relevance judgements

47 Historical LTR datasets
TREC 2003, 2004 Web IR track
  Gov corpus with 1,053,110 pages
  Tasks
    TD: topic distillation
    HP: homepage finding
    NP: named page finding
[Table 6.1 Number of queries in TREC web track: rows Topic distillation, Homepage finding, Named page finding; columns TREC2003, TREC2004; values not preserved]
The OHSUMED corpus
  A subset of MEDLINE, a database on medical publications
  348,566 records (out of over 7 million) from 270 medical journals
Source: T.-Y. Liu, Learning to Rank for Information Retrieval

48 Results on NP2003
[Table 6.7 Results on NP2003: MAP for the algorithms Regression, RankSVM, RankBoost, FRank, ListNet, AdaRank, SVM^map; values not preserved. Footnote URL as transcribed: tp://svmrank.yisongyue.com/svmmap.php]
Source: T.-Y. Liu, Learning to Rank for Information Retrieval

49 Results on HP2004
[Table 6.10 Results on HP2004: MAP for the algorithms Regression, RankSVM, RankBoost, FRank, ListNet, AdaRank, SVM^map; values not preserved]
[Table 6.11 Results on OHSUMED: MAP for the algorithms Regression, RankSVM, RankBoost, FRank (excerpt cut off); values not preserved]
Source: T.-Y. Liu, Learning to Rank for Information Retrieval

50 Experimental comparison
Listwise ranking algorithms perform very well on most datasets
  ListNet seems to be better than the others
Pairwise ranking algorithms obtain good ranking accuracy on some (although not all) datasets
Linear regression performs worse than the pairwise and listwise ranking algorithms
Source: T.-Y. Liu, Learning to Rank for Information Retrieval


52 Learning to rank summary
Features
  Content-based
  Link-based
  User-based
Approaches
  Pointwise (regression, classification, ordinal regression)
  Pairwise (RankNet)
  Listwise (LambdaRank, LambdaMART)

53 Materials
Tie-Yan Liu. Learning to Rank for Information Retrieval. Foundations and Trends in Information Retrieval, 2009.
Christopher J.C. Burges. From RankNet to LambdaRank to LambdaMART: An Overview. Microsoft Research Technical Report, 2010.


55 See you tomorrow!
Offline: Data Acquisition, Data Processing, Data Storage
Online: Query Processing, Ranking, Evaluation
Advanced: Aggregated Search, Click Models, Present and Future of IR


More information

Collective classification in network data

Collective classification in network data 1 / 50 Collective classification in network data Seminar on graphs, UCSB 2009 Outline 2 / 50 1 Problem 2 Methods Local methods Global methods 3 Experiments Outline 3 / 50 1 Problem 2 Methods Local methods

More information

Advanced Click Models & their Applications to IR

Advanced Click Models & their Applications to IR Advanced Click Models & their Applications to IR (Afternoon block 1) Aleksandr Chuklin, Ilya Markov Maarten de Rijke a.chuklin@uva.nl i.markov@uva.nl derijke@uva.nl University of Amsterdam Google Switzerland

More information

Lecture 3: Improving Ranking with

Lecture 3: Improving Ranking with Modeling User Behavior and Interactions Lecture 3: Improving Ranking with Behavior Data 1 * &!" ) & $ & 6+ $ & 6+ + "& & 8 > " + & 2 Lecture 3 Plan $ ( ' # $ #! & $ #% #"! #! -( #", # + 4 / 0 3 21 0 /

More information

Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching

Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching Wolfgang Tannebaum, Parvaz Madabi and Andreas Rauber Institute of Software Technology and Interactive Systems, Vienna

More information

Information Retrieval Spring Web retrieval

Information Retrieval Spring Web retrieval Information Retrieval Spring 2016 Web retrieval The Web Large Changing fast Public - No control over editing or contents Spam and Advertisement How big is the Web? Practically infinite due to the dynamic

More information

Query Independent Scholarly Article Ranking

Query Independent Scholarly Article Ranking Query Independent Scholarly Article Ranking Shuai Ma, Chen Gong, Renjun Hu, Dongsheng Luo, Chunming Hu, Jinpeng Huai SKLSDE Lab, Beihang University, China Beijing Advanced Innovation Center for Big Data

More information

WCL2R: A Benchmark Collection for Learning to Rank Research with Clickthrough Data

WCL2R: A Benchmark Collection for Learning to Rank Research with Clickthrough Data WCL2R: A Benchmark Collection for Learning to Rank Research with Clickthrough Data Otávio D. A. Alcântara 1, Álvaro R. Pereira Jr. 3, Humberto M. de Almeida 1, Marcos A. Gonçalves 1, Christian Middleton

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval http://informationretrieval.org IIR 6: Flat Clustering Hinrich Schütze Center for Information and Language Processing, University of Munich 04-06- /86 Overview Recap

More information

End-to-End Neural Ad-hoc Ranking with Kernel Pooling

End-to-End Neural Ad-hoc Ranking with Kernel Pooling End-to-End Neural Ad-hoc Ranking with Kernel Pooling Chenyan Xiong 1,Zhuyun Dai 1, Jamie Callan 1, Zhiyuan Liu, and Russell Power 3 1 :Language Technologies Institute, Carnegie Mellon University :Tsinghua

More information

RSDC 09: Tag Recommendation Using Keywords and Association Rules

RSDC 09: Tag Recommendation Using Keywords and Association Rules RSDC 09: Tag Recommendation Using Keywords and Association Rules Jian Wang, Liangjie Hong and Brian D. Davison Department of Computer Science and Engineering Lehigh University, Bethlehem, PA 18015 USA

More information

Learning to Reweight Terms with Distributed Representations

Learning to Reweight Terms with Distributed Representations Learning to Reweight Terms with Distributed Representations School of Computer Science Carnegie Mellon University August 12, 215 Outline Goal: Assign weights to query terms for better retrieval results

More information

Balancing Speed and Quality in Online Learning to Rank for Information Retrieval

Balancing Speed and Quality in Online Learning to Rank for Information Retrieval Balancing Speed and Quality in Online Learning to Rank for Information Retrieval ABSTRACT Harrie Oosterhuis University of Amsterdam Amsterdam, The Netherlands oosterhuis@uva.nl In Online Learning to Rank

More information

A Formal Approach to Score Normalization for Meta-search

A Formal Approach to Score Normalization for Meta-search A Formal Approach to Score Normalization for Meta-search R. Manmatha and H. Sever Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts Amherst, MA 01003

More information

Link Analysis and Web Search

Link Analysis and Web Search Link Analysis and Web Search Moreno Marzolla Dip. di Informatica Scienza e Ingegneria (DISI) Università di Bologna http://www.moreno.marzolla.name/ based on material by prof. Bing Liu http://www.cs.uic.edu/~liub/webminingbook.html

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

Chapter 6: Information Retrieval and Web Search. An introduction

Chapter 6: Information Retrieval and Web Search. An introduction Chapter 6: Information Retrieval and Web Search An introduction Introduction n Text mining refers to data mining using text documents as data. n Most text mining tasks use Information Retrieval (IR) methods

More information

A General Approximation Framework for Direct Optimization of Information Retrieval Measures

A General Approximation Framework for Direct Optimization of Information Retrieval Measures A General Approximation Framework for Direct Optimization of Information Retrieval Measures Tao Qin, Tie-Yan Liu, Hang Li October, 2008 Abstract Recently direct optimization of information retrieval (IR)

More information

IITH at CLEF 2017: Finding Relevant Tweets for Cultural Events

IITH at CLEF 2017: Finding Relevant Tweets for Cultural Events IITH at CLEF 2017: Finding Relevant Tweets for Cultural Events Sreekanth Madisetty and Maunendra Sankar Desarkar Department of CSE, IIT Hyderabad, Hyderabad, India {cs15resch11006, maunendra}@iith.ac.in

More information

Information Retrieval

Information Retrieval Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,

More information

arxiv: v1 [cs.ir] 19 Sep 2016

arxiv: v1 [cs.ir] 19 Sep 2016 Enhancing LambdaMART Using Oblivious Trees Marek Modrý 1 and Michal Ferov 2 arxiv:1609.05610v1 [cs.ir] 19 Sep 2016 1 Seznam.cz, Radlická 3294/10, 150 00 Praha 5, Czech Republic marek.modry@firma.seznam.cz

More information

Personalized Web Search

Personalized Web Search Personalized Web Search Dhanraj Mavilodan (dhanrajm@stanford.edu), Kapil Jaisinghani (kjaising@stanford.edu), Radhika Bansal (radhika3@stanford.edu) Abstract: With the increase in the diversity of contents

More information

CPSC 340: Machine Learning and Data Mining. Ranking Fall 2016

CPSC 340: Machine Learning and Data Mining. Ranking Fall 2016 CPSC 340: Machine Learning and Data Mining Ranking Fall 2016 Assignment 5: Admin 2 late days to hand in Wednesday, 3 for Friday. Assignment 6: Due Friday, 1 late day to hand in next Monday, etc. Final:

More information

Opinions in Federated Search: University of Lugano at TREC 2014 Federated Web Search Track

Opinions in Federated Search: University of Lugano at TREC 2014 Federated Web Search Track Opinions in Federated Search: University of Lugano at TREC 2014 Federated Web Search Track Anastasia Giachanou 1,IlyaMarkov 2 and Fabio Crestani 1 1 Faculty of Informatics, University of Lugano, Switzerland

More information

A Comparing Pointwise and Listwise Objective Functions for Random Forest based Learning-to-Rank

A Comparing Pointwise and Listwise Objective Functions for Random Forest based Learning-to-Rank A Comparing Pointwise and Listwise Objective Functions for Random Forest based Learning-to-Rank MUHAMMAD IBRAHIM, Monash University, Australia MARK CARMAN, Monash University, Australia Current random forest

More information

An Investigation of Basic Retrieval Models for the Dynamic Domain Task

An Investigation of Basic Retrieval Models for the Dynamic Domain Task An Investigation of Basic Retrieval Models for the Dynamic Domain Task Razieh Rahimi and Grace Hui Yang Department of Computer Science, Georgetown University rr1042@georgetown.edu, huiyang@cs.georgetown.edu

More information

Information Retrieval. Lecture 7 - Evaluation in Information Retrieval. Introduction. Overview. Standard test collection. Wintersemester 2007

Information Retrieval. Lecture 7 - Evaluation in Information Retrieval. Introduction. Overview. Standard test collection. Wintersemester 2007 Information Retrieval Lecture 7 - Evaluation in Information Retrieval Seminar für Sprachwissenschaft International Studies in Computational Linguistics Wintersemester 2007 1 / 29 Introduction Framework

More information

Information Retrieval

Information Retrieval Information Retrieval Lecture 7 - Evaluation in Information Retrieval Seminar für Sprachwissenschaft International Studies in Computational Linguistics Wintersemester 2007 1/ 29 Introduction Framework

More information

University of Delaware at Diversity Task of Web Track 2010

University of Delaware at Diversity Task of Web Track 2010 University of Delaware at Diversity Task of Web Track 2010 Wei Zheng 1, Xuanhui Wang 2, and Hui Fang 1 1 Department of ECE, University of Delaware 2 Yahoo! Abstract We report our systems and experiments

More information

S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking

S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking Yi Yang * and Ming-Wei Chang # * Georgia Institute of Technology, Atlanta # Microsoft Research, Redmond Traditional

More information

Evaluation. Evaluate what? For really large amounts of data... A: Use a validation set.

Evaluation. Evaluate what? For really large amounts of data... A: Use a validation set. Evaluate what? Evaluation Charles Sutton Data Mining and Exploration Spring 2012 Do you want to evaluate a classifier or a learning algorithm? Do you want to predict accuracy or predict which one is better?

More information

On the Effectiveness of Query Weighting for Adapting Rank Learners to New Unlabelled Collections

On the Effectiveness of Query Weighting for Adapting Rank Learners to New Unlabelled Collections On the Effectiveness of Query Weighting for Adapting Rank Learners to New Unlabelled Collections Pengfei Li RMIT University, Australia li.pengfei@rmit.edu.au Mark Sanderson RMIT University, Australia mark.sanderson@rmit.edu.au

More information

Developing Focused Crawlers for Genre Specific Search Engines

Developing Focused Crawlers for Genre Specific Search Engines Developing Focused Crawlers for Genre Specific Search Engines Nikhil Priyatam Thesis Advisor: Prof. Vasudeva Varma IIIT Hyderabad July 7, 2014 Examples of Genre Specific Search Engines MedlinePlus Naukri.com

More information

A Unified Approach to Learning Task-Specific Bit Vector Representations for Fast Nearest Neighbor Search

A Unified Approach to Learning Task-Specific Bit Vector Representations for Fast Nearest Neighbor Search A Unified Approach to Learning Task-Specific Bit Vector Representations for Fast Nearest Neighbor Search Vinod Nair Yahoo! Labs Bangalore vnair@yahoo-inc.com Dhruv Mahajan Yahoo! Labs Bangalore dkm@yahoo-inc.com

More information

Adapting Ranking Functions to User Preference

Adapting Ranking Functions to User Preference Adapting Ranking Functions to User Preference Keke Chen, Ya Zhang, Zhaohui Zheng, Hongyuan Zha, Gordon Sun Yahoo! {kchen,yazhang,zhaohui,zha,gzsun}@yahoo-inc.edu Abstract Learning to rank has become a

More information

A Machine Learning Approach for Improved BM25 Retrieval

A Machine Learning Approach for Improved BM25 Retrieval A Machine Learning Approach for Improved BM25 Retrieval Krysta M. Svore and Christopher J. C. Burges Microsoft Research One Microsoft Way Redmond, WA 98052 {ksvore,cburges}@microsoft.com Microsoft Research

More information

Multimedia Information Systems

Multimedia Information Systems Multimedia Information Systems Samson Cheung EE 639, Fall 2004 Lecture 6: Text Information Retrieval 1 Digital Video Library Meta-Data Meta-Data Similarity Similarity Search Search Analog Video Archive

More information

CPSC 340: Machine Learning and Data Mining. More Linear Classifiers Fall 2017

CPSC 340: Machine Learning and Data Mining. More Linear Classifiers Fall 2017 CPSC 340: Machine Learning and Data Mining More Linear Classifiers Fall 2017 Admin Assignment 3: Due Friday of next week. Midterm: Can view your exam during instructor office hours next week, or after

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Image Data: Classification via Neural Networks Instructor: Yizhou Sun yzsun@ccs.neu.edu November 19, 2015 Methods to Learn Classification Clustering Frequent Pattern Mining

More information

Diversification of Query Interpretations and Search Results

Diversification of Query Interpretations and Search Results Diversification of Query Interpretations and Search Results Advanced Methods of IR Elena Demidova Materials used in the slides: Charles L.A. Clarke, Maheedhar Kolla, Gordon V. Cormack, Olga Vechtomova,

More information

Part 11: Collaborative Filtering. Francesco Ricci

Part 11: Collaborative Filtering. Francesco Ricci Part : Collaborative Filtering Francesco Ricci Content An example of a Collaborative Filtering system: MovieLens The collaborative filtering method n Similarity of users n Methods for building the rating

More information

Proximity Prestige using Incremental Iteration in Page Rank Algorithm

Proximity Prestige using Incremental Iteration in Page Rank Algorithm Indian Journal of Science and Technology, Vol 9(48), DOI: 10.17485/ijst/2016/v9i48/107962, December 2016 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 Proximity Prestige using Incremental Iteration

More information

VisoLink: A User-Centric Social Relationship Mining

VisoLink: A User-Centric Social Relationship Mining VisoLink: A User-Centric Social Relationship Mining Lisa Fan and Botang Li Department of Computer Science, University of Regina Regina, Saskatchewan S4S 0A2 Canada {fan, li269}@cs.uregina.ca Abstract.

More information

Information Retrieval May 15. Web retrieval

Information Retrieval May 15. Web retrieval Information Retrieval May 15 Web retrieval What s so special about the Web? The Web Large Changing fast Public - No control over editing or contents Spam and Advertisement How big is the Web? Practically

More information

University of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015

University of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015 University of Virginia Department of Computer Science CS 4501: Information Retrieval Fall 2015 5:00pm-6:15pm, Monday, October 26th Name: ComputingID: This is a closed book and closed notes exam. No electronic

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

Information Retrieval (IR) Introduction to Information Retrieval. Lecture Overview. Why do we need IR? Basics of an IR system.

Information Retrieval (IR) Introduction to Information Retrieval. Lecture Overview. Why do we need IR? Basics of an IR system. Introduction to Information Retrieval Ethan Phelps-Goodman Some slides taken from http://www.cs.utexas.edu/users/mooney/ir-course/ Information Retrieval (IR) The indexing and retrieval of textual documents.

More information

Automatically Building Research Reading Lists

Automatically Building Research Reading Lists Automatically Building Research Reading Lists Michael D. Ekstrand 1 Praveen Kanaan 1 James A. Stemper 2 John T. Butler 2 Joseph A. Konstan 1 John T. Riedl 1 ekstrand@cs.umn.edu 1 GroupLens Research Department

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

A Deep Relevance Matching Model for Ad-hoc Retrieval

A Deep Relevance Matching Model for Ad-hoc Retrieval A Deep Relevance Matching Model for Ad-hoc Retrieval Jiafeng Guo 1, Yixing Fan 1, Qingyao Ai 2, W. Bruce Croft 2 1 CAS Key Lab of Web Data Science and Technology, Institute of Computing Technology, Chinese

More information

LEARNING to rank is a kind of learning based information

LEARNING to rank is a kind of learning based information IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. XX, NO. X, MARCH 2010 1 Ranking Model Adaptation for Domain-Specific Search Bo Geng, Member, IEEE, Linjun Yang, Member, IEEE, Chao Xu, Xian-Sheng

More information

Ranking Algorithms For Digital Forensic String Search Hits

Ranking Algorithms For Digital Forensic String Search Hits DIGITAL FORENSIC RESEARCH CONFERENCE Ranking Algorithms For Digital Forensic String Search Hits By Nicole Beebe and Lishu Liu Presented At The Digital Forensic Research Conference DFRWS 2014 USA Denver,

More information

Multi-label classification using rule-based classifier systems

Multi-label classification using rule-based classifier systems Multi-label classification using rule-based classifier systems Shabnam Nazmi (PhD candidate) Department of electrical and computer engineering North Carolina A&T state university Advisor: Dr. A. Homaifar

More information

Search Engines Chapter 8 Evaluating Search Engines Felix Naumann

Search Engines Chapter 8 Evaluating Search Engines Felix Naumann Search Engines Chapter 8 Evaluating Search Engines 9.7.2009 Felix Naumann Evaluation 2 Evaluation is key to building effective and efficient search engines. Drives advancement of search engines When intuition

More information

From Passages into Elements in XML Retrieval

From Passages into Elements in XML Retrieval From Passages into Elements in XML Retrieval Kelly Y. Itakura David R. Cheriton School of Computer Science, University of Waterloo 200 Univ. Ave. W. Waterloo, ON, Canada yitakura@cs.uwaterloo.ca Charles

More information