Advances on the Development of Evaluation Measures. Ben Carterette Evangelos Kanoulas Emine Yilmaz


1 Advances on the Development of Evaluation Measures Ben Carterette Evangelos Kanoulas Emine Yilmaz

2 Information Retrieval Systems Match information seekers with the information they seek

3 Why is Evaluation so Important? "What you can't measure you can't improve" (Lord Kelvin). Most retrieval systems are tuned to optimize for an objective evaluation metric.

4 Outline Intro to evaluation Different approaches to evaluation Traditional evaluation measures User model based evaluation measures Session Evaluation Novelty and Diversity 4

5 Online Evaluation Design interactive experiments; use users' actions (click / no click) to evaluate the quality of the results.

6 Online Evaluation Standard click metrics Clickthrough rate Queries per user Probability user skips over results they have considered (pskip) Result interleaving

7 What is result interleaving? A way to compare rankers online: given the two rankings produced by two methods, present a combination of the rankings to users and assign credit based on clicks.

8 Team Draft Interleaving (Radlinski et al., 2008) Interleaving two rankings Input: Two rankings Repeat: Toss a coin to see which team picks next Winner picks their best remaining player Loser picks their best remaining player Output: One ranking Credit assignment Ranking providing more of the clicked results wins
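A minimal sketch of this procedure in Python (illustrative only; the function names and the simple more-clicked-results-wins rule are assumptions based on the description above, not the authors' reference code):

```python
import random

def team_draft_interleave(ranking_a, ranking_b):
    """Team-draft interleaving: returns the combined ranking plus each team's picks."""
    all_docs = set(ranking_a) | set(ranking_b)
    interleaved, team_a, team_b = [], [], []
    while len(interleaved) < len(all_docs):
        # Coin toss decides which team picks first in this round
        order = [(ranking_a, team_a), (ranking_b, team_b)]
        if random.random() < 0.5:
            order.reverse()
        for ranking, team in order:
            # The team adds its best remaining (not yet shown) document
            doc = next((d for d in ranking if d not in interleaved), None)
            if doc is not None:
                interleaved.append(doc)
                team.append(doc)
    return interleaved, team_a, team_b

def winner(clicked, team_a, team_b):
    """Credit assignment: the ranking that contributed more clicked results wins."""
    a, b = len(set(clicked) & set(team_a)), len(set(clicked) & set(team_b))
    return "A" if a > b else "B" if b > a else "tie"
```

Applied to the Napa Valley example on the next slides, clicks falling mostly on documents contributed by Ranking B make B the winner.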

9 Team Draft Interleaving. Ranking A: 1. Napa Valley The authority for lodging 2. Napa Valley Wineries - Plan your wine 3. Napa Valley College 4. Been There Tips Napa Valley 5. Napa Valley Wineries and Wine 6. Napa Country, California Wikipedia. Ranking B: 1. Napa Country, California Wikipedia (en.wikipedia.org/wiki/napa_valley) 2. Napa Valley The authority for lodging 3. Napa: The Story of an American Eden 4. Napa Valley Hotels Bed and Breakfast 5. NapaValley.org 6. The Napa Valley Marathon. Presented Ranking (team draft of A and B): 1. Napa Valley The authority for lodging 2. Napa Country, California Wikipedia 3. Napa: The Story of an American Eden 4. Napa Valley Wineries Plan your wine 5. Napa Valley Hotels Bed and Breakfast 6. Napa Valley College 7. NapaValley.org

10 Team Draft Interleaving (continued). The same presented ranking with the user's clicks marked: the clicked results were contributed by Ranking B, so B wins!

11 Offline Evaluation Controlled laboratory experiments The user's interaction with the engine is only simulated Ask experts to judge each query result Predict how users behave when they search Aggregate judgments to evaluate

12 Offline Evaluation (Documents → Judge → User model → Evaluate) Ask experts to judge each query result Predict how users behave when they search Aggregate judgments to evaluate

13 Online vs. Offline Evaluation. Online pros: cheap; measures actual user reactions. Online cons: need to go live; noisy; slow; not duplicable. Offline pros: fast to evaluate; easy to try new ideas; portable. Offline cons: needs ground truth; slow to obtain judgments; expensive; inconsistent; difficult to model how users behave.

14 Outline Intro to evaluation Different approaches to evaluation Traditional evaluation measures User model based evaluation measures Session Evaluation Novelty and Diversity 14

15 Traditional Experiment Search engines produce result lists; judges assess them to answer: how many good docs have I missed/found?

16 Depth-k Pooling The top-k documents from each of the M systems (sys 1 ... sys M) are pooled and given to the judge.

17 Depth-k Pooling Each system ranks documents to depth z; only the top k from each ranking (documents A, B, C, D, ...) enter the pool for judging.

18 Depth-k Pooling Pooled documents are judged relevant (R) or non-relevant (N); documents outside the pool remain unjudged (?).

19 Depth-k Pooling Unjudged documents are treated as non-relevant (N) during evaluation.

20 Reusable Test Collections Document Corpus Topics Topic 1 Topic 2 Topic N Relevance Judgments 20

21 Evaluation Metrics: Precision vs Recall Retrieved list: R N R N N R N N N R

22 Visualizing Retrieval Performance: Precision-Recall Curves List: R N R N N R N N N R

23 Evaluation Metrics: Average Precision List: R N R N N R N N N R
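To make the three metrics concrete, a short sketch computing precision@k, recall@k, and average precision for the example list above (assuming, for illustration, that the 4 relevant documents shown are all the relevant documents that exist):

```python
ranking = ["R", "N", "R", "N", "N", "R", "N", "N", "N", "R"]
R_total = 4  # assumption: all relevant documents appear in the list

def precision_at(k, ranking):
    return sum(1 for d in ranking[:k] if d == "R") / k

def recall_at(k, ranking, R_total):
    return sum(1 for d in ranking[:k] if d == "R") / R_total

def average_precision(ranking, R_total):
    # Mean of precision@k over the ranks k at which a relevant document appears
    return sum(precision_at(k, ranking)
               for k, d in enumerate(ranking, start=1) if d == "R") / R_total

print(precision_at(5, ranking))             # 0.4
print(recall_at(5, ranking, R_total))       # 0.5
print(average_precision(ranking, R_total))  # (1/1 + 2/3 + 3/6 + 4/10)/4 ~= 0.64
```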

24 Outline Intro to evaluation Different approaches to evaluation Traditional evaluation measures User model based evaluation measures Session Evaluation Novelty and Diversity 24

25 User Models Behind Traditional Metrics Precision@k: users always look at the top k documents; what fraction of the top k documents are relevant? Recall: users would like to find all the relevant documents; what fraction of these documents have been retrieved by the search engine?

26 User Model of Average Precision (Robertson 08) 1. User steps down a ranked list one-by-one 2. Stops browsing due to satisfaction: stops with a certain probability after observing a relevant document 3. Gains utility from each relevant document

27 User Model of Average Precision (Robertson 08) The probability that the user stops browsing is uniform over all the relevant documents: P(n) = 1/R if the document at rank n is relevant, 0 otherwise. The utility the user gains when stopping at a relevant document at rank n is precision at rank n: U(n) = (1/n) Σ_{k=1}^{n} rel(k). AP can then be written as AP = Σ_n P(n) U(n).
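A quick check that this user-model formulation reproduces ordinary AP on the earlier example list (sketch; again assumes all R relevant documents appear in the ranking):

```python
ranking = ["R", "N", "R", "N", "N", "R", "N", "N", "N", "R"]
R = 4  # total number of relevant documents (assumed all retrieved)

def ap_user_model(ranking, R):
    total = 0.0
    for n, doc in enumerate(ranking, start=1):
        stop_prob = (1.0 / R) if doc == "R" else 0.0            # P(n)
        utility = sum(1 for d in ranking[:n] if d == "R") / n   # precision@n = U(n)
        total += stop_prob * utility
    return total

print(ap_user_model(ranking, R))  # ~= 0.64, identical to classic AP
```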

28 User Model Based Evaluation Measures Directly aim at evaluating user satisfaction An effectiveness measure should be correlated with the user's experience Hence the interest in effectiveness measures based on explicit models of user interaction: devise a user model correlated with user behavior, then infer an evaluation metric from the user model

29 Basic User Model Simple model of user interaction: 1. User steps down ranked results one-by-one 2. Stops at a document at rank k with some probability P(k) 3. Gains some utility U(k) from relevant documents. M = Σ_k U(k) P(k)

30 Basic User Model 1. Discount: What is the chance a user will visit a document? Model of the browsing behavior 2. Utility: What does the user gain by visiting a document?

31 Model Browsing Behavior black powder ammunition Position-based models The chance of observing a document depends on the position it is presented in the ranked list.

32 Rank Biased Precision black powder ammunition Browsing model: Query → View Next Item → with some probability view the next item, otherwise Stop

33 Rank Biased Precision black powder ammunition RBP = (1 - θ) Σ_{i=1}^{∞} rel_i θ^(i-1)
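A one-function sketch of RBP as written above, with persistence parameter θ (binary gains assumed for illustration):

```python
def rbp(rels, theta=0.8):
    """Rank-Biased Precision: RBP = (1 - theta) * sum_i rel_i * theta**(i-1)."""
    return (1 - theta) * sum(rel * theta ** (i - 1)
                             for i, rel in enumerate(rels, start=1))

print(rbp([1, 0, 1, 0, 0, 1, 0, 0, 0, 1], theta=0.8))  # ~= 0.42
```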

34 Discounted Cumulative Gain black powder ammunition Example ranking with graded relevance (HR R N N HR R N R N N): relevance score rel_r per document; gain = 2^rel_r - 1; discount by rank = 1/log_2(r+1); discounted gain = gain x discount

35 Discounted Cumulative Gain DCG can be written as: DCG = Σ_{r=1}^{N} P(user visits doc r) x Utility(r). The discount function models the probability that the user visits (clicks on) the document at rank r; currently P(user clicks on doc r) = 1/log_2(r+1)

36 Discounted Cumulative Gain Instead of stopping probability, think about viewing probability This fits in discounted gain model framework:

37 Normalised Discounted Cumulative Gain black powder ammunition Same example (HR R N N HR R N R N N): gain = 2^rel_r - 1, discount = 1/log_2(r+1), discounted gain as before; NDCG = DCG / optDCG
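A small sketch of DCG and nDCG with the gain and discount defined above, using the graded example ranking from the slide (HR = 2, R = 1, N = 0):

```python
import math

GRADE = {"HR": 2, "R": 1, "N": 0}

def dcg(grades):
    # gain 2^rel - 1, discount 1/log2(r+1)
    return sum((2 ** GRADE[g] - 1) / math.log2(r + 1)
               for r, g in enumerate(grades, start=1))

def ndcg(grades):
    ideal = sorted(grades, key=lambda g: GRADE[g], reverse=True)  # optimal reordering
    return dcg(grades) / dcg(ideal)

ranking = ["HR", "R", "N", "N", "HR", "R", "N", "R", "N", "N"]
print(dcg(ranking), ndcg(ranking))
```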

38 Model Browsing Behavior black powder ammunition Cascade-based models The user views search results from top to bottom At each rank i, the user has a certain probability of being satisfied. Probability of satisfaction proportional to the relevance grade of the document at rank i. Once the user is satisfied with a document, he terminates the search.

39 Rank Biased Precision black powder ammunition Browsing model: Query → View Next Item → with some probability view the next item, otherwise Stop

40 Expected Reciprocal Rank [Chapelle et al. CIKM09] black powder ammunition Browsing model: Query → View Next Item → Relevant? (highly / somewhat / no) → Stop if satisfied, otherwise view the next item

41 Expected Reciprocal Rank [Chapelle et al. CIKM09] black powder ammunition φ(r) = 1/r: utility of finding "the perfect document" at rank r; g_r: relevance grade of the r-th document; R_r = (2^g_r - 1) / 2^g_max: probability of relevance of doc r. P(user stops at position r) = R_r Π_{i=1}^{r-1} (1 - R_i), and ERR = Σ_{r=1}^{n} (1/r) P(user stops at position r).
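A sketch of ERR following the formula above, for grades 0..g_max (the example grades below mirror the HR/R/N ranking used earlier, with g_max = 2 assumed):

```python
def err(grades, g_max=2):
    """Expected Reciprocal Rank for graded relevance (grades 0..g_max)."""
    total, p_not_stopped = 0.0, 1.0
    for r, g in enumerate(grades, start=1):
        R_r = (2 ** g - 1) / 2 ** g_max      # probability the doc satisfies the user
        total += p_not_stopped * R_r / r     # stop here with prob p_not_stopped * R_r
        p_not_stopped *= (1 - R_r)
    return total

print(err([2, 1, 0, 0, 2, 1, 0, 1, 0, 0]))   # graded version of the example ranking
```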

42 Metrics derived from Query Logs Use the query logs to understand how users behave Learn the parameters of the user model from the query logs Utility, discount, etc.

43 Metrics derived from Query Logs Users tend to stop searching if they are satisfied or frustrated; P(observe a doc at rank r) is highly affected by snippet quality. P(Stop|R) by relevance grade: Bad 0.49, Fair 0.41, Good 0.37, Excellent 0.53, Perfect 0.76. P(C|R) (click probability given relevance) by grade: Bad 0.50, Fair 0.49, Good 0.45, Excellent 0.59, Perfect 0.79.

44 Metrics derived from Query Logs Users behave differently for different queries: informational vs. navigational. Per-grade P(C|R) and P(Stop|R) differ between navigational and informational queries across the grades Bad, Fair, Good, Excellent, Perfect.

45 Expected Browsing Utility (Yilmaz et al. CIKM 10) D_EBU(r) = P(E_r) P(C|R_r); EBU = Σ_{r=1}^{n} D_EBU(r) R_r
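A heavily simplified sketch of the EBU idea: the discount D_EBU(r) multiplies the probability of examining rank r by the probability of clicking given the snippet's relevance grade. The click probabilities reuse the P(C|R) values from slide 43; the examination model and the normalized gain are placeholder assumptions, since in the paper both are estimated from query logs:

```python
# P(C|R) keyed by grade 0..4 (Bad..Perfect), taken from the table on slide 43
P_CLICK_GIVEN_REL = {0: 0.50, 1: 0.49, 2: 0.45, 3: 0.59, 4: 0.79}

def ebu(grades, p_continue=0.8, g_max=4):
    """EBU sketch: sum_r P(E_r) * P(C|R_r) * gain(r).
    P(E_r) here is a simple position-based examination model (an assumption);
    the normalized graded gain is also an assumption."""
    total, p_examine = 0.0, 1.0
    for g in grades:
        gain = (2 ** g - 1) / (2 ** g_max - 1)
        total += p_examine * P_CLICK_GIVEN_REL[g] * gain
        p_examine *= p_continue                    # chance the user keeps browsing
    return total

print(ebu([4, 1, 0, 0, 3, 1, 0, 1, 0, 0]))
```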

46 Basic User Model 1. Discount: What is the chance a user will visit a document? Model of the browsing behavior 2. Utility: What does the user gain by visiting a document? Mostly ad-hoc, no clear user model

47 Graded Average Precision (Robertson et al. SIGIR 10) What does it mean for one document to be more useful than another? One possible meaning: one document is useful to more users than another. Hence the following: assume grades of relevance, but the user has a threshold relevance grade which defines a binary view; different users have different thresholds, described by a probability distribution over users.

48 Graded Average Precision [Robertson et al. SIGIR10] User has binary view of relevance by thresholding the relevance scale Relevance Scale Highly Relevant Relevant Considered relevant with probability g 1 Irrelevant

49 Graded Average Precision [Robertson et al. SIGIR10] User has binary view of relevance by thresholding the relevance scale Relevance Scale Highly Relevant Relevant Considered relevant with probability g 2 Irrelevant

50 Graded Average Precision Assume relevance grades {0...c}: 0 for non-relevant, plus c positive grades. g_i = P(user threshold is at i) for i in {1...c}, i.e. the user regards grades {i...c} as relevant and grades {0...(i-1)} as not relevant; the g_i sum to one. Step down the ranked list, stopping at documents that may be relevant, then calculate expected precision at each of these (expected over the population of users).

51 Graded Average Precision (GAP) Relevance 1 HR 2 R 3 N 4 N 5 R 6 HR 7 R

52 Graded Average Precision (GAP) With probability g_1 (threshold at grade 1) the ranking HR R N N R HR R is seen as Rel Rel N N Rel Rel Rel, so prec@6 = 4/6

53 Graded Average Precision (GAP) With probability g_2 (threshold at grade 2) only the highly relevant documents count: HR R N N R HR R is seen as Rel N N N N Rel N, so prec@6 = 2/6

54 Graded Average Precision (GAP) Expected precision at rank 6 over the population of users: wprec@6 = g_1 (4/6) + g_2 (2/6)
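A sketch that reproduces this expected-precision calculation: with probability g_i the user thresholds the grades at i, and the weighted precision is the g-weighted average of the resulting binary precisions (the threshold distribution below is an illustrative assumption):

```python
GRADE = {"N": 0, "R": 1, "HR": 2}

def weighted_precision_at(k, ranking, g):
    """Expected precision@k over users whose relevance threshold i has probability g[i].
    Under threshold i, grades >= i count as relevant (slides 51-54)."""
    wprec = 0.0
    for i, g_i in g.items():
        rel = sum(1 for d in ranking[:k] if GRADE[d] >= i)
        wprec += g_i * rel / k
    return wprec

ranking = ["HR", "R", "N", "N", "R", "HR", "R"]
g = {1: 0.5, 2: 0.5}   # example threshold distribution, must sum to 1 (assumption)
print(weighted_precision_at(6, ranking, g))   # 0.5*(4/6) + 0.5*(2/6) = 0.5
```

Full GAP then averages these expected precisions over the stopping points, as described on slide 50.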

55 Probability Models Almost all the measures we've discussed are based on probabilistic models of users. Most have one or more parameters representing something about user behavior. Is there a way to incorporate variability in the user population? How do we estimate parameter values? Is a single point estimate good enough?

56 Choosing Parameter Values Parameter θ models a user: higher θ means more patience and more results viewed; lower θ means less patience and fewer results viewed. Different approaches: minimize variance in evaluation (Kanoulas & Aslam, CIKM 09); use a click log and fit a model to gaps between clicks (Zhang et al., IRJ, 2010). All try to infer a single value for the parameters.

57 Distribution of Patience for RBP Form a distribution P(θ) Sampling from P(θ) is like sampling a user defined by their patience How can we form a proper distribution of θ? Idea: mine logged search engine user data Look at ranks users are clicking Estimate patience based on absence or presence of clicks

58 Modeling Patience from Log Data We assume a flat prior on θ that we want to update using log data L. Decompose L into individual search sessions. For each session q, count: c_q, the total number of clicks, and r_q, the total number of no-clicks. Model c_q with a negative binomial distribution conditional on r_q and θ.

59 Modeling Patience from Log Data Marginalize P(θ|L) over r; apply Bayes' rule to P(θ|r, L): P(L|θ, r) is the likelihood of the observed clicks

60 Complete Model Expression Model components result in three equations to estimate P(θ|L)

61 Empirical Patience Profiles: Navigational Queries

62 Empirical Patience Profiles: Informational Queries

63 Extend to ERR Parameters

64 Evaluation Using Parameter Distributions Monte Carlo procedure: sample a parameter value from P(θ|L) (or a vector of values for ERR), compute the measure with the sampled value, and iterate to form a distribution P(RBP) or P(ERR)
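A sketch of this Monte Carlo procedure for RBP; the posterior P(θ|L) is stood in for by a Beta distribution, an assumption, since the real posterior comes from the log-based model on the preceding slides:

```python
import random

def rbp(rels, theta):
    return (1 - theta) * sum(rel * theta ** (i - 1)
                             for i, rel in enumerate(rels, start=1))

def rbp_distribution(rels, n_samples=10000, alpha=8, beta=2):
    """Sample theta ~ P(theta|L) (approximated here by Beta(alpha, beta), an
    assumption) and compute RBP for each sample, yielding a distribution P(RBP)."""
    return [rbp(rels, random.betavariate(alpha, beta)) for _ in range(n_samples)]

samples = rbp_distribution([1, 0, 1, 0, 0, 1, 0, 0, 0, 1])
print(sum(samples) / len(samples))   # expected RBP under the patience distribution
```

The same samples support the marginal-distribution analysis a few slides later: P(RBP_1 > RBP_2) can be estimated by the fraction of samples in which system 1's value exceeds system 2's.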

65 Marginal Distribution Analysis S1 = [R N N N N N N N N N], S2 = [N R R R R R R R R R]

66 Distribution of RBP

67 Distribution of ERR

68 Marginal Distribution Analysis Given two systems, over all choices of θ: what is P(M1 > M2)? What is P((M1 - M2) > t)?

69 Marginal Distribution Analysis

70 Outline Intro to evaluation Different approaches to evaluation Traditional evaluation measures User model based evaluation measures Session Evaluation Novelty and Diversity 71

71 Why sessions? The current evaluation framework assesses the effectiveness of systems over one-shot queries, but users reformulate their initial query. This would still be fine if optimizing a system for one-shot queries led to optimal performance over an entire session.

72 Why sessions? When was the DuPont Science Essay Contest created? Initial Query : DuPont Science Essay Contest Reformulation : When was the DSEC created? e.g. retrieval systems should accumulate information along a session

73 Example of a multi-query session: successive queries about Paris hotels ("Paris", "Luxurious Hotels", "Hilton", "Hotels Paris").

74 Extend the evaluation framework From one query evaluation To multi-query sessions evaluation

75 Construct appropriate test collections Rethink of evaluation measures

76 Basic test collection A set of information needs: "A friend from Kenya is visiting you and you'd like to surprise him by cooking a traditional swahili dish. You would like to search online to decide which dish you will cook at home." A static sequence of m queries (initial query, 1st reformulation, 2nd reformulation, ..., (m-1)th reformulation), e.g.: kenya cooking traditional; kenya cooking traditional swahili; kenya swahili traditional food recipes

77 Basic Test Collection Factual/Amorphous, Known-item search Intellectual/Amorphous, Explanatory search Factual/Amorphous, Known-item search

78 Experiment Ranked results for the queries: kenya cooking traditional; kenya cooking traditional swahili; kenya swahili traditional food recipes

79 Experiment (continued) Ranked results for the queries: kenya cooking traditional; kenya cooking traditional swahili; kenya swahili traditional food recipes

80 Construct appropriate test collections Rethink of evaluation measures

81 What is a good system?

82 How can we measure goodness?

83 Measuring goodness The user steps down a ranked list of documents and observes each one of them until a decision point, and then either (a) abandons the search or (b) reformulates. While stepping down or sideways, the user accumulates utility.

84 What are the challenges?

85 Evaluation over a single ranked list kenya cooking traditional; kenya cooking traditional swahili; kenya swahili traditional food recipes

86

87 Session DCG [Järvelin et al. ECIR 2008] kenya cooking traditional; kenya cooking traditional swahili. DCG(RL1) is weighted by 1/log_c(1 + c - 1) and DCG(RL2) by 1/log_c(2 + c - 1), where DCG(RLi) = Σ_{r=1}^{k} (2^rel(r) - 1) / log_b(r + b - 1); in general, sDCG = Σ_i (1/log_c(i + c - 1)) DCG(RLi)
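A sketch of sDCG as reconstructed above; the log bases b (within a list) and c (across reformulations) are free parameters, and the values and graded judgments below are illustrative assumptions:

```python
import math

def dcg(rels, b=2):
    """Within-list DCG with discount 1/log_b(r + b - 1)."""
    return sum((2 ** rel - 1) / math.log(r + b - 1, b)
               for r, rel in enumerate(rels, start=1))

def sdcg(session, b=2, c=4):
    """Session DCG: the i-th reformulation's DCG is discounted by 1/log_c(i + c - 1).
    b and c are free parameters; the values here are illustrative assumptions."""
    return sum(dcg(rels, b) / math.log(i + c - 1, c)
               for i, rels in enumerate(session, start=1))

# three ranked lists of graded judgments for the kenya cooking session (made-up grades)
session = [[0, 1, 1, 0, 0], [1, 1, 0, 0, 0], [2, 1, 1, 0, 0]]
print(sdcg(session))
```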

88 Session Metrics Session DCG [Järvelin et al ECIR 2008] The user steps down the ranked list until rank k and reformulates [Deterministic; no early abandonment]

89 Model-based measures Probabilistic space of users following different paths: Ω is the space of all paths, P(ω) is the probability of a user following a path ω in Ω, U(ω) is the utility of path ω in Ω; expected utility = Σ_{ω in Ω} P(ω) U(ω) [Yang and Lad ICTIR 2009]

90 Expected Global Utility [Yang and Lad ICTIR 2009] 1. User steps down ranked results one-by-one 2. Stops browsing documents based on a stochastic process that defines a stopping probability distribution over ranks and reformulates 3. Gains something from relevant documents, accumulating utility

91 Expected Global Utility [Yang and Lad ICTIR 2009] The probability of a user following a path ω: P(ω) = P(r_1, r_2, ..., r_K), where r_i is the stopping and reformulation point in list i. Assumption: stopping positions in each list are independent, so P(r_1, r_2, ..., r_K) = P(r_1) P(r_2) ... P(r_K). Use a geometric distribution (as in RBP) to model the stopping and reformulation behaviour: P(r_i = r) = (1 - θ) θ^(r-1)

92 Expected Global Utility Example: three ranked lists Q1, Q2, Q3 of relevant (R) and non-relevant (N) documents; the stopping/reformulation rank in each list is drawn from a geometric distribution with parameter θ

93 Session Metrics Session DCG [Järvelin et al ECIR 2008] The user steps down the ranked list until rank k and reformulates [Deterministic; no early abandonment] Expected global utility [Yang and Lad ICTIR 2009] The user steps down a ranked list of documents until a decision point and reformulates [Stochastic; no early abandonment]

94 Model-based measures Probabilistic space of users following different paths: Ω is the space of all paths, P(ω) is the probability of a user following a path ω in Ω, M_ω is a measure over a path ω; esM = Σ_{ω in Ω} P(ω) M_ω [Kanoulas et al. SIGIR 2011]

95 Probability of a path Example, over three ranked lists Q1, Q2, Q3: the probability of a path is (1) the probability of abandoning at reformulation 2, times (2) the probability of reformulating at rank 3

96 (1) Probability of abandoning the session at reformulation i: geometric distribution with parameter p_reform (illustrated over the Q1, Q2, Q3 example)

97 (1) Probability of abandoning the session at reformulation i: truncated geometric distribution with parameter p_reform (truncated at the last reformulation of the session)

98 (2) Probability of reformulating at rank j: geometric distribution with parameter p_down (abandonment at reformulation i: truncated geometric with parameter p_reform)
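Putting these pieces together, a simplified sketch of an expected session measure: enumerate paths, weight each by the (truncated) geometric probabilities above, and average a per-path measure. For brevity a single examined-rank cut-off is shared by all lists in a path and the per-path measure just counts relevant documents seen; both are simplifying assumptions rather than the exact formulation of Kanoulas et al.:

```python
def esm(session, p_reform=0.5, p_down=0.7, max_rank=10, path_measure=sum):
    """Expected session measure (sketch): esM = sum_w P(w) * M_w.
    A path w = (i, j): the user abandons the session at reformulation i
    (truncated geometric in p_reform) and examines ranks 1..j of each list
    visited (geometric in p_down, truncated at max_rank)."""
    K = len(session)
    p_abandon = [(1 - p_reform) ** (i - 1) * p_reform for i in range(1, K + 1)]
    total_a = sum(p_abandon)
    p_abandon = [p / total_a for p in p_abandon]           # truncate at reformulation K
    p_rank = [(1 - p_down) ** (j - 1) * p_down for j in range(1, max_rank + 1)]
    total_r = sum(p_rank)
    p_rank = [p / total_r for p in p_rank]                 # truncate at max_rank

    expected = 0.0
    for i, p_i in enumerate(p_abandon, start=1):
        for j, p_j in enumerate(p_rank, start=1):
            seen = [rel for rels in session[:i] for rel in rels[:j]]
            expected += p_i * p_j * path_measure(seen)     # M_w = relevant docs seen
    return expected

session = [[0, 1, 1, 0, 0], [1, 1, 0, 0, 0], [1, 1, 1, 0, 0]]
print(esm(session))
```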

99 Session Metrics Session DCG [Järvelin et al ECIR 2008] The user steps down the ranked list until rank k and reformulates [Deterministic; no early abandonment] Expected global utility [Yang and Lad ICTIR 2009] The user steps down a ranked list of documents until a decision point and reformulates [Stochastic; no early abandonment] Expected session measures [Kanoulas et al. SIGIR 2011] The user steps down a ranked list of documents until a decision point and either abandons the query or reformulates [Stochastic; allows early abandonment]

100 Outline Intro to evaluation Different approaches to evaluation Traditional evaluation measures User model based evaluation measures Session Evaluation Novelty and Diversity 101

101 Novelty The redundancy problem: the first relevant document contains some useful information; every document with the same information after that is worth less to the user, but worth the same to traditional evaluation measures. Novelty retrieval attempts to ensure that ranked results do not have much redundancy.

102 Example query: oil-producing nations (members of OPEC, North Atlantic nations, South American nations). 10 relevant articles about OPEC are probably not as useful as one relevant article about each group, and one relevant article about all oil-producing nations might be even better.

103 How to Evaluate? One approach: List subtopics, aspects, or facets of the topic Judge each document relevant or not to each possible subtopic For oil-producing nations, subtopics could be names of nations Saudi Arabia, Russia, Canada,

104 Subtopic Relevance Example

105 Evaluation Measures Subtopic recall and precision (Zhai et al., 2003) Subtopic recall at rank k: count unique subtopics in the top k documents, divide by the total number of known unique subtopics. Subtopic precision at recall r: find the least k at which subtopic recall r is achieved, find the least k at which subtopic recall r could possibly be achieved (by a perfect system), divide the latter by the former. Models a user that wants all subtopics and doesn't care about redundancy as long as they are seeing new information.
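A sketch of subtopic recall and subtopic precision as defined above; the document-to-subtopic judgments and the ideal ranking below are illustrative assumptions:

```python
def subtopic_recall_at(k, ranking, subtopics, n_subtopics):
    covered = set().union(*(subtopics[d] for d in ranking[:k]))
    return len(covered) / n_subtopics

def min_rank_for_recall(r, ranking, subtopics, n_subtopics):
    """Smallest k at which subtopic recall r is reached (None if never)."""
    for k in range(1, len(ranking) + 1):
        if subtopic_recall_at(k, ranking, subtopics, n_subtopics) >= r:
            return k
    return None

def subtopic_precision(r, ranking, ideal_ranking, subtopics, n_subtopics):
    """minRank of a perfect system divided by minRank of the evaluated system."""
    k_sys = min_rank_for_recall(r, ranking, subtopics, n_subtopics)
    k_opt = min_rank_for_recall(r, ideal_ranking, subtopics, n_subtopics)
    return k_opt / k_sys if k_sys else 0.0

# illustrative judgments: which subtopics (1..3) each document covers
subtopics = {"d1": {1}, "d2": {1}, "d3": {2}, "d4": {2, 3}, "d5": {3}}
ranking = ["d1", "d2", "d3", "d4", "d5"]
ideal = ["d4", "d1", "d3", "d2", "d5"]     # covers all 3 subtopics within 2 docs
print(subtopic_recall_at(3, ranking, subtopics, 3))            # 2/3
print(subtopic_precision(1.0, ranking, ideal, subtopics, 3))   # 2/4 = 0.5
```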

106 Subtopic Relevance Evaluation Copyright Ben Carterette

107 Diversity Short keyword queries are inherently ambiguous An automatic system can never know the user's intent Diversification attempts to retrieve results that may be relevant to a space of possible intents

108 Evaluation Measures Subtopic recall and precision, this time with judgments for intents rather than subtopics. Measures that know about intents: the intent-aware family of measures (Agrawal et al.), the D and D♯ measures (Sakai et al.), α-nDCG (Clarke et al.), ERR-IA (Chapelle et al.)

109 Intent-Aware Measures Assume there is a probability distribution P(i Q) over intents for a query Q Probability that a randomly-sampled user means intent i when submitting query Q The intent-aware version of a measure is its weighted average over this distribution
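A sketch of the intent-aware construction with precision@k as the base measure (any measure can be plugged in; the intent distribution and judgments are illustrative assumptions):

```python
def precision_at(k, rels):
    return sum(rels[:k]) / k

def intent_aware(measure, k, intent_probs, rels_per_intent):
    """M-IA = sum_i P(i|Q) * M computed against intent i's judgments."""
    return sum(p * measure(k, rels_per_intent[i]) for i, p in intent_probs.items())

intent_probs = {"schedule": 0.6, "roster": 0.3, "tickets": 0.1}   # P(i|Q), illustrative
rels_per_intent = {                                               # per-intent binary judgments
    "schedule": [1, 0, 1, 0, 0],
    "roster":   [0, 1, 0, 0, 1],
    "tickets":  [0, 0, 0, 1, 0],
}
print(intent_aware(precision_at, 5, intent_probs, rels_per_intent))  # 0.38
```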

110 Worked example (figure): the intent-aware measure is the weighted average of the per-intent measure values, with intent weights such as 0.35 and 0.1, giving 0.23 here.

111 D-measure Take the idea of intent-awareness and apply it to computing document gain: the gain for a document is the (weighted) average of its gains for the subtopics it is relevant to. D-nDCG is nDCG computed using intent-aware gains.

112 Worked example (figure): D-DCG is the sum of intent-aware gains discounted by rank, e.g. 0.35/log 2 + .../log 3 + ...

113 α-nDCG α-nDCG is a generalization of nDCG that accounts for both novelty and diversity; α is a geometric penalization for redundancy. Redefine the gain of a document: +1 for each subtopic it is relevant to, multiplied by (1-α) for each document higher in the ranking in which that subtopic already appeared. The discount is the same as usual.
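A sketch of the α-nDCG gain and discount described above; only α-DCG is computed here, since finding the ideal ranking needed for the normalization is itself a hard optimization problem (the subtopic judgments below are illustrative):

```python
import math
from collections import defaultdict

def alpha_dcg(ranking, subtopics, alpha=0.5):
    """alpha-DCG: gain(d) = sum over d's subtopics of (1 - alpha)^(times already seen)."""
    seen = defaultdict(int)          # how often each subtopic has already appeared
    score = 0.0
    for r, d in enumerate(ranking, start=1):
        gain = sum((1 - alpha) ** seen[s] for s in subtopics[d])
        score += gain / math.log2(r + 1)
        for s in subtopics[d]:
            seen[s] += 1
    return score

subtopics = {"d1": {1}, "d2": {1, 2}, "d3": {2}, "d4": {3}}   # illustrative judgments
print(alpha_dcg(["d1", "d2", "d3", "d4"], subtopics))
```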

114 Worked example (figure): per-subtopic gain contributions of 1, (1-α), (1-α)^2, ... depending on how many higher-ranked documents already covered the subtopic.

115 ERR-IA Intent-aware version of ERR, but with appealing properties other IA measures do not have: it ranges between 0 and 1, and it is submodular (diminishing returns for relevance to a given subtopic, i.e. built-in redundancy penalization). It also has appealing properties over α-nDCG: it easily handles graded subtopic judgments and intent distributions.

116 Granularity of Judging What exactly is a subtopic? Perhaps any piece of information a user may be interested in finding? At what granularity should subtopics be defined? For example: cardinals has many possible meanings; cardinals baseball team is still very broad; cardinals baseball team schedule covers 6 months; cardinals baseball team schedule august covers ~25 games; cardinals baseball team schedule august 12th

117 Preference Judgments for Novelty What about evaluating novelty with no subtopic judgments? Preference judgments: is document A more relevant than document B? Conditional preference judgments: is document A better than document B given that I've just seen document C? Assumption: the preference is based on novelty over C. Is it true? Come to our presentation on Wednesday

118

119 Conclusions Strong interest in using evaluation measures to model user behavior and satisfaction Driven by availability of user logs, increased computational power, good abstract models DCG, RBP, ERR, EBU, session measures, diversity measures all model users in different ways Cranfield-style evaluation is still important! But there is still much to understand about users and how they derive satisfaction

120 Conclusions Ongoing and future work: Models with more degrees of freedom Direct simulation of users from start of session to finish Application to other domains Thank you! Slides will be available online
