DD2475 Information Retrieval Lecture 5: Evaluation, Relevance Feedback, Query Expansion


Hedvig Kjellström, hedvig@kth.se

Lectures 2-4: All Tools for a Complete Information Retrieval System
- [Figure: block diagram of a complete IR system, with components labeled by the lecture (2, 3, or 4) that covered them]

Today
- Evaluation (Manning Chapter 8)
  - Benchmarks
  - Precision and recall
  - Making results usable to a user
- Relevance feedback and query expansion (Manning Chapter 9)
  - Improving results
  - Globally: query expansion/reformulation
  - Locally: relevance feedback

Evaluation (Manning Chapter 8)

Quality Measure: Relevance (Sec. 8.1)
- Most common proxy for IR system quality: relevance of the search results
- Relevance measurement requires:
  - A relevance metric
  - A benchmark document collection
  - A benchmark suite of queries
  - A binary label Relevant/Nonrelevant for each query-document combination

Unranked Retrieval Evaluation Metric: Precision and Recall (Sec. 8.3)
- Precision: fraction of retrieved documents that are also relevant = P(relevant | retrieved)
- Recall: fraction of relevant documents that are also retrieved = P(retrieved | relevant)

                   Relevant   Nonrelevant
    Retrieved        tp          fp
    Not retrieved    fn          tn

    P = tp / (tp + fp)
    R = tp / (tp + fn)

- What is the relative size of the sets tp, tn, fp, fn?

Another Metric: Accuracy (Sec. 8.3)
- View the engine as a classifier
- Accuracy: the fraction of classifications that are correct: (tp + tn) / (tp + fp + fn + tn)
- Accuracy is a commonly used evaluation measure in machine learning classification

Why Not Just Use Accuracy? (Sec. 8.3)
- How to build a nearly 100% accurate search engine on a low budget: return nothing ("Search for: ... 0 matching results found."). More in Lecture 9
- Why is accuracy not a very useful measure in IR? People doing information retrieval want to find something, and have a certain tolerance for junk
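These definitions translate directly into code. The following is a minimal sketch (my own illustration, not from the slides); the last line shows why accuracy flatters a do-nothing engine on a skewed collection:

```python
def precision(tp, fp):
    """Fraction of retrieved documents that are relevant."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of relevant documents that are retrieved."""
    return tp / (tp + fn)

def accuracy(tp, fp, fn, tn):
    """Fraction of all classifications that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)

# The "empty search engine" pathology: with 10 relevant documents in a
# collection of 1,000,000, an engine that retrieves nothing is 99.999%
# accurate, yet has recall 0.
print(accuracy(tp=0, fp=0, fn=10, tn=999_990))  # 0.99999
```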

Precision/Recall (Sec. 8.3)
- Ideal:
  - No false positives: precision = 1
  - No false negatives: recall = 1
- Reality: you get one or the other by tweaking parameters
- Recall is a non-decreasing function of the number of documents retrieved (recall = 1 when all documents are retrieved)
- Precision decreases as recall increases
  - Not a theorem, but a result with strong empirical confirmation

Difficulties with Precision/Recall Evaluation (Sec. 8.3)
- Should average over large document collection/query ensembles
- Need human relevance assessments
  - People aren't reliable assessors
- Assessments have to be binary
  - What about nuanced assessments?
- Heavily skewed by collection/authorship
  - Results may not translate from one domain to another

Combined Measure: F (Sec. 8.3)
- The combined measure that assesses the precision/recall tradeoff is the F measure, a weighted harmonic mean:

    F = (β² + 1)PR / (β²P + R)

- People usually use the balanced F1 measure, i.e., with β = 1 (equivalently α = 1/2):

    F1 = 2PR / (P + R)

- [Figure: F1 (purple) compared with other ways of averaging precision and recall]
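A short sketch of the weighted F measure (illustrative). The harmonic mean punishes extreme precision/recall trade-offs, which is the point of using it instead of the arithmetic mean:

```python
def f_measure(p, r, beta=1.0):
    """Weighted harmonic mean of precision and recall.
    beta > 1 favors recall, beta < 1 favors precision."""
    if p == 0 and r == 0:
        return 0.0
    b2 = beta ** 2
    return (b2 + 1) * p * r / (b2 * p + r)

# Returning everything gives R = 1 but tiny P; the arithmetic mean
# would still be ~0.5, while F1 stays near zero:
print(f_measure(0.00001, 1.0))  # ~0.00002
print(f_measure(0.5, 0.5))      # 0.5
```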

Ranked Retrieval Evaluation Metric: Precision-Recall Curve (Sec. 8.4)
- Vary the cut-off point (the top K results) and compute precision and recall at each cut-off
  - Precision P = tp / (tp + fp)
  - Recall R = tp / (tp + fn)
  - tp, fp, fn are counted cumulatively over the top K results
- [Figure: precision-recall curve; 1000 data points in total]
- And of course, average over a bunch of queries!

Exercise (5 minutes)
- Work out in pairs: fill in the empty boxes for the sequence of results that generated this curve

    Relevant | tp (cumulative) | fp (cumulative) | fn (cumulative) | Precision | Recall

Evaluation Measures (Sec. 8.4)
- Average performance over many queries
- Summary measure: 11-point interpolated average precision
  - Early standard measure
  - Evaluates performance at all recall levels
- Summary measure: Precision-at-K
  - Precision of the top K results
  - Natural for web search
  - Averages badly; arbitrary parameter K
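The three ranked-retrieval measures above can be sketched in a few lines of Python (illustrative; assumes a list of binary relevance labels in rank order):

```python
def pr_curve(ranked_rels, total_relevant):
    """Precision/recall after each cut-off K in a ranked result list.
    ranked_rels: list of 0/1 relevance labels in rank order."""
    tp = 0
    points = []
    for k, rel in enumerate(ranked_rels, start=1):
        tp += rel
        points.append((tp / k, tp / total_relevant))  # (precision, recall)
    return points

def precision_at_k(ranked_rels, k):
    """Precision of the top k results."""
    return sum(ranked_rels[:k]) / k

def interpolated_avg_precision(points, levels=11):
    """11-point interpolated average precision: at each recall level r,
    take the maximum precision achieved at any recall >= r."""
    avg = 0.0
    for i in range(levels):
        r = i / (levels - 1)
        avg += max((p for p, rec in points if rec >= r), default=0.0)
    return avg / levels

rels = [1, 0, 1, 1, 0, 0, 1, 0]        # toy ranked result list
curve = pr_curve(rels, total_relevant=4)
print(precision_at_k(rels, 5))          # 0.6
print(interpolated_avg_precision(curve))
```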

What is Assessed? (Sec. 8.1)
- The user has an information need
- The information need is translated into a query (by the user)
- Relevance is assessed relative to the information need, not the query
- Example:
  - Information need: "I'm looking for information on whether drinking red wine is more effective at reducing your risk of heart attacks than white wine."
  - Query: WINE RED WHITE HEART ATTACK EFFECTIVE

From Document Collections to Test Collections (Sec. 8.5)
- Test queries
  - Must be relevant to the documents available
  - Best designed by domain experts
  - Random query terms are generally not a good idea
- Relevance assessments
  - Human judges: time-consuming
  - Human judges: error-prone?

Kappa Measure for Inter-Judge (Dis)Agreement (Sec. 8.5)
- Kappa measure
  - Agreement measure among judges
  - Designed for categorical judgments
  - Corrects for chance agreement
- kappa = [P(A) − P(E)] / [1 − P(E)]
- P(A): proportion of the time the judges agree
- P(E): probability that the judges agree by chance
- kappa = 0 for pure chance agreement, 1 for total agreement

Kappa Measure: Example

    #documents   Judge 1       Judge 2
    300          Relevant      Relevant
    70           Nonrelevant   Nonrelevant
    20           Relevant      Nonrelevant
    10           Nonrelevant   Relevant

- P(A) = 370/400 = 0.925
- P(nonrelevant) = (70 + 70 + 20 + 10)/800 = 0.2125
- P(relevant) = (300 + 300 + 20 + 10)/800 = 0.7875
- P(E) = 0.2125² + 0.7875² = 0.665
- Kappa = (0.925 − 0.665)/(1 − 0.665) = 0.776
- Kappa > 0.8: good agreement
- 0.67 < Kappa < 0.8: reasonable agreement; depends on the purpose of the study
- For more than 2 judges: average the pairwise kappas
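The example can be checked with a few lines of Python (a sketch; it implements the pooled-marginals variant used on the slide, where P(relevant) and P(nonrelevant) are estimated over both judges' labels together):

```python
def kappa(n_rr, n_nn, n_rn, n_nr):
    """Kappa for two judges with binary labels.
    n_rn = judge 1 says relevant, judge 2 says nonrelevant, etc."""
    n = n_rr + n_nn + n_rn + n_nr
    p_agree = (n_rr + n_nn) / n
    # Marginal probabilities pooled over both judges, as on the slide:
    p_rel = (2 * n_rr + n_rn + n_nr) / (2 * n)
    p_nonrel = (2 * n_nn + n_rn + n_nr) / (2 * n)
    p_chance = p_rel ** 2 + p_nonrel ** 2
    return (p_agree - p_chance) / (1 - p_chance)

print(kappa(n_rr=300, n_nn=70, n_rn=20, n_nr=10))  # ~0.776
```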

Standard Relevance Benchmarks (Sec. 8.2)
- Text REtrieval Conference (TREC)
  - IR test bed run by the National Institute of Standards and Technology (NIST)
  - Reuters and other benchmark document collections used
  - Retrieval tasks specified, sometimes as queries
  - Each query-document combination marked as Relevant or Nonrelevant by human experts
- Cross Language Evaluation Forum (CLEF)
  - Concentrates on European languages and cross-language information retrieval
- Many others

Can We Avoid Human Judgment?
- No! The need for human judgment makes experimental work hard, especially on a large scale
- In some very specific settings, proxies can be used
  - E.g., use the exact cosine score as a baseline when comparing approximate vector space retrieval to exact retrieval
- Can reuse (human-labeled) test collections
  - Watch out for overtraining and overfitting

Evaluation of Large Search Engines
- Search engines have test collections of queries and hand-ranked results
- Recall is difficult to measure on the web
- Often precision at top K is used, e.g., K = 10
- Non-relevance-based measures
  - Clickthrough on the first result
  - Studies of user behavior in the lab

Measures for a Search Engine (Sec. 8.6)
- How fast does it index? (number of documents/hour)
- How fast does it search? (latency as a function of index size)
- Expressiveness of the query language
  - Ability to express complex information needs
  - Speed on complex queries
- Uncluttered UI
- Is it free?

Measuring User Happiness (Sec. 8.6)
- All of the preceding criteria are measurable
- The key measure: user happiness. But what is that?
  - Speed of response and size of the index are factors
  - But relevance is also important
- We need a way of quantifying user happiness
- Issue: who is the user we are trying to make happy? It depends on the setting
  - Web engine: the user finds what they want and returns to the engine
  - eCommerce site: the user finds what they want and buys
  - Enterprise (company/government/academic): care about user productivity

Result Summaries (Sec. 8.7)
- Having ranked the documents matching a query, we wish to present a results list
- Most commonly, a list of the document titles plus a short summary, aka "10 blue links"
- The title comes from document metadata. What about the summaries?
  - This description is crucial: the user identifies good/relevant hits based on it
- Two basic kinds:
  - Static: always the same, regardless of the query
  - Dynamic: query-dependent; attempts to explain why the document was retrieved for the query at hand

Static Summaries (Sec. 8.7)
- A subset of the document
- Simplest heuristic: the first 50 words of the document
  - Summary cached at indexing time
- More sophisticated: extract from each document a set of key sentences
  - Simple NLP heuristics to score each sentence
  - The summary is made up of the top-scoring sentences

Dynamic Summaries (Sec. 8.7; recall Lecture 4)
- Find small windows in the document that contain the query terms
  - Requires fast window lookup in a document cache
- Score each window with respect to the query
  - Use various features such as window width, position in the document, etc.
- Most sophisticated: NLP used to synthesize a summary
  - Seldom used in IR; cf. work on text summarization
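The window-finding step can be made concrete with a toy scorer (entirely illustrative; real engines combine many more features, such as term proximity and sentence boundaries):

```python
def best_window(doc_terms, query_terms, max_width=20):
    """Pick the shortest window covering the most distinct query terms;
    a crude stand-in for the window-scoring heuristics on the slide."""
    q = set(query_terms)
    best, best_score = None, (-1, 0)  # (distinct hits, -width)
    for i, t in enumerate(doc_terms):
        if t not in q:
            continue  # only start windows at a query-term hit
        for j in range(i, min(i + max_width, len(doc_terms))):
            hits = len(q & set(doc_terms[i:j + 1]))
            score = (hits, -(j - i + 1))
            if score > best_score:
                best_score, best = score, (i, j + 1)
    return best  # (start, end) token offsets of the snippet

doc = "nasa scratches environment gear from its satellite plan".split()
print(best_window(doc, ["satellite", "plan"]))  # (6, 8)
```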

Alternative Results Presentations
- An active area of HCI research
- Example: recent versions of Mac OS X use Cover Flow

Relevance Feedback and Query Expansion (Manning Chapter 9)

Improving Results (Sec. 9.1)
- Local methods (analyze the documents relating to the query)
  - Relevance feedback
- Global methods (analyze all documents in the collection)
  - Query expansion

Relevance Feedback (Sec. 9.1)
- Relevance feedback: user feedback on the relevance of documents in an initial set of results
  - The user issues a (short, simple) query
  - The user marks some results as relevant or non-relevant
  - The system computes a better representation of the information need based on the feedback
  - Possibly iterate
- Key idea: it may be difficult to formulate a good query when you don't know the collection well, so iterate
- We will use "ad hoc retrieval" to refer to regular retrieval without relevance feedback
- Three examples of relevance feedback that highlight different aspects follow

Example 1: Similar Pages

Example 2: Relevance Feedback in Image Search
- Experimental image search engine (Newsam et al., ICIP 2001); not active anymore
- Query-document similarity computed according to different measures
- [Figures: results for the initial query, and results after relevance feedback]

Example 3: Relevance Feedback in Text Search
- Initial query: NEW SPACE SATELLITE APPLICATIONS
- Results for the initial query, ranked by query-document similarity:
  1. 08/13/91, NASA Hasn't Scrapped Imaging Spectrometer
  2. 07/09/91, NASA Scratches Environment Gear From Satellite Plan
  3. 04/04/90, Science Panel Backs NASA Satellite Plan, But Urges Launches of Smaller Probes
  4. 09/09/91, A NASA Satellite Project Accomplishes Incredible Feat: Staying Within Budget
  5. 07/24/90, Scientist Who Exposed Global Warming Proposes Satellites for Climate Research
  6. 08/22/90, Report Provides Support for the Critics Of Using Big Satellites to Study Climate
  7. 04/13/87, Arianespace Receives Satellite Launch Pact From Telesat Canada
  8. 12/02/87, Telecommunications Tale of Two Companies
- The user marks three of these results as relevant ("+" on the slide)
- Expanded query after relevance feedback (top weighted terms): new, space, satellite, application, nasa, eos, launch, aster, instrument, arianespace, bundespost, ss, rocket, scientist, broadcast, earth, oil, measure

Results for the Expanded Query
- Ranked by the updated query-document similarity ("old n" = rank under the initial query):
  1. (old 2) 07/09/91, NASA Scratches Environment Gear From Satellite Plan
  2. (old 1) 08/13/91, NASA Hasn't Scrapped Imaging Spectrometer
  3. 08/07/89, When the Pentagon Launches a Secret Satellite, Space Sleuths Do Some Spy Work of Their Own
  4. 07/31/89, NASA Uses Warm Superconductors For Fast Circuit
  5. (old 8) 12/02/87, Telecommunications Tale of Two Companies
  6. 07/09/91, Soviets May Adapt Parts of SS-20 Missile For Commercial Use
  7. 07/12/88, Gaping Gap: Pentagon Lags in Race To Match the Soviets In Rocket Launchers
  8. 06/14/90, Rescue of Satellite By Space Agency To Cost $90 Million

Key Concept: Centroid
- Centroid = center of mass of a set of points
- Recall: we represent documents as points in a high-dimensional vector space
- Definition (vector mean): μ(C) = (1/|C|) Σ_{d ∈ C} d, where C is a set of documents

Rocchio Algorithm: The Theoretically Best Query
- Uses the vector space model to pick the query with optimal terms and weights
- Seeks the query q_opt that maximizes the similarity to relevant documents minus the similarity to non-relevant documents:

    q_opt = argmax_q [ sim(q, μ(C_r)) − sim(q, μ(C_nr)) ]

- Tries to separate the documents marked relevant from those marked non-relevant
- [Figure: optimal query separating relevant documents (o) from non-relevant documents (x), with centroids μ(o) and μ(x)]
- Problem: we don't know the truly relevant documents
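Under cosine similarity, the optimal query reduces to the difference of the two centroids (cf. Manning et al., Chapter 9). A small numpy sketch (illustrative, and only computable if the relevant set were actually known):

```python
import numpy as np

def centroid(docs):
    """Vector mean (center of mass) of a set of document vectors."""
    return np.mean(docs, axis=0)

# Toy 2-term space; in reality these sets are exactly what we don't know.
relevant    = np.array([[0.9, 0.1], [0.8, 0.3]])
nonrelevant = np.array([[0.1, 0.9], [0.2, 0.7]])

# With cosine similarity, the separation-maximizing query is the
# difference of the centroids:
q_opt = centroid(relevant) - centroid(nonrelevant)
print(q_opt)  # [ 0.7 -0.6], pointing toward the relevant region
```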

Rocchio 1971 Algorithm (SMART)
- The variant used in practice:

    q_m = α q_0 + β (1/|D_r|) Σ_{d_j ∈ D_r} d_j − γ (1/|D_nr|) Σ_{d_j ∈ D_nr} d_j

- D_r = set of known relevant document vectors
- D_nr = set of known irrelevant document vectors
  - Different from C_r and C_nr!
- q_m = modified query vector; q_0 = original query vector; α, β, γ: weights (hand-chosen or set empirically)
- The new query moves toward the relevant documents and away from the irrelevant documents

Relevance Feedback on Initial Query
- [Figure: initial query and revised query; o = known relevant documents, x = known non-relevant documents]
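A minimal numpy sketch of the update (illustrative; the defaults follow the β = 0.75, γ = 0.25 choice discussed below, and clipping negative term weights to zero is common practice):

```python
import numpy as np

def rocchio(q0, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio (1971) query update: move the query toward the centroid of
    the known relevant documents, away from the known non-relevant ones."""
    qm = alpha * q0
    if len(rel_docs):
        qm = qm + beta * np.mean(rel_docs, axis=0)
    if len(nonrel_docs):
        qm = qm - gamma * np.mean(nonrel_docs, axis=0)
    return np.maximum(qm, 0.0)  # clip negative term weights to 0

q0 = np.array([1.0, 0.0, 0.0])                   # query uses term 1 only
rel = np.array([[0.9, 0.8, 0.0], [1.0, 0.6, 0.0]])
nonrel = np.array([[0.0, 0.0, 1.0]])
print(rocchio(q0, rel, nonrel))                   # weight shifts to term 2
```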

Exercise (5 minutes)
- Positive feedback is more valuable than negative feedback, so set γ < β; e.g., γ = 0.25, β = 0.75
- Many systems (including both examples earlier) allow only positive relevance feedback (γ = 0)
- Discuss in pairs/groups: why?

Relevance Feedback: Assumptions
- A1: The user has sufficient knowledge for the initial query
  - Violations: misspellings, cross-language retrieval, vocabulary mismatch
- A2: Relevance prototypes are well-behaved
  - The term distribution in relevant documents is similar
  - The term distribution in non-relevant documents is different from that in relevant documents
  - Violations: synonyms, two subjects in one query

Relevance Feedback: Problems
- Long queries are inefficient for a typical IR engine
  - Long response times for the user
  - High cost for the retrieval system
  - Why are long queries inefficient?
- Users are often reluctant to provide explicit feedback
- It is often harder to understand why a particular document was retrieved after relevance feedback has been applied

Query Expansion
- In relevance feedback, users give additional input (relevant/non-relevant) on documents
  - Local, dynamic analysis of the documents relating to the query
- In query expansion, users give additional input (refined search terms) on words or phrases
  - Global, static analysis of all documents in the collection

Query Assist: How is the Query Augmented?
- Manual thesaurus
  - Maintained by human specialists
  - Used e.g. in medicine
- Automatic thesaurus
  - More to come
- Refinements based on query log mining (a toy version is sketched below)
  - Common on the web
  - The system suggests queries used before by others
  - Compare to collaborative filtering
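A toy version of log-based query assist (my own sketch with invented data; real systems mine massive logs and use far richer signals than raw frequency):

```python
from collections import Counter

def suggest(prefix, query_log, k=3):
    """Suggest the k most frequent past queries that extend the prefix."""
    matches = Counter(q for q in query_log
                      if q.startswith(prefix) and q != prefix)
    return [q for q, _ in matches.most_common(k)]

log = ["cheap flights", "cheap flights to rome", "cheap flights to rome",
       "cheap hotels", "cheap flights to oslo"]
print(suggest("cheap flights", log))
# ['cheap flights to rome', 'cheap flights to oslo']
```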

Thesaurus-Based Query Expansion
- For each query term, expand the query from the thesaurus
  - feline → feline cat
- Added terms may be weighted less than the original terms
- Generally increases recall
- May significantly decrease precision, particularly with ambiguous terms
  - interest rate → interest rate fascinate evaluate
- Widely used in many science/engineering fields
- There is a high cost to manually producing and maintaining a thesaurus

Automatic Thesaurus Generation
- Attempt to generate a thesaurus automatically by analyzing the collection of documents
- Definition 1: two words are similar if they co-occur with similar words (a sketch follows below)
- Definition 2: two words are similar if they occur in the same grammatical relations with the same words
  - You can harvest, peel, eat, prepare, etc. both apples and pears, so apples and pears must be similar
- Definition 1 is more robust, definition 2 is more accurate. Why is that?

Automatic Thesaurus Generation: Example and Discussion
- The quality of the associations is usually a problem
- Term ambiguity may introduce irrelevant, statistically correlated terms
  - Apple computer → Apple red fruit computer
- Problems:
  - False positives: words deemed similar that are not
  - False negatives: words deemed dissimilar that are similar
- Since the expansion terms are highly correlated with the original terms anyway, expansion may not retrieve many additional documents
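A minimal sketch of definition 1 (illustrative only), using each term's row in a term-document count matrix as a proxy for its co-occurrence profile and comparing rows by cosine similarity:

```python
import numpy as np

def cooccurrence_similarity(term_doc):
    """Represent each term by its row in a term-document count matrix;
    two terms are similar if their rows have high cosine similarity."""
    norms = np.linalg.norm(term_doc, axis=1, keepdims=True)
    unit = term_doc / np.maximum(norms, 1e-12)  # guard against zero rows
    return unit @ unit.T  # term-term cosine similarity matrix

# Toy matrix: rows = terms (apple, pear, computer), cols = documents.
A = np.array([[3.0, 2.0, 0.0],
              [2.0, 3.0, 0.0],
              [0.0, 1.0, 4.0]])
sim = cooccurrence_similarity(A)
print(sim[0, 1], sim[0, 2])  # apple~pear high (~0.92), apple~computer low
```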

Next
- Computer hall session (November 25)
  - Grå (Osquars Backe 2, level 5)
  - Examination of computer assignment 1
  - Book a time in the Doodle (see the course homepage)
- Lecture 6 (November 30)
  - D32
  - Readings: Manning Chapter 10
