DD2475 Information Retrieval Lecture 5: Evaluation, Relevance Feedback, Query Expansion
DD2475 Information Retrieval
Lecture 5: Evaluation, Relevance Feedback, Query Expansion
Hedvig Kjellström, hedvig@kth.se

Lectures 2-4: All Tools for a Complete Information Retrieval System
[diagram: the components of a complete IR system, annotated with the lecture (2, 3 or 4) that covered each of them]

Today
- Evaluation (Manning Chapter 8)
  - Benchmarks
  - Precision and recall
  - Making results usable to a user
- Relevance feedback and query expansion (Manning Chapter 9)
  - Improving results
  - Globally: query expansion/reformulation
  - Locally: relevance feedback

Evaluation (Manning Chapter 8)
Sec. 8.1 Quality Measure: Relevance
- The most common proxy for the quality of an IR system: relevance of the search results
- Relevance measurement requires:
  - A relevance metric
  - A benchmark document collection
  - A benchmark suite of queries
  - A binary label Relevant/Nonrelevant for each query-document combination

Sec. 8.3 Unranked Retrieval Evaluation Metric: Precision and Recall
- Precision: the fraction of retrieved documents that are also relevant = P(relevant | retrieved)
- Recall: the fraction of relevant documents that are also retrieved = P(retrieved | relevant)

                  Relevant   Nonrelevant
  Retrieved          tp          fp
  Not Retrieved      fn          tn

  P = tp/(tp + fp)
  R = tp/(tp + fn)

- What is the relative size of the sets tp, tn, fp, fn?

Sec. 8.3 Another Metric: Accuracy
- View the engine as a classifier
- Accuracy: the fraction of classifications that are correct
  - (tp + tn) / (tp + fp + fn + tn)
- Accuracy is a commonly used evaluation measure in machine learning classification

Sec. 8.3 Why Not Just Use Accuracy?
- Why is accuracy not a very useful measure in IR?
- How to build a highly "accurate" search engine on a low budget: answer "0 matching results found" to every query; since almost all documents are nonrelevant to any given query, accuracy stays high. (More in Lecture 9)
- People doing information retrieval want to find something, and have a certain tolerance for junk.
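The contingency table above maps directly onto a few lines of code. Below is a minimal sketch (not from the lecture; the set-based bookkeeping and the toy numbers are my own) that computes precision, recall and accuracy for one query:

```python
def evaluate_unranked(retrieved, relevant, collection_size):
    """Compute tp/fp/fn/tn and the derived measures for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)       # relevant and retrieved
    fp = len(retrieved - relevant)       # retrieved but not relevant
    fn = len(relevant - retrieved)       # relevant but not retrieved
    tn = collection_size - tp - fp - fn  # neither retrieved nor relevant
    precision = tp / (tp + fp) if retrieved else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    accuracy = (tp + tn) / collection_size
    return precision, recall, accuracy

# Toy numbers: a collection of 1000 documents, 20 of them relevant;
# the engine returns 10 documents of which 8 are relevant.
relevant_docs = {f"d{i}" for i in range(20)}
retrieved_docs = {f"d{i}" for i in range(8)} | {"d100", "d200"}
print(evaluate_unranked(retrieved_docs, relevant_docs, 1000))
# -> (0.8, 0.4, 0.986): accuracy looks great even though recall is poor
```

Note that an engine that retrieves nothing at all would still reach accuracy 0.98 on this toy collection, which is exactly why accuracy is not a useful measure in IR.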
Sec. 8.3 Precision/Recall
- Ideal:
  - No false positives, precision = 1
  - No false negatives, recall = 1
- Reality:
  - Get one or the other by tweaking parameters
- Recall is a non-decreasing function of the number of documents retrieved (recall = 1 when all documents are retrieved)
- Precision decreases as recall increases
  - Not a theorem, but a result with strong empirical confirmation

Sec. 8.3 Difficulties with Precision/Recall Evaluation
- Should average over large document collection/query ensembles
- Need human relevance assessments
  - People aren't reliable assessors
- Assessments have to be binary
  - Nuanced assessments?
- Heavily skewed by collection/authorship
  - Results may not translate from one domain to another

Sec. 8.3 Combined Measure: F
- The combined measure that assesses the precision/recall tradeoff is the F measure (a weighted harmonic mean):

  F = 1 / (alpha/P + (1 - alpha)/R) = (beta^2 + 1)PR / (beta^2 P + R), where beta^2 = (1 - alpha)/alpha

- People usually use the balanced F1 measure, i.e., with beta = 1 (alpha = 1/2):

  F1 = 2PR / (P + R)

Sec. 8.3 F1 (Purple) and Other Averages
[plot comparing the harmonic mean F1 with other ways of averaging precision and recall]
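A small sketch (my own, following the weighted-harmonic-mean definition above) showing how F behaves compared with a plain average:

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.
    beta > 1 favours recall, beta < 1 favours precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (b2 + 1) * precision * recall / (b2 * precision + recall)

# The harmonic mean punishes imbalance between P and R,
# unlike the arithmetic mean:
print(f_measure(0.8, 0.4))   # F1 ~= 0.53
print((0.8 + 0.4) / 2)       # arithmetic mean = 0.6
```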
Sec. 8.4 Ranked Retrieval Evaluation Metric: Precision-Recall Curve
- Vary the cut-off point (top K results) and compute precision and recall at each cut-off
  - Precision P = tp/(tp + fp)
  - Recall R = tp/(tp + fn)
- Of course, averaging over a bunch of queries
[plot: precision-recall curve for a ranked result list]

Exercise (5 minutes)
- The sequence that generated this curve: 1000 data points in total
- Work out in pairs: fill in the empty boxes in the table (Relevant, tp (cumulative), fp (cumulative), fn (cumulative), Precision, Recall)

Sec. 8.4 Evaluation Measures
- Average performance over many queries
- Summary measure: 11-point interpolated average precision
  - Early standard measure
  - Evaluates performance at all recall levels
- Summary measure: Precision-at-K
  - Precision of the top K results
  - Natural for web search
  - Averages badly, arbitrary parameter K
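The ranked measures above are also easy to script. A sketch (my own; the toy ranking is made up) of per-cut-off precision/recall points, Precision-at-K, and 11-point interpolated average precision for a single query:

```python
def pr_points(ranked, relevant):
    """Precision/recall after each cut-off K in a ranked result list."""
    relevant = set(relevant)
    points, tp = [], 0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            tp += 1
        points.append((tp / k, tp / len(relevant)))  # (precision, recall) at K
    return points

def precision_at_k(ranked, relevant, k):
    """Precision of the top k results."""
    relevant = set(relevant)
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def interpolated_11pt(points):
    """11-point interpolated average precision: at each recall level
    0.0, 0.1, ..., 1.0 take the max precision at any recall >= that level."""
    interp = []
    for r_level in (i / 10 for i in range(11)):
        candidates = [p for p, r in points if r >= r_level]
        interp.append(max(candidates) if candidates else 0.0)
    return sum(interp) / len(interp)

ranked = ["d3", "d7", "d1", "d9", "d2", "d8"]
relevant = ["d1", "d2", "d5"]
pts = pr_points(ranked, relevant)
print(pts)                                  # P/R after each cut-off K
print(precision_at_k(ranked, relevant, 5))  # P@5 = 0.4
print(interpolated_11pt(pts))
```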
Sec. 8.1 What is Assessed?
- The user has an information need
- The information need is translated into a query (by the user)
- Relevance is assessed relative to the information need, not the query
- Example:
  - Information need: "I'm looking for information on whether drinking red wine is more effective at reducing your risk of heart attacks than white wine."
  - Query: WINE RED WHITE HEART ATTACK EFFECTIVE

Sec. 8.5 From Document Collections to Test Collections
- Test queries
  - Must be relevant to the documents available
  - Best designed by domain experts
  - Random query terms are generally not a good idea
- Relevance assessments
  - Human judges, time-consuming
  - Human judges, error?

Sec. 8.5 Kappa Measure for Inter-Judge (Dis)Agreement
- Kappa measure
  - Agreement measure among judges
  - Designed for categorical judgments
  - Corrects for chance agreement
- kappa = [ P(A) - P(E) ] / [ 1 - P(E) ]
- P(A): proportion of the time the judges agree
- P(E): probability that the agreement is by chance
- kappa = 0 for chance agreement, 1 for total agreement

Sec. 8.5 Kappa Measure: Example

  #documents   Judge 1       Judge 2
  300          Relevant      Relevant
  70           Nonrelevant   Nonrelevant
  20           Relevant      Nonrelevant
  10           Nonrelevant   Relevant

- P(A) = 370/400 = 0.925
- P(nonrelevant) = (80 + 90)/800 = 0.2125
- P(relevant) = (320 + 310)/800 = 0.7875
- P(E) = 0.2125^2 + 0.7875^2 = 0.665
- Kappa = (0.925 - 0.665)/(1 - 0.665) = 0.776
- Kappa > 0.8: good agreement
- 0.67 < Kappa < 0.8: reasonable agreement
- Depends on the purpose of the study
- For more than 2 judges: average pairwise kappas
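The kappa computation in the example is mechanical enough to check in code. A minimal sketch (my own) that reproduces the numbers above from the 2x2 agreement table; the marginal probabilities are pooled over both judges, as in the lecture:

```python
def kappa(rel_rel, nonrel_nonrel, rel_nonrel, nonrel_rel):
    """Kappa for two judges with binary Relevant/Nonrelevant labels.
    Arguments are the counts of each (judge 1, judge 2) label combination."""
    n = rel_rel + nonrel_nonrel + rel_nonrel + nonrel_rel
    p_agree = (rel_rel + nonrel_nonrel) / n
    # Pool the marginals over both judges, as in the lecture example.
    p_relevant = (2 * rel_rel + rel_nonrel + nonrel_rel) / (2 * n)
    p_nonrelevant = (2 * nonrel_nonrel + rel_nonrel + nonrel_rel) / (2 * n)
    p_chance = p_relevant ** 2 + p_nonrelevant ** 2
    return (p_agree - p_chance) / (1 - p_chance)

# Reproduces the slide: P(A) = 0.925, P(E) = 0.665, kappa ~= 0.776
print(kappa(300, 70, 20, 10))
```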
Sec. 8.2 Standard Relevance Benchmarks
- Text REtrieval Conference (TREC)
  - IR test bed run by the National Institute of Standards and Technology (NIST)
  - Reuters and other benchmark document collections used
  - Retrieval tasks specified, sometimes as queries
  - Each query-document combination marked as Relevant or Nonrelevant by human experts
- Cross Language Evaluation Forum (CLEF)
  - Concentrated on European languages and cross-language information retrieval
- Many others

Can We Avoid Human Judgment?
- No
- Human judgement makes experimental work hard
  - Especially on a large scale
- In some very specific settings, proxies can be used
  - E.g. use the exact cosine score to compare approximate vector space retrieval to exact retrieval (baseline)
- Can reuse (human-labeled) test collections
  - Watch for overtraining, overfitting

Evaluation of Large Search Engines
- Search engines have test collections of queries and hand-ranked results
- Recall is difficult to measure on the web
- Often precision at top K, e.g., K = 10
- Non-relevance-based measures
  - Clickthrough on first result
  - Studies of user behavior in the lab

Sec. 8.6 Measures for a Search Engine
- How fast does it index?
  - Number of documents/hour
- How fast does it search?
  - Latency as a function of index size
- Expressiveness of query language
  - Ability to express complex information needs
  - Speed on complex queries
- Uncluttered UI
- Is it free?
Sec. 8.6 Measures for a Search Engine
- All of the preceding criteria are measurable
- The key measure: user happiness
  - What is this?
  - Speed of response/size of index are factors
  - But relevance is also important
- Need a way of quantifying user happiness

Measuring User Happiness
- Issue: who is the user we are trying to make happy?
  - Depends on the setting
- Web engine:
  - Users find what they want and return to the engine
- Ecommerce site:
  - Users find what they want and buy
- Enterprise (company/government/academic):
  - Care about user productivity

Sec. 8.7 Result Summaries
- Having ranked the documents matching a query, we wish to present a results list
- Most commonly, a list of the document titles plus a short summary, aka "10 blue links"
- The title comes from document metadata. What about the summaries?
  - This description is crucial
  - The user can identify good/relevant hits based on the description
- Two basic kinds:
  - Static: always the same, regardless of the query
  - Dynamic: a query-dependent attempt to explain why the document was retrieved for the query at hand
Sec. 8.7 Static Summaries
- A subset of the document
- Simplest heuristic: the first 50 words of the document
  - Summary cached at indexing time
- More sophisticated: extract from each document a set of key sentences
  - Simple NLP heuristics to score each sentence
  - The summary is made up of the top-scoring sentences

Sec. 8.7 Dynamic Summaries (recall Lecture 4)
- Find small windows in the document that contain query terms
  - Requires fast window lookup in a document cache
- Score each window with respect to the query
  - Use various features such as window width, position in the document, etc.
- Most sophisticated: NLP used to synthesize a summary
  - Seldom used in IR; cf. text summarization work

Alternative Results Presentations
- An active area of HCI research
- Recent versions of Mac OS X: Cover Flow

Relevance Feedback and Query Expansion (Manning Chapter 9)
Sec. 9.1 Improving Results
- Local methods (documents relating to the query)
  - Relevance feedback
- Global methods (all documents in the collection)
  - Query expansion

Sec. 9.1 Relevance Feedback
- Relevance feedback: user feedback on the relevance of documents in the initial set of results
  - The user issues a (short, simple) query
  - The user marks some results as relevant or non-relevant
  - The system computes a better representation of the information need based on the feedback
  - Possibly iterate
- Key idea: it may be difficult to formulate a good query when you don't know the collection well, so iterate
- We will use "ad hoc retrieval" to refer to regular retrieval without relevance feedback
- Three examples of relevance feedback that highlight different aspects

Example 1: Similar Pages
Example 2: Relevance Feedback in Image Search
- Experimental image search engine (Newsam et al., ICIP 2001), not active anymore
- Query-document similarity according to different measures
[screenshots: results for the initial query, the relevance feedback step, and the results after relevance feedback]
Example 3: Relevance Feedback in Text Search
- Text search example
- Initial query: NEW SPACE SATELLITE APPLICATIONS

Results for Initial Query (ranked by query-document similarity)
  1. 08/13/91, NASA Hasn't Scrapped Imaging Spectrometer
  2. 07/09/91, NASA Scratches Environment Gear From Satellite Plan
  3. 04/04/90, Science Panel Backs NASA Satellite Plan, But Urges Launches of Smaller Probes
  4. 09/09/91, A NASA Satellite Project Accomplishes Incredible Feat: Staying Within Budget
  5. 07/24/90, Scientist Who Exposed Global Warming Proposes Satellites for Climate Research
  6. 08/22/90, Report Provides Support for the Critics Of Using Big Satellites to Study Climate
  7. 04/13/87, Arianespace Receives Satellite Launch Pact From Telesat Canada
  8. 12/02/87, Telecommunications Tale of Two Companies

Relevance Feedback
- The user marks three of the results above (+) as relevant

Expanded Query after Relevance Feedback
- Reweighted query terms: new, space, satellite, application, nasa, eos, launch, aster, instrument, arianespace, bundespost, ss, rocket, scientist, broadcast, earth, oil, measure
Results for Expanded Query (updated query-document similarity)
  1. 07/09/91, NASA Scratches Environment Gear From Satellite Plan (old rank 2)
  2. 08/13/91, NASA Hasn't Scrapped Imaging Spectrometer (old rank 1)
  3. 08/07/89, When the Pentagon Launches a Secret Satellite, Space Sleuths Do Some Spy Work of Their Own
  4. 07/31/89, NASA Uses Warm Superconductors For Fast Circuit
  5. 12/02/87, Telecommunications Tale of Two Companies (old rank 8)
  6. 07/09/91, Soviets May Adapt Parts of SS-20 Missile For Commercial Use
  7. 07/12/88, Gaping Gap: Pentagon Lags in Race To Match the Soviets In Rocket Launchers
  8. 06/14/90, Rescue of Satellite By Space Agency To Cost $90 Million

Key Concept: Centroid
- Centroid = center of mass of a set of points
- Recall: we represent documents as points in a high-dimensional space
- Definition: the centroid of a set of documents C is the vector mean

  centroid(C) = (1/|C|) * sum of d over all d in C

Rocchio Algorithm: The Theoretically Best Query
- Uses the vector space model to pick the query with optimal terms and weights
- Seeks the query q_opt that maximizes

  sim(q, centroid(C_r)) - sim(q, centroid(C_nr))

- i.e., tries to separate the documents marked relevant from those marked non-relevant
[diagram: the optimal query separating relevant documents (x) from non-relevant documents (o)]
- Problem: we don't know the truly relevant documents
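A sketch of the centroid and of the "theoretically best" query as the difference of the relevant and non-relevant centroids (assuming NumPy; the toy term space and weights are made up for illustration):

```python
import numpy as np

def centroid(doc_vectors):
    """Vector mean of a set of document vectors: (1/|C|) * sum of d over C."""
    return np.mean(doc_vectors, axis=0)

# Toy document vectors in a 4-term space (weights are made up).
relevant_vecs = np.array([[0.9, 0.1, 0.0, 0.2],
                          [0.8, 0.3, 0.1, 0.0]])
nonrelevant_vecs = np.array([[0.1, 0.0, 0.9, 0.7],
                             [0.0, 0.2, 0.8, 0.9]])

# Under cosine similarity, the separation above is maximized by the
# difference of the two centroids:
q_opt = centroid(relevant_vecs) - centroid(nonrelevant_vecs)
print(q_opt)  # large weights on the "relevant" terms, negative on the rest
```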
Rocchio 1971 Algorithm (SMART)
- Used in practice (a code sketch follows below):

  q_m = alpha * q_0 + beta * (1/|D_r|) * sum of d_j over D_r - gamma * (1/|D_nr|) * sum of d_j over D_nr

- D_r = set of known relevant document vectors
- D_nr = set of known non-relevant document vectors
  - Different from C_r and C_nr!
- q_m = modified query vector; q_0 = original query vector
- alpha, beta, gamma: weights (hand-chosen or set empirically)

Relevance Feedback on Initial Query
- The new query moves toward the relevant documents and away from the non-relevant documents
[diagram: initial query and revised query among known relevant (x) and known non-relevant (o) documents]

Relevance Feedback: Assumptions
- A1: The user has sufficient knowledge for the initial query
  - Violations: misspellings, cross-language retrieval, vocabulary mismatch
- A2: Relevance prototypes are well-behaved
  - Term distribution in relevant documents is similar
  - Term distribution in non-relevant documents differs from that in relevant documents
  - Violations: synonyms, two subjects in one query
- Positive feedback is more valuable than negative feedback (so set gamma < beta; e.g. gamma = 0.25, beta = 0.75)
- Many systems (both examples earlier) allow only positive relevance feedback (gamma = 0)

Exercise (5 minutes)
- Discuss in pairs/groups: why is positive feedback more valuable than negative feedback?
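A minimal sketch of the Rocchio update itself (assuming NumPy; beta = 0.75 and gamma = 0.25 are the slide's example values, while alpha = 1.0 and the clamping of negative weights to zero are my own assumptions):

```python
import numpy as np

def rocchio(q0, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio (1971) relevance-feedback update of a query vector.
    rel_docs / nonrel_docs: arrays of known relevant / non-relevant doc vectors."""
    qm = alpha * np.asarray(q0, dtype=float)
    if len(rel_docs):
        qm = qm + beta * np.mean(rel_docs, axis=0)
    if len(nonrel_docs):
        qm = qm - gamma * np.mean(nonrel_docs, axis=0)
    # Negative term weights make little sense in a term-vector query;
    # clamping them to zero is a common practical choice (an assumption here).
    return np.maximum(qm, 0.0)

q0 = np.array([1.0, 0.0, 0.0, 0.5])                           # original query
rel = np.array([[0.9, 0.1, 0.0, 0.2], [0.8, 0.3, 0.1, 0.0]])  # marked relevant
nonrel = np.array([[0.1, 0.0, 0.9, 0.7]])                     # marked non-relevant
print(rocchio(q0, rel, nonrel))  # moves toward relevant terms, away from the rest
```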
Relevance Feedback: Problems
- Long queries are inefficient for a typical IR engine
  - Long response times for the user
  - High cost for the retrieval system
  - Why are long queries inefficient?
- Users are often reluctant to provide explicit feedback
- It is often harder to understand why a particular document was retrieved after applying relevance feedback

Query Expansion
- In relevance feedback, users give additional input (relevant/non-relevant) on documents
  - Local, dynamic analysis of documents relating to the query
- In query expansion, users give additional input (refined search terms) on words or phrases
  - Global, static analysis of all documents in the collection

How is the Query Augmented?
- Manual thesaurus
  - Maintained by human specialists
  - Used e.g. in medicine
- Automatic thesaurus
  - More to come
- Refinements based on query log mining
  - Common on the web
  - The system suggests queries used before by others (see the sketch below)
  - Compare to collaborative filtering

Query Assist
[screenshot: query suggestions in a web search interface]
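As a rough illustration of query-log-based suggestion (the toy log and the rank-by-frequency heuristic are my own assumptions, not a description of any particular engine):

```python
from collections import Counter

# A tiny toy query log; a real log would hold millions of past queries.
query_log = [
    "information retrieval", "information retrieval evaluation",
    "information theory", "information retrieval evaluation",
    "precision recall", "information retrieval course",
]

def suggest(prefix, log, k=3):
    """Suggest the k most frequent past queries that extend the typed prefix."""
    counts = Counter(q for q in log if q.startswith(prefix) and q != prefix)
    return [q for q, _ in counts.most_common(k)]

print(suggest("information", query_log))
# e.g. ['information retrieval evaluation', 'information retrieval', 'information theory']
```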
Thesaurus-Based Query Expansion
- For each query term, expand the query from the thesaurus
  - feline → feline cat
- May weight added terms less than the original terms
- Generally increases recall
- May significantly decrease precision, particularly with ambiguous terms
  - interest rate → interest rate fascinate evaluate
- Widely used in many science/engineering fields
- There is a high cost of manually producing and maintaining a thesaurus

Automatic Thesaurus Generation
- Attempt to generate a thesaurus automatically by analyzing the collection of documents
- Definition 1: two words are similar if they co-occur with similar words (see the sketch below)
- Definition 2: two words are similar if they occur in the same grammatical relations with the same words
  - You can harvest, peel, eat, prepare, etc. apples and pears, so apples and pears must be similar
- Definition 1 is more robust, definition 2 is more accurate
  - Why is that?

Automatic Thesaurus Generation: Example
[table: automatically generated word associations]

Automatic Thesaurus Generation: Discussion
- The quality of the associations is usually a problem
- Term ambiguity may introduce irrelevant statistically correlated terms
  - Apple computer → Apple red fruit computer
- Problems:
  - False positives: words deemed similar that are not
  - False negatives: words deemed dissimilar that are similar
- Since terms are highly correlated anyway, expansion may not retrieve many additional documents
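A sketch of definition 1 above (two words are similar if they co-occur with similar words), using document-level co-occurrence counts and cosine similarity; the toy corpus is my own:

```python
import math
from collections import defaultdict

docs = [
    "you can harvest peel eat and prepare apples",
    "you can harvest peel eat and prepare pears",
    "the satellite launch was delayed",
]

# Build co-occurrence vectors: for each word, count the other words
# appearing in the same document.
cooc = defaultdict(lambda: defaultdict(int))
for doc in docs:
    words = doc.split()
    for w in words:
        for v in words:
            if v != w:
                cooc[w][v] += 1

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "apples" and "pears" co-occur with the same verbs, so they come out similar;
# "satellite" shares no context words with "apples".
print(cosine(cooc["apples"], cooc["pears"]))      # high
print(cosine(cooc["apples"], cooc["satellite"]))  # 0.0
```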
Next
- Computer hall session (November 25)
  - Grå (Osquars Backe 2, level 5)
  - Examination of computer assignment 1
  - Book a time in the Doodle (see course homepage)
- Lecture 6 (November 30)
  - D32
  - Readings: Manning Chapter 10