Chap. 3. Recall and Precision. Alternative Measures. TREC Collection. CACM and ISI Collections. CFC (Cystic Fibrosis Collection)


Outline:
- Recall and Precision
- Alternative Measures
- TREC Collection
- CACM and ISI Collections
- CFC (Cystic Fibrosis Collection)

Time and space: for data retrieval, performance is measured by the response time and the space required.

For information retrieval, evaluation asks how precise the answer set is, based on a test reference collection and an evaluation measure.
- Reference collection: documents, example information requests, and relevant documents.
- Evaluation measure: quantifies the similarity between the set of documents retrieved by strategy S and the set provided by the specialist.

Notation:
- I: an information request
- R: the set of relevant documents
- A: the document answer set
- R_a: the relevant documents in the answer set
[Figure: the collection, containing the relevant set R, the answer set A, and their overlap R_a]

Recall: the fraction of the relevant documents (the set R) which has been retrieved.
\[ \text{Recall} = \frac{|R_a|}{|R|} \]

Precision: the fraction of the retrieved documents (the set A) which is relevant.
\[ \text{Precision} = \frac{|R_a|}{|A|} \]

Example:
R_q = {d3, d5, d9, d25, d39, d44, d56, d71, d89, d123}
R_q: a set containing the relevant documents for q.

Ranking for query q (documents in R_q are marked with *):
 1. d123 *     6. d9 *      11. d38
 2. d84        7. d511      12. d48
 3. d56 *      8. d129      13. d250
 4. d6         9. d187      14. d113
 5. d8        10. d25 *     15. d3 *

[Figure: Precision at 11 standard recall levels]
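As an illustration of the example above, the following Python sketch walks down the ranking for query q and records recall and precision each time a relevant document is seen, then derives precision at the 11 standard recall levels. The function names are hypothetical, and the interpolation rule (take the maximum precision observed at any recall greater than or equal to the level, or 0 if that recall is never reached) is an assumption, since the slide does not state how the 11 levels are filled in.

```python
# Example data from the slide: relevant set R_q and the ranking for query q.
RELEVANT = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
RANKING = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
           "d187", "d25", "d38", "d48", "d250", "d113", "d3"]


def recall_precision_points(ranking, relevant):
    """Return (hits, rank) pairs, one per relevant document found in the ranking.

    Recall at that point is hits / len(relevant); precision is hits / rank.
    """
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits, rank))
    return points


def interpolated_precision(ranking, relevant, levels=11):
    """Interpolated precision at evenly spaced recall levels 0.0, 0.1, ..., 1.0.

    Assumed rule: precision at level r is the maximum precision observed
    at any recall >= r, or 0.0 if that recall is never reached.
    """
    points = recall_precision_points(ranking, relevant)
    total = len(relevant)
    result = []
    for i in range(levels):
        # hits / total >= i / (levels - 1), compared with integers only
        candidates = [hits / rank for hits, rank in points
                      if hits * (levels - 1) >= i * total]
        result.append(max(candidates, default=0.0))
    return result


if __name__ == "__main__":
    for hits, rank in recall_precision_points(RANKING, RELEVANT):
        recall = hits / len(RELEVANT)
        print(f"rank {rank:2d}: recall={recall:.1f} precision={hits / rank:.3f}")
    print([round(p, 3) for p in interpolated_precision(RANKING, RELEVANT)])
```

Only five of the ten relevant documents appear in the ranking, so recall never exceeds 0.5 and, under the assumed rule, the interpolated precision is 0 from the 0.6 level upward.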

Precision versus recall curves evaluate quantitatively both the quality of the overall answer set and the breadth of the retrieval algorithm.
Advantages: simple, intuitive, and can be combined in a single curve.

To compare retrieval performance for individual queries:
- Average precision at seen relevant documents: the average of the precision figures obtained after each new relevant document is observed in the ranking. It favors systems which retrieve relevant documents quickly.
- R-precision
- Precision histograms
- Summary table statistics
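The first two single-value measures in the list above can be sketched in Python as follows, reusing the example ranking for query q given earlier in the slides. The function names are hypothetical. R-precision is not defined on the slide; the sketch assumes the usual definition, precision after R documents have been retrieved, where R is the total number of relevant documents for the query.

```python
# Example data from the earlier slide: relevant set R_q and ranking for q.
RELEVANT = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
RANKING = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
           "d187", "d25", "d38", "d48", "d250", "d113", "d3"]


def average_precision_at_seen_relevant(ranking, relevant):
    """Average the precision values observed at each relevant document seen."""
    hits = 0
    precisions = []
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    # The average runs over relevant documents actually seen in the ranking.
    return sum(precisions) / len(precisions) if precisions else 0.0


def r_precision(ranking, relevant):
    """Precision at position R, with R = total number of relevant documents
    (assumed definition, not stated on the slide)."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranking[:r] if doc in relevant) / r


if __name__ == "__main__":
    print(round(average_precision_at_seen_relevant(RANKING, RELEVANT), 2))
    print(r_precision(RANKING, RELEVANT))
```

For this ranking the precision values at the five seen relevant documents are 1, 2/3, 1/2, 2/5 and 1/3, so a system that placed the same documents nearer the top would score higher on the first measure.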

Remarks on recall (R) and precision (P):
- Proper estimation of maximum recall for a query requires detailed knowledge of all the documents in the collection.
- R and P are related measures which capture different aspects of the set of retrieved documents.
- R and P measure effectiveness over a set of queries processed in batch mode.
- R and P are easy to define when a linear ordering of the retrieved documents is enforced.

Alternative measures:
- Coverage, novelty, relative recall, recall effort
- Expected search length, satisfaction, frustration

Harmonic mean: takes a high value only when both recall and precision are high.
\[ F(j) = \frac{2}{\frac{1}{r(j)} + \frac{1}{P(j)}} \]
- r(j): the recall for the j-th ranked document
- P(j): the precision for the j-th ranked document
- F(j): the harmonic mean of r(j) and P(j)

E measure: allows the user to specify whether recall or precision is of more interest.
\[ E(j) = 1 - \frac{1 + b^2}{\frac{b^2}{r(j)} + \frac{1}{P(j)}} \]
- E(j): the E evaluation measure
- b: a user-specified parameter which reflects the relative importance of recall and precision. With this formula, b > 1 gives more weight to recall and b < 1 gives more weight to precision; b = 1 gives E(j) = 1 - F(j).

Notation for coverage and novelty:
- R: the relevant documents
- A: the answer set
- U: the relevant documents known to the user
- R_k: the relevant documents known to the user which were retrieved
- R_u: the relevant documents previously unknown to the user which were retrieved
[Figure: Coverage and novelty ratios for a given example information request]
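The harmonic mean F(j) and the E measure defined earlier on this slide translate directly into code. The sketch below is a minimal Python version with hypothetical function names. Returning F = 0 and E = 1 when recall or precision is zero is a convention chosen here to avoid division by zero, not something the slides specify.

```python
def harmonic_mean_f(recall, precision):
    """F = 2 / (1/r + 1/P); high only when both recall and precision are high."""
    if recall == 0 or precision == 0:
        return 0.0  # assumed convention for the undefined case
    return 2 / (1 / recall + 1 / precision)


def e_measure(recall, precision, b=1.0):
    """E = 1 - (1 + b^2) / (b^2 / r + 1 / P), with b set by the user."""
    if recall == 0 or precision == 0:
        return 1.0  # assumed convention for the undefined case
    return 1 - (1 + b ** 2) / (b ** 2 / recall + 1 / precision)


if __name__ == "__main__":
    r, p = 0.2, 2 / 3  # recall and precision at rank 3 of the example ranking
    print(round(harmonic_mean_f(r, p), 4))
    print(round(e_measure(r, p, b=1.0), 4))   # equals 1 - F
    print(round(e_measure(r, p, b=0.0), 4))   # equals 1 - P
    print(round(e_measure(r, p, b=1e3), 4))   # approaches 1 - r
```

The two limiting cases are a quick way to check an implementation: at b = 0 the formula reduces to 1 - P(j), and as b grows it approaches 1 - r(j).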

\[ \text{coverage} = \frac{|R_k|}{|U|} \qquad \text{novelty} = \frac{|R_u|}{|R_u| + |R_k|} \]
- High coverage ratio: the system is finding most of the relevant documents the user expected to see.
- High novelty ratio: the system is revealing (to the user) many new relevant documents which were previously unknown.

Criticisms of IR research:
- It lacks a solid formal framework as a basic foundation.
- It lacks robust and consistent testbeds and benchmarks: experiments were based on relatively small test collections, and comparisons between various retrieval systems were difficult.

Reference collections:
- TREC (Text Retrieval Conference) collection: large size, thorough experimentation
- CACM and ISI collections: historical importance in IR
- Cystic Fibrosis collection
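The coverage and novelty ratios given earlier on this slide can be computed with plain set operations. The following Python sketch uses hypothetical function names and made-up document identifiers, and it assumes that U (the relevant documents known to the user) is a subset of R.

```python
def coverage(known_relevant, answer_set):
    """coverage = |Rk| / |U|, where Rk = documents in U that were retrieved."""
    if not known_relevant:
        return 0.0  # assumed convention when the user knows no relevant documents
    rk = known_relevant & answer_set
    return len(rk) / len(known_relevant)


def novelty(relevant, known_relevant, answer_set):
    """novelty = |Ru| / (|Ru| + |Rk|), over the relevant documents retrieved."""
    retrieved_relevant = relevant & answer_set
    if not retrieved_relevant:
        return 0.0  # assumed convention when nothing relevant was retrieved
    rk = retrieved_relevant & known_relevant   # known to the user
    ru = retrieved_relevant - known_relevant   # previously unknown to the user
    return len(ru) / (len(ru) + len(rk))


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    R = {"d1", "d2", "d3", "d4", "d5", "d6"}   # relevant documents
    U = {"d1", "d2", "d3", "d4"}               # relevant documents known to the user
    A = {"d1", "d2", "d5", "d7", "d8"}         # answer set
    print(coverage(U, A))
    print(round(novelty(R, U, A), 3))
```

In this made-up case the system retrieved two of the four documents the user already knew and one relevant document the user did not know.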

Document collection:
- Sources: WSJ, AP, ZIFF, FR, DOE, etc.
- Documents are tagged with SGML for easy parsing, to preserve as much of the original structure as possible and to provide a common framework for simple decoding.

    <doc>
    <docno> WSJ880406-0090 </docno>
    <h1> AT&T Unveils Services to Upgrade Phone Networks Under Global Plan </h1>
    <author> Janet Guyon (WSJ Staff) </author>
    <dateline> New York </dateline>
    <text>
    American Telephone & Telegraph Co. introduced the first of a new generation of phone services with broad..
    </text>
    </doc>

Example information requests (topics):
- A description of an information need in natural language
- Used for testing a new ranking algorithm

    <top>
    <num> Number: 168
    <title> Topic: Financing AMTRAK
    <desc> Description:
    A document will address the role of the Federal Government in financing the operation of the National Railroad Transportation Corporation (AMTRAK)
    <narr> Narrative:
    A relevant document must provide information on the government's responsibility to make AMTRAK an economically viable entity. It could also discuss the privatization of AMTRAK as an alternative to continuing government subsidies. Documents comparing government subsidies given to air and bus transportation with those provided to AMTRAK would also be relevant.
    </top>
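To show what "simple decoding" of the SGML-tagged documents might look like, here is a small Python sketch that pulls the tagged fields out of the sample document shown earlier. The function name `parse_trec_doc` is hypothetical and the approach is an assumption for illustration, not an official TREC tool. It uses regular expressions rather than an XML parser because the sample text contains bare `&` characters, which a strict XML parser would reject.

```python
import re

# Sample document from the slide, with line breaks added for readability.
SAMPLE = """<doc>
<docno> WSJ880406-0090 </docno>
<h1> AT&T Unveils Services to Upgrade Phone Networks Under Global Plan </h1>
<author> Janet Guyon (WSJ Staff) </author>
<dateline> New York </dateline>
<text>
American Telephone & Telegraph Co. introduced the first of a new
generation of phone services with broad..
</text>
</doc>"""

# Matches <tag> ... </tag> pairs; DOTALL lets the content span several lines.
FIELD_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def parse_trec_doc(sgml):
    """Return a dict mapping each field tag inside <doc> to its text."""
    body = re.search(r"<doc>(.*)</doc>", sgml, re.DOTALL)
    if body is None:
        raise ValueError("no <doc> element found")
    # Collapse runs of whitespace so multi-line fields become one line.
    return {tag: " ".join(text.split())
            for tag, text in FIELD_RE.findall(body.group(1))}


if __name__ == "__main__":
    fields = parse_trec_doc(SAMPLE)
    print(fields["docno"])
    print(fields["author"])
```

The topic format shown on the slide would need different handling, since fields such as `<num>` and `<desc>` have no closing tags there.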

Relevant documents for each topic:
- The set of relevant documents for each topic is obtained from a pool of possible relevant documents.
- The pool is created by taking the top K documents in the rankings generated by the various retrieval systems.

Pooling method: a technique for assessing relevance. It assumes that:
- the vast majority of the relevant documents is collected in the assembled pool;
- the documents which are not in the pool can be considered to be not relevant.

Tasks:
- Ad hoc task: a set of new (conventional) requests is run against a fixed document database; used for training and for tuning the retrieval algorithm.
- Routing task: a set of fixed requests is run against a database whose documents are continually changing (filtering); used for testing the tuned retrieval algorithm.
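The pooling method described earlier on this slide can be sketched in a few lines of Python. The function names, system names, and document identifiers below are hypothetical, chosen only to illustrate how the pool is assembled from the top K documents of each system and how documents outside the pool are treated as not relevant.

```python
def build_pool(rankings, k):
    """Union of the top-k documents from every system's ranking for one topic."""
    pool = set()
    for ranking in rankings.values():
        pool.update(ranking[:k])
    return pool


def is_relevant(doc, pool, judged_relevant):
    """A document counts as relevant only if it is in the pool and an
    assessor judged it relevant; anything outside the pool is assumed
    not relevant."""
    return doc in pool and doc in judged_relevant


if __name__ == "__main__":
    # Hypothetical rankings produced by three systems for the same topic.
    rankings = {
        "system_a": ["d1", "d2", "d3", "d4"],
        "system_b": ["d3", "d5", "d1", "d6"],
        "system_c": ["d7", "d1", "d8", "d2"],
    }
    pool = build_pool(rankings, k=2)
    print(sorted(pool))

    # Hypothetical assessor judgments, made only for documents in the pool.
    judged_relevant = {"d1", "d5"}
    print(is_relevant("d5", pool, judged_relevant))
    print(is_relevant("d6", pool, judged_relevant))
```

Only the pooled documents are shown to the assessors, so the cost of judging grows with K and the number of systems rather than with the size of the collection.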

CACM collection:
- 3204 Communications of the ACM articles
- Subfields: author, date, title, abstract, categories, references, bibliographic coupling, number of co-citations

ISI collection:
- 1460 documents selected from ISI
- Similarities based on terms and on cross-citation patterns
- Subfields: author, word stems from title and abstract sections, number of co-citations for each pair of articles

[Table: Document statistics for the CACM and ISI collections]
[Table: Query statistics for the CACM and ISI collections]

Cystic Fibrosis collection:
- 1239 documents
- Fields contained in each document: MEDLINE accession number, author, title, source, major subjects, minor subjects, abstract (or extract), references, citations
- The set of relevance scores was generated directly by human experts through a careful evaluation strategy.
- It includes a good number of information requests; as a result, the respective query vectors present overlap among themselves.
- It allows experiments with retrieval strategies which take advantage of past query sessions to improve retrieval performance.