Retrieval Effectiveness Measures: Overview


Retrieval Effectiveness Measures
Vasu Sathu
25th March 2001

Overview
- Evaluation in IR
- Types of Evaluation
- Retrieval Performance Evaluation
- Measures of Retrieval Effectiveness
- Single Valued Measures
- Alternative Measures
- TREC Collection

Why Evaluate an IR System
Evaluation of an information retrieval system is done before its final implementation:
- To find out whether users really need such a system and whether it will be worth building.
- To select between alternative systems.
- To determine whether a system meets the expressed and unexpressed needs of current users and non-users.
- To improve IR systems and determine whether an improvement actually occurred.

What to Evaluate
- Coverage of the collection: the extent to which the system includes relevant matter.
- Time lag: the interval between the time the search request is made and the time an answer is given.
- Form of presentation of the output.
- Effort involved by the user in obtaining answers to a search request.
- Recall of the system: the proportion of relevant material actually retrieved in answer to a search request.
- Precision: the proportion of retrieved material that is actually relevant.

Types of Evaluation
An IR system is often a component of a larger system, so several aspects might be evaluated:
- Speed of retrieval.
- Resources required.
- Presentation of documents.
- Ability to find relevant documents.
Evaluation is generally comparative. The most common evaluation is of retrieval effectiveness.

Retrieval Performance Evaluation
The first step in the evaluation process is functional analysis, in which the specified system functionalities are tested one by one. It should also include an error analysis phase, during which it is useful to catch programming errors. After the system has passed the functional analysis phase, its performance should be evaluated. In a system designed for data retrieval, response time and space required are usually the metrics of most interest.

Retrieval Performance Evaluation (contd.)
Performance evaluation with indexing structures depends on several factors, such as interaction with the operating system, delays in communication channels, and overheads introduced by the many software layers. Relevance ranking plays a central role in IR: IR systems require an evaluation of how precise the answer set is, and this kind of evaluation is known as retrieval performance evaluation. It depends on two major ingredients:
- Test reference collections.
- Evaluation measures.

A test reference collection consists of:
- a collection of documents;
- a set of example information requests;
- a set of relevant documents for each example information request.
Evaluation is based on the ability of the retrieval system to distinguish between wanted and unwanted items. The retrieval task can run in either of two modes:
- Batch: the user submits a query and receives an answer back.
- Interactive session: the user specifies the information need through a series of interactive steps with the system.

Measures of Retrieval Effectiveness
Effectiveness is purely a measure of the ability of the system to satisfy the user in terms of the relevance of the documents retrieved. A relevant document is one that is closely related to the context of the query the user is interested in. Recall and precision attempt to measure this effectiveness: the ability of the system to retrieve relevant documents while holding back the non-relevant ones.

Precision and Recall
- Precision: the proportion of the retrieved set that is relevant.
  Precision = relevant retrieved / retrieved = P(relevant | retrieved)
- Recall: the proportion of all relevant documents in the collection that are included in the retrieved set.
  Recall = relevant retrieved / relevant = P(retrieved | relevant)
In practice recall and precision trade off against each other: raising one typically lowers the other.
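As a concrete illustration (not part of the slides), precision and recall for a single query can be computed directly from the retrieved and relevant document sets. A minimal Python sketch, with hypothetical document ids:

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    retrieved -- set of document ids returned by the system
    relevant  -- set of document ids judged relevant for the query
    """
    hits = len(retrieved & relevant)  # relevant documents actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Example: 3 of the 4 retrieved docs are relevant; 6 docs are relevant overall.
p, r = precision_recall({1, 2, 3, 4}, {2, 3, 4, 5, 6, 7})
# p = 3/4 = 0.75, r = 3/6 = 0.5
```

The guards against empty sets reflect the usual convention that precision (or recall) is taken as 0 when the corresponding denominator is empty.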

Precision and Recall (contd.)
Contingency table:

                 Relevant    Non-relevant
  Retrieved         w             x
  Not retrieved     y             z

Relevant  = w + y
Retrieved = w + x
Total N   = w + x + y + z

Precision = w / (w + x)
Recall    = w / (w + y)
Fallout   = x / (x + z)

Recall Graph
Recall as more and more documents are retrieved: the graph has a terraced shape, since recall can only rise, in steps, each time another relevant document appears.
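The measures above all fall out of the four contingency-table cells. A small Python sketch (illustrative, not from the slides) computing them from the cell counts:

```python
def contingency_measures(w, x, y, z):
    """Retrieval measures from the contingency table.

    w -- relevant and retrieved        x -- non-relevant but retrieved
    y -- relevant but not retrieved    z -- non-relevant and not retrieved
    """
    n = w + x + y + z
    return {
        "precision": w / (w + x),    # share of retrieved that is relevant
        "recall": w / (w + y),       # share of relevant that was retrieved
        "fallout": x / (x + z),      # share of non-relevant that slipped in
        "generality": (w + y) / n,   # share of relevant docs in the collection
    }
```

Fallout and generality are defined a slide later; they are included here because they come from the same table.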

Precision Graph
Precision as more and more documents are retrieved: the graph has a sawtooth shape, jumping up at each relevant document and decaying between them.

Precision and Recall (contd.)
Fallout: the proportion of the non-relevant documents that are retrieved,
  F = x / (N - Relevant)
It describes how well the system filters out non-relevant documents.
Generality: the proportion of relevant documents in the collection,
  G = Relevant / N
Criteria commonly used to evaluate performance:
- Recall.
- Precision.
- User effort, i.e. the amount of time the user spends conducting the search, negotiating the enquiry, and separating relevant from irrelevant items.

Alternative Measures
A single measure that combines precision and recall is the harmonic mean F:
  F(j) = 2 / (1/r(j) + 1/P(j))
where
- r(j) is the recall for the j-th document in the ranking;
- P(j) is the precision for the j-th document in the ranking;
- F(j) is the harmonic mean of r(j) and P(j).
The value of F lies between 0 and 1:
- 0 when no relevant documents have been retrieved;
- 1 when all ranked documents are relevant.
F assumes a high value only when both recall and precision are high.

Alternative Measures (contd.)
The E measure (van Rijsbergen) lets the user weight precision (or recall). It is defined as
  E(j) = 1 - (1 + b^2) / (b^2/r(j) + 1/P(j))
where
- r(j) is the recall for the j-th document in the ranking;
- P(j) is the precision for the j-th document in the ranking;
- E(j) is the evaluation measure relative to r(j) and P(j);
- b is a user-specified parameter reflecting the relative importance of recall and precision.
For b = 1, E(j) is the complement of the harmonic mean F(j);
for b > 1, the user is more interested in precision than in recall;
for b < 1, the user is more interested in recall than in precision.
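Both formulas translate directly into code. A minimal Python sketch (illustrative, not from the slides) of the harmonic mean F and van Rijsbergen's E measure:

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision; 0 if either is 0."""
    if recall == 0 or precision == 0:
        return 0.0
    return 2 / (1 / recall + 1 / precision)

def e_measure(recall, precision, b=1.0):
    """van Rijsbergen's E measure; b weights recall vs. precision.

    For b = 1 this is exactly 1 - f_measure(recall, precision).
    """
    return 1 - (1 + b ** 2) / (b ** 2 / recall + 1 / precision)
```

With recall = precision = 0.5, F = 0.5 and E (at b = 1) = 0.5, so F + E = 1, matching the complement relationship stated above.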

Single Valued Measures
- Normalized precision or recall: measures the area between the actual and ideal curves.
- The point at which precision = recall is called the breakeven point.
- Swets model.
- Expected search length.
- Utility measures: assign a cost or value to each cell in the contingency table, then sum (or average) the costs over all queries.

Swets Model
Proposed by Swets in 1963, based on signal detection and statistical decision theory. Properties of a desirable measure of retrieval performance:
- It should be based on the ability of the retrieval system to distinguish between wanted and unwanted items.
- It should express discrimination power independent of any acceptance criterion employed by the system or the user.
- The measure should be a single number.
- It should allow a complete ordering of different performances, indicate the amount of difference, and assess performance in absolute terms.

Swets Model (contd.)
The model characterizes the recall-fallout curves generated by varying a control variable, and uses the distance between operating characteristics as its measure. Brookes' equation (with means u1, u2 and standard deviations σ1, σ2 of the non-relevant and relevant score distributions):
  S2 = (u2 - u1) / ((σ1 + σ2) / 2)
The area under the recall-fallout graph is a strictly increasing function of S2.

Expected Search Length
Users are assumed to be able to quantify their information need according to one of the following types:
- only one relevant document is needed;
- some arbitrary number n is wanted;
- all relevant documents are wanted;
- a given proportion of the relevant documents is wanted.
The output of a search strategy is assumed to be a weak ordering of documents. A simple ordering means no two or more documents are at the same level of the ordering. Search length is the number of non-relevant documents a user must scan before the information need is satisfied.

Expected Search Length (contd.)
  ESL = j + i*s / (r + 1)
where, for a query q of a given type,
- j is the total number of non-relevant documents in all levels preceding the final one;
- r is the number of relevant documents in the final level;
- i is the number of non-relevant documents in the final level;
- s is the number of relevant documents still required from the final level to satisfy the need.
Use the mean expected search length for a set of queries. If queries or collections vary, compare ESL to the expected random search length.

TREC Collection
The Text REtrieval Conference (TREC) is dedicated to experimentation with a large test collection comprising over a million documents. The TREC series is organized by the National Institute of Standards and Technology (NIST). Its goal is to encourage IR research on large text applications by providing a large test collection. The TREC collection is composed of three parts:
- documents;
- example information requests (called topics);
- a set of relevant documents for each example information request.
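The ESL formula above can be sketched in Python for a whole weak ordering: walk the levels in order, and apply the formula at the level where the need is satisfied. The level representation here is a hypothetical one chosen for illustration:

```python
def expected_search_length(levels, s_needed):
    """Expected search length over a weak ordering of documents.

    levels   -- list of (relevant, non_relevant) counts per level, best first
    s_needed -- total number of relevant documents the user wants
    """
    j = 0      # non-relevant documents in levels before the final one
    found = 0  # relevant documents accumulated so far
    for r, i in levels:
        if found + r >= s_needed:    # need is satisfied within this level
            s = s_needed - found     # relevant docs still required here
            return j + i * s / (r + 1)
        found += r
        j += i
    return None  # the collection cannot satisfy the need

# Example: level 1 holds 2 relevant + 3 non-relevant docs, level 2 holds
# 4 relevant + 1 non-relevant; the user wants 3 relevant documents.
esl = expected_search_length([(2, 3), (4, 1)], 3)
# j = 3, and in the final level s = 1, i = 1, r = 4: ESL = 3 + 1/5 = 3.2
```

The expected random search length, mentioned above as a baseline, would be computed the same way over a single level containing the whole collection.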

TREC Collection (contd.)
The TREC collection contains documents from sources such as the Wall Street Journal (WSJ), Associated Press (AP), the Federal Register (FR), and US Patents (PAT). The task of converting an information request (topic) into a system query must be done by the system itself and is considered an integral part of the evaluation procedure. The set of relevant documents for each example information request is obtained from a pool of possibly relevant documents; this pooling method is used to evaluate the relevance of each document.

Conclusion
Retrieval can be made more effective by applying these techniques and making the search itself more effective. Users should be able to search in ways that are already familiar or that they have found to be effective. A visual representation of the contents of a system may aid users in orienting themselves.

References
[Modern IR] Ricardo Baeza-Yates, Berthier Ribeiro-Neto. "Modern Information Retrieval." Addison-Wesley (ACM Press), January 1999.
[IR] C.J. van Rijsbergen. "Information Retrieval, Second Edition." 1999, 192 pages.
[Info Storage & Retrieval] Robert R. Korfhage. "Information Storage and Retrieval." Wiley, May 1997.
David C. Blair and M.E. Maron. "An evaluation of retrieval effectiveness for a full-text document retrieval system." Communications of the ACM, Vol. 28, No. 3.