DELOS WP7: Evaluation

Claus-Peter Klas
Univ. of Duisburg-Essen, Germany
(WP leader: Norbert Fuhr)
WP Objectives

- Enable communication between evaluation experts and DL researchers/developers
- Continue existing evaluation initiatives relevant for the DL area
- Develop new evaluation models, methods and testbeds

15/09/2004  C.-P. Klas: DELOS Evaluation Cluster
Tasks in WP7

- T7.1: Evaluation Forum
- T7.2: Evaluation Models and Methods
- T7.3: INEX
- T7.4: CLEF
- T7.5: Evaluation Testbeds
Task 7.1: Evaluation Forum

Creation of a forum enabling communication between DL evaluation experts and DL developers, comprising the following components:

- a discussion forum on evaluation issues
- a collection of existing evaluation approaches
- a collection of existing testbeds and toolkits for DL evaluation
DELOS Evaluation Forum

- Available since May 2004
- Enables discussion among the members of Workpackage 7 across multiple threads
- Supported by a website presenting the activities of Workpackage 7
- Contains a growing collection of resources (literature, testbeds and toolkits)
- Frequently updated with news and announcements
Evaluation Forum

- Website address: http://dlib.ionio.gr/wp7
- Forum address: http://dlib.ionio.gr/delosforum
Task 7.2: Evaluation Models and Methods

- Integrated research on DL evaluation
- Initial focus on specification of standard DL evaluation methods
- Starting with comparison and evaluation of existing evaluation methodologies
- DL evaluation workshop
DL Evaluation Workshop

Padova, Italy, October 4-5 (participation is by invitation only)

Invited talks:
- Overview of Digital Library Evaluation (Tefko Saracevic)
- Facets of DL evaluation:
  - Users (Ann Blandford)
  - Usage (Christine Borgman)
  - Systems (Ed Fox)
  - Collections (Derek Law)
- Evaluation requirements from other DELOS clusters
- DL evaluation plans of current EU projects
Task 7.3: INEX

Initiative for the Evaluation of XML Retrieval

- T7.3 is responsible for the organization of INEX 2004, the third in a series of XML IR system evaluation campaigns

Background:
- Increased use of XML as a document format on the Web and in digital libraries
- Development of retrieval systems to store and access XML documents

Objectives:
- Creation of an evaluation infrastructure and organisation of regular evaluation campaigns for system testing
- Building of an XML IR research community
- Construction of testbeds and appropriate scoring methods for evaluating content-oriented XML retrieval
INEX Test Collection

Core collection:
- Documents: 12,107 articles of the IEEE Computer Society's publications, from 12 magazines and transactions
- Queries: content-only (CO) and content-and-structure (CAS)
- Relevance assessments
- Metrics
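To illustrate the two query types: INEX expresses CAS topics in NEXI, an XPath-like syntax whose about() predicate states a content condition on an element. The topic below is a made-up example, not an actual INEX topic:

```
//article[about(.//abs, "XML retrieval")]//sec[about(., "evaluation methodology")]
```

A CO topic, by contrast, is just the keyword part (e.g. "XML retrieval evaluation"), leaving the system free to decide which element granularity to return.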
INEX: 4 Main Tasks

1. Evaluation of retrieval effectiveness, especially by refining the evaluation criteria, in order to consider how XML elements satisfy information needs in the context of digital libraries
2. Evaluation of efficiency, taking into account the larger number of possible answers (XML elements) and their possible overlap
3. Prototype evaluation of usability, considering various types of information-seeking activities in an interactive setting
4. Investigation of new testbeds for heterogeneous and multimedia documents in the context of XML
Effectiveness

Evaluation of retrieval effectiveness:
- Develop/refine evaluation criteria: how do XML elements satisfy information needs in the context of digital libraries?
- Ad hoc retrieval plus 4 tracks:
  - Relevance feedback track
  - Heterogeneous data track
  - Natural language track
  - Interactive track
- Development of evaluation methodologies, including metrics
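As a concrete example of the kind of measure such methodologies build on, here is a minimal sketch of non-interpolated average precision, the standard document-level IR metric that XML-specific measures generalise. Function, item names and the sample run are illustrative, not part of INEX:

```python
def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k at which a relevant item appears.

    ranked: list of retrieved item ids, best first.
    relevant: set of item ids judged relevant for the topic.
    """
    hits = 0
    score = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            score += hits / k  # precision at rank k
    return score / len(relevant) if relevant else 0.0

# Illustrative run: the two relevant items are found at ranks 2 and 4.
print(average_precision(["e3", "e1", "e7", "e2"], {"e1", "e2"}))  # 0.5
```

Averaging this score over all topics gives mean average precision (MAP); XML metrics additionally have to discount overlapping elements, which this sketch ignores.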
Usability

Prototype evaluation of usability, considering various types of information-seeking activities in an interactive setting. Done as part of the interactive track:
- Investigate user behaviour when interacting with XML documents
- Develop and investigate retrieval approaches that are effective in interactive settings
INEX 2004

- 57 participants: 54% Europe, 21% USA, 12% Asia + Australia/Oceania
- Strong involvement of the participants in the various tasks and evaluation methodologies
- Relevance assessments for the ad hoc task are currently under way
- INEX 2004 workshop: December 6-8 in Dagstuhl, Germany
Task 7.4: Cross-Language Evaluation Forum

- T7.4 is responsible for the organisation of CLEF 2004, the fifth in a series of CLIR system evaluation campaigns

Objectives of CLEF: promote research and stimulate development of multilingual IR systems for European languages, through
- Creation of an evaluation infrastructure and organisation of regular evaluation campaigns for system testing
- Building of an MLIA/CLIR research community
- Construction of publicly available test suites

CLEF 2004 has seen a shift in focus from cross-language document retrieval to include information extraction in a multilingual multimedia context
CLEF 2004: Evaluation Tracks

CLEF 2004 offered six tracks designed to evaluate the performance of systems for:
- mono-, bi- and multilingual document retrieval on news collections (Ad hoc)
- mono- and cross-language domain-specific retrieval (GIRT)
- interactive cross-language retrieval (iCLEF)
- multiple-language question answering (QA@CLEF)
- cross-language retrieval on image collections (ImageCLEF)
- cross-language spoken document retrieval (CL-SDR)
CLEF 2004: Topics

- Queries for the ad hoc tasks could be formulated from 50 topics in 14 languages (including Bulgarian, Chinese and Amharic)
- 200 questions for the QA tasks, prepared in seven languages
- 50 short topics for the cross-language spoken document track, prepared in six languages
- Topics in twelve languages for 3 different image retrieval tasks involving text- and content-based retrieval techniques
- The iCLEF task used topics in English, Spanish and French
CLEF 2004: Results

- Participation is up: 55 groups in 2004 (42 in 2003)
- Expansion of the test suite
- Great success of QA@CLEF and ImageCLEF
- Synergy of diverse expertise, partly a consequence of the new tracks: IR, NLP, image processing, medical informatics, ...
- CLEF 2004 Workshop: 15-17 September, in conjunction with ECDL 2004; ca. 100 participants (70 in 2003)
- Deliverable 7.4.1, Working Notes: ca. 1,000 pages describing the setup and results of all experiments, distributed at the workshop
- A detailed analysis of participants' results and research trends will be made after the workshop
Intentions for CLEF 2005

The agenda for CLEF 2005 will be decided during the CLEF 2004 workshop and the Steering Committee meeting. First proposals include:
- Focus on Central/East European languages for the ad hoc tasks
- New collections and expansion of the medical task for ImageCLEF
- Feasibility study and pilot implementation of a cross-language web track
- Use of a new collection for CL-SDR: the MALACH collection (transcripts of interviews with Holocaust survivors, made available by the Shoah Foundation)
- Pilot study for a cross-language geographic information retrieval task
Task 7.5: Evaluation Testbeds

- Starts January 2005
- Focus on the creation of testbeds for usage-oriented evaluation
- Testbed framework with extensible services
Summary

WP7 will:
- enable communication between evaluation experts and DL researchers/developers
  - T7.1: Evaluation Forum
- continue existing evaluation initiatives relevant for the DL area
  - T7.3: INEX
  - T7.4: CLEF
- develop new evaluation models, methods and testbeds
  - T7.2: Evaluation Models and Methods
  - T7.5: Evaluation Testbeds