The MIND Approach. Fabio Crestani University of Strathclyde, Glasgow, UK. Open Archive Forum Workshop Berlin, Germany, March 2003

Similar documents

Joining the BRICKS Network - A Piece of Cake

Implementing OAI Data and Service Providers

2nd Technical Validation Questionnaire - interim results -

Semantic Web Update W3C RDF, OWL Standards, Development and Applications. Dave Beckett

The Open Archives Initiative and the Sheet Music Consortium

Building an OAI-based Union Catalog for the National Digital Archives Program in Taiwan

A Repository of Metadata Crosswalks. Jean Godby, Devon Smith, Eric Childress, Jeffrey A. Young OCLC Online Computer Library Center Office of Research

RDF and Digital Libraries

Torii Access the Digital Research Community

Open Archives Initiatives Protocol for Metadata Harvesting Practices for the cultural heritage sector

The Semantic Web DEFINITIONS & APPLICATIONS

Architecture domain. Leonardo Candela. DL.org Autumn School Athens, 3-8 October th October 2010

SciX Open, self organising repository for scientific information exchange. D15: Value Added Publications IST

University of Bath. Publication date: Document Version Publisher's PDF, also known as Version of record. Link to publication

DRIVER Step One towards a Pan-European Digital Repository Infrastructure

Increasing access to OA material through metadata aggregation

The MEG Metadata Schemas Registry Schemas and Ontologies: building a Semantic Infrastructure for GRIDs and digital libraries Edinburgh, 16 May 2003

The Design of a DLS for the Management of Very Large Collections of Archival Objects

A Dublin Core Application Profile in the Agricultural Domain

Opus: University of Bath Online Publication Store

Ontology Servers and Metadata Vocabulary Repositories

The OAIS Reference Model: current implementations

Report from the W3C Semantic Web Best Practices Working Group

Union catalogue models

Data Exchange and Conversion Utilities and Tools (DExT)

Networking European Digital Repositories

Digital Libraries: Interoperability

Building Illinois Electronic Documents Access

Metadata Catalogue Issues. Daan Broeder Max-Planck Institute for Psycholinguistics

The DART-Europe E-theses Portal

Metadata Overview: digital repositories

Bringing the Voices of Communities Together:

Representation/Indexing (fig 1.2) IR models - overview (fig 2.1) IR models - vector space. Weighting TF*IDF. U s e r. T a s k s

Citation Services for Institutional Repositories: Citebase Search. Tim Brody Intelligence, Agents, Multimedia Group University of Southampton

Comparison and mapping of VRA Core and IMS Meta-data

Citation Services for Institutional Repositories: Citebase Search. Tim Brody Intelligence, Agents, Multimedia Group University of Southampton

Networking European Digital Repositories

Share.TEC Repository System

Metadata and Encoding Standards for Digital Initiatives: An Introduction


Metadata Workshop 3 March 2006 Part 1

RDDS: Metadata Development

Digital repositories as research infrastructure: a UK perspective

Software Requirements Specification for the Names project prototype

COLLABORATIVE EUROPEAN DIGITAL ARCHIVE INFRASTRUCTURE

Persistent identifiers, long-term access and the DiVA preservation strategy

INTRO INTO WORKING WITH MINT

Building Interoperable and Accessible ETD Collections: A Practical Guide to Creating Open Archives

How to Create a Custom Ingest Form

SMART CONNECTOR TECHNOLOGY FOR FEDERATED SEARCH

Preserving Digital Cultural Heritage: the PREFORMA Project. Antonella Fresa Promoter Srl PREFORMA Technical Coordinator

Introduction to the OAI Protocol for Metadata Harvesting Version 2.0. Hussein Suleman Virginia Tech DLRL 17 June 2002

International Audit and Certification of Digital Repositories

Networking European Digital Repositories

Open Archives Forum - Technical Validation -

OAI-PMH. DRTC Indian Statistical Institute Bangalore

Institutional Repository using DSpace. Yatrik Patel Scientist D (CS)

Metadata. Week 4 LBSC 671 Creating Information Infrastructures

Simple Archive Architectures

Hello, I m Melanie Feltner-Reichert, director of Digital Library Initiatives at the University of Tennessee. My colleague. Linda Phillips, is going

Show me the data. The pilot UK Research Data Registry. 26 February 2014

UKOLN involvement in the ARCO Project. Manjula Patel UKOLN, University of Bath

Universal Profiling for Content Negotiation and Adaptation

Harvesting RDF triples

Non-text theses as an integrated part of the University Repository

Integrating TEI and EAD

Jisc Research Data Discovery Service Project Workshop Christopher Brown

MuseKnowledge Hybrid Search

Bringing Europeana and CLARIN together: Dissemination and exploitation of cultural heritage data in a research infrastructure

The Necessity of a New Culture of Electronic Publishing C A S L I N

RVOT: A Tool For Making Collections OAI-PMH Compliant

Introduction to Omeka.net

COAR Interoperability Roadmap. Uppsala, May 21, 2012 COAR General Assembly

Two interrelated objectives of the ARIADNE project, are the. Training for Innovation: Data and Multimedia Visualization

Design of The PORTA EUROPA Portal (PEP) Pilot Project

Metadata: The Theory Behind the Practice

A Comparative Study of the Search and Retrieval Features of OAI Harvesting Services

Distributed Services Architecture in dlibra Digital Library Framework

Interoperability for Digital Libraries

Nuno Freire National Library of Portugal Lisbon, Portugal

Europeana DSI 2 Access to Digital Resources of European Heritage

WEB-BASED COLLECTION MANAGEMENT FOR ARCHIVES

How to Edit/Create Ingest Forms

Michel Drescher, FLE, Ltd. Standardised Namespaces for XML in GGF (draft 09) N/A

SQA Advanced Unit Specification: general information

Participant User Guide, Version 2.6

Metadata Standards and Applications. 4. Metadata Syntaxes and Containers

Pandektis: Implementing a repository of greek historical and cultural material with DSpace

Building Interoperable Digital Libraries: A Practical Guide to creating Open Archives

CERN Document Server Software

BRA BIHAR UNIVERSITY, MUZAFFARPUR DIRECTORATE OF DISTANCE EDUCATION

Problem: Solution: No Library contains all the documents in the world. Networking the Libraries

Event-Based Modeling and Processing of Digital Media

Susan Thomas, Project Manager. An overview of the project. Wellcome Library, 10 October

Metadata Management System (MMS)

Ontology-based Navigation of Bibliographic Metadata: Example from the Food, Nutrition and Agriculture Journal

<Insert Picture Here> Click to edit Master title style

arxiv, the OAI, and peer review

Enhancing Digital Library Documents by A Posteriori Cross Linking Using XSLT

Transcription:

The MIND Approach Fabio Crestani University of Strathclyde, Glasgow, UK Open Archive Forum Workshop Berlin, Germany, March 2003

Outline Project organisation Motivations, assumptions and main issues Architecture Searching distributed multimedia DLs with MIND MIND components of interest to OA Forum: Resource description acquisition Schema mapping 2

Project Organisation MIND: Resource Selection and Data Fusion in Multimedia Distributed Digital Libraries IST-RTD FP5 project Duration: January 2001 - June 2003 (30 months) Project participants: University of Strathclyde (UK) (coordinator) University of Dortmund (Germany) University of Florence (Italy) University of Sheffield (UK) Carnegie Mellon University (USA) More info at: http://www.mind-project.net/ 3

Motivations Goal: enable searching hundreds of DLs using one distributed content-based access system heterogeneity (in query language, schema, etc.) multimedia (text, images, speech) Assumptions: minimal level of standardisation and cooperation cooperative and non-cooperative DLs simplest possible query interface (Google style) user interested in high precision searches 4

Main Issues Content-based access to information No local repository! Main issues: resource descriptions resource selection schema mapping data fusion presentation of results and user interfaces Different levels of success in dealing with these issues in the project 5

MIND Architecture Design goals: Distributed environment, different languages/os Modification/extension can be done easily New DLs New media types New functionalities Solutions: Specific components for different parts, e.g. Proxy component for DL Media-specific components Communication via SOAP (XML, HTTP) 6

MIND Architecture User Interface User Interface Data fuser Data fuser Dispatcher Text Fact Proxy 1 Text Proxy 2 Text Proxy 3 Fact Image Image Wrapper 1 Wrapper 2 Wrapper 3 Text Proxy 4 Fact Speech Wrapper 4 DL 1 DL 2 DL 3 DL 4 7

Integration User Interface User Interface User Interface USFD DSI USG Data fuser Data fuser Data fuser USFD DSI USG Dispatcher UNIDO Media Media Proxy Resource description creation for images DSI CMU QBS Media Wrapper UNIDO Learner for schema rules parameters UNIDO DL 8

Searching with MIND User query Modified user query Prop. query Prop. query Prop. query Prop. query Query modification Query transformation Resource selection Prop. query Prop. query Database query run Result list Result list Fused result list Data fusion 9

Query Modification User query Query modification Modified user query Prop. query Prop. query Prop. query Prop. query Query transformation Resource selection Prop. query Prop. query Database query run Result list Result list Fused result list Data fusion 10

Query Modification Tasks: capture user information need add more information about the user query expansion w.r.t. relevance feedback data Actions: Interface: captures information need in a multimedia query Dispatcher: adds new conditions/modifies conditions and weights w.r.t. relevance feedback features Proxies: generate DL-specific features 11

Query Transformation User query Query modification Modified user query Prop. query Prop. query Prop. query Prop. query Query transformation Resource selection Prop. query Prop. query Database query run Result list Result list Fused result list Data fusion 12

Why Query Transformation? Motivations: Heterogeneous schemas Thus: (uncertain) mapping between schemas to transform user query into proprietary query Example: Dublin Core RFC 1807 DC TITLE CREATOR DESCRIPTION PUBLISHER DATE TYPE FORMAT IDENTIFIER RIGHTS RFC1807 BIB_VERSION ID ENTRY ORGANIZATION TITLE TYPE REVISION AUTHOR CONTACT DATE COPYRIGHT ABSTRACT 13

Query Transformation Task: transform query w.r.t. different schemas Actions: Proxies: transforms query condition by condition title identifier creator user query title identifier creator 245 856 100 700 710 245 856 100 700 710 prop. query More about this later 14

Resource Selection User query Query modification Modified user query Prop. query Prop. query Prop. query Prop. query Query transformation Resource selection Prop. query Prop. query Database query run Result list Result list Fused result list Data fusion 15

Resource Selection Task: find DLs that are relevant to the query use decision-theoretic model: use resource descriptions cost factors (e.g. monetary costs, computation and communication time, retrieval quality) Actions: Dispatcher: calculates for every DL the number of documents to retrieve so that overall expected costs are minimized and retrieval quality is maximised Proxies: calculate specific costs for retrieval 16

Query Run User query Modified user query Prop. query Prop. query Prop. query Prop. query Query modification Query transformation Resource selection Prop. query Prop. query Database query run Result list Result list Fused result list Data fusion 17

Data Fusion User query Query modification Modified user query Prop. query Prop. query Prop. query Prop. query Query transformation Resource selection Prop. query Prop. query Database query run Result list Result list Fused result list Data fusion 18

Data Fusion Task: fuse together results from different DLs Actions: Dispatcher: detects and eliminates duplicate documents (ID or content-based); modifies document weights to improve retrieval quality using global information from resource selection process Data fuser: merges result lists using local information Interface: creates summaries or surrogates; presents results 19

MIND and OAI Why should MIND be of some interest to OA Forum? Completely different approach: assumes no cooperation from DLs content-based retrieval no local repository of harvested metadata Lessons on: resource description acquisition for content-based retrieval (multimedia query-based sampling) schema mapping 20

Task: Creating Resource Descriptions create and update resource descriptions Actions: Proxies: start resource gathering at self-defined times uses query-based sampling Resource descriptions are used in almost all phases of query processing 21

Query-base sampling Technique developed at CMU How does it work: 1. iterative retrieval of documents using random queries 2. assumption: union of results is representative for whole collection 3. extract resource description w.r.t. document sample We have extended QBS to multimedia DLs resource descriptions for images and speech 22

Schema mapping Heterogeneous DLs with different schemas require schema mapping MIND uses Probabilistic Datalog (UNIDO) Schema mapping is carried out at document and query level schema mapping at document level is necessary for relevance feedback Handling: queries/documents encoded in RDF/XML transform rules into XSLT 23

Creating schema mapping rules Generating rules from the schema find indicators for matching attributes E.g. attribute names, datatypes (equality, sub-datatype) compute probability for each attribute pair (Probabilistic Datalog), taking most likely candidates Rules created automatically, but still possible to modifying them manually significant error rate 24

Schema mapping at document level Documents: RFC 1807 date author (RFC1807,date, 01 ) (DC,date, 2001 ) Dublin Core (standard schema) date creator MARC 21 033 100 700 710 25

Schema mapping at query level Queries: Dublin Core (standard schema) creator/soundex (DC,creator,soundex) (MARC21,100,sounds-like) with probability 0.6 MARC 21 100/sounds-like 700/sounds-like 100/sounds-like 26

Creating schema mapping rules Standard schema Query rules Built-in predicates DL schema Document rules Rule 11-Feb-03 2003-02-11 27

Conclusions MIND and OAI are very different in their assumptions about data, users, kind of searches, etc. Are MIND and OAI different solutions to the same problem? MIND tries to deal/live with chaos OAI tries to bring order in the chaos 28