Metadata Harvesting Framework

Metadata Harvesting Framework
- Data Providers (collection builders) expose their records.
- Service Provider (TEL, NSDL) holds the harvested records.
  1. Service Provider polls periodically for new records, using the OAI protocol (over HTTP).
  2. New records are downloaded and cached by the Service Provider.
  3. Provide searching, browsing, and other services over the data.
- Library User works with the services offered by the Service Provider.

OAI workshop, December 11, 2006

Multiple representations of an object
- MARC record in XML
- Dublin Core record in XML
- Qualified Dublin Core record in XML
- MODS record in XML
Example object: Honoré Daumier lithograph (Brandeis University)

HTTP and XML
- The OAI-PMH is an almost stateless request/response protocol.
- Requests and responses are sent using the HTTP protocol.
- Requests are made using HTTP GET/POST operations.
- Responses are returned as well-formed, valid XML documents.

Well-formed and Valid XML

Correct:

    <car>
      <make>dodge</make>
      <model>spirit</model>
      <year>1994</year>
      <owner>
        <name>you</name>
        <plate>co</plate>
      </owner>
    </car>

Incorrect:

    <car>
      <make>dodge</make>
      <model>spirit</model>
      <year>1994
      <owner>
        <plate>co</plate>
        <name>you</name>
    </car>
    </owner>
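The difference between the two car documents above can be checked mechanically. The following is a minimal sketch using Python's standard library; the helper name is_well_formed is an illustrative assumption. It tests well-formedness only, since the standard library parser does not validate against a schema.

```python
import xml.etree.ElementTree as ET

# The two documents from the slide, copied as text.
CORRECT = """
<car>
  <make>dodge</make>
  <model>spirit</model>
  <year>1994</year>
  <owner>
    <name>you</name>
    <plate>co</plate>
  </owner>
</car>
"""

INCORRECT = """
<car>
  <make>dodge</make>
  <model>spirit</model>
  <year>1994
  <owner>
    <plate>co</plate>
    <name>you</name>
</car>
</owner>
"""


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Unclosed or wrongly nested tags end up here.
        return False
    return True


print(is_well_formed(CORRECT))
print(is_well_formed(INCORRECT))
```

Validity (conformance to a DTD or XML schema) is a separate check that needs a validating parser.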

DTD, Schemas & Namespace

DTDs (Document Type Definitions)
- Describe the elements of XML instance documents
- Are not themselves well-formed XML
- Some data-typing
- Namespaces are harder to deal with

Schemas
- Describe the elements of XML instance documents
- Are themselves well-formed XML
- Strong data-typing
- Namespaces are easier to deal with

Namespace
- Collection of related element names identified by a name label (e.g. dc)

XML Namespaces and Schema
- Consistency and data quality are ensured through XML schemas and schema validation.
- Two separate XML namespaces are used:
  - one that defines the OAI-PMH response
  - another that defines the metadata records contained in the response (i.e. the record-level schema)
- Example: http://www.dlese.org/oai/provider?verb=GetRecord&metadataPrefix=adn&identifier=oai:dlese.org:dlese-000-000-000-690
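As a hedged illustration of the two namespaces, the sketch below parses a small hand-written GetRecord response. The sample record, the example.org URL and the function name are assumptions made for the example; the namespace URIs are the ones used by OAI-PMH 2.0 and unqualified Dublin Core.

```python
import xml.etree.ElementTree as ET

# Prefixes are local to this script; only the URIs matter to the parser.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response (not taken from a real repository).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2006-12-11T12:00:00Z</responseDate>
  <request verb="GetRecord">http://example.org/oai</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2006-01-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def envelope_and_titles(xml_bytes: bytes):
    """Read one element from each namespace (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    # responseDate lives in the OAI-PMH (protocol) namespace.
    response_date = root.findtext("oai:responseDate", namespaces=NS)
    # dc:title lives in the record-level (metadata) namespace.
    titles = [e.text for e in root.iterfind(".//dc:title", NS)]
    return response_date, titles


print(envelope_and_titles(SAMPLE.encode("utf-8")))
```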

OAI repositories can be organized in sets
- OAI-PMH mechanism to allow for harvesting of subcollections
- Semantics for sets are defined outside of the protocol.
- Sets are defined by conventions established between data and service providers, or just by the data provider.
- Sets can be established that enable querying (e.g. by topic, author name, subject area, etc.).
- Example: The Open Digital Library (Suleman, 2001)

OAI repositories can be organized in sets
What do sets represent?
- Journals: issues
- Institutional repositories: departments, research centers, etc.
- Set representations may be constrained by the software package used.
- EPrint Archives: Subject, Publication Status
- Cultural Heritage Repositories: Collections with Intent
5 April, 2006

Requirements to be a Data Provider
- Source of metadata: human or automated resource catalogers
- Metadata mappings: crosswalks from native formats to DC or other formats
- Server technology: handled by the OAI software
- Datestamps: indicate when the item was last changed (handled by the OAI software)
- Deletions: indicate if the item has been deleted and should be removed (handled by the OAI software)
- Unique identifiers: used to uniquely identify each item across repositories

Examples of repositories
- The OAForum Information Resource Database is no longer active.
- Refer to the UKOLN site: http://www.ukoln.ac.uk/repositories/digirep/index/faqs
- More repositories at: http://www.openarchives.org/register/browsesites

Examples of services
- http://oaister.umdl.umich.edu
- http://www.theeuropeanlibrary.org
- http://cicharvest.grainger.uiuc.edu/
- http://www.americansouth.org/
- http://nsdl.org/
- http://www.pictureaustralia.org/
- http://imlsdcc.grainger.uiuc.edu/
- http://www.language-archives.org/

The OAI-PMH
- OAI-PMH Requests
  - Identify
  - ListMetadataFormats
  - ListSets
  - GetRecord
  - ListIdentifiers
  - ListRecords
- Resumption Tokens
  - Used for flow control when large responses are required

OAI-PMH: overview and structure model
[figure]

Key Definitions
- Harvester: client application issuing OAI-PMH requests
- Repository: network accessible server, able to process OAI-PMH requests correctly
- Set: optional construct for grouping items in a repository

Key Definitions
- Resource: the object the metadata is "about". The nature of resources is not defined in the OAI-PMH; resources may be digital or non-digital.
- Item: component of a repository from which metadata about a resource can be disseminated; has a unique identifier
- Record: metadata in a specific metadata format
- Identifier: unique key for an item in a repository

Protocol Details: Records
A record is the metadata of a resource in a specific format. A record has three parts: a header and metadata, both of which are mandatory, and an optional about statement. Each of these is made up of various components as set out below.
header (mandatory)
- identifier (mandatory: 1 only)
- datestamp (mandatory: 1 only)
- setSpec elements (optional: 0, 1 or more)
- status attribute for deleted item
metadata (mandatory)
- XML encoded metadata with root tag, namespace
- repositories must support Dublin Core, may support other formats
about (optional)
- rights statements
- provenance statements
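The header components listed above map naturally onto a small data structure. The sketch below is one possible reading, assuming hand-written sample records and a hypothetical Header class and parse_header function; a header carrying status="deleted" is treated as a deletion notice.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

# Clark notation for the OAI-PMH 2.0 namespace.
OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Hand-written sample: one live record header and one deleted record.
RECORDS = """<ListRecords xmlns="http://www.openarchives.org/OAI/2.0/">
  <record>
    <header>
      <identifier>oai:example.org:1</identifier>
      <datestamp>2006-01-01</datestamp>
      <setSpec>biology</setSpec>
    </header>
  </record>
  <record>
    <header status="deleted">
      <identifier>oai:example.org:2</identifier>
      <datestamp>2006-02-01</datestamp>
    </header>
  </record>
</ListRecords>"""


@dataclass
class Header:
    identifier: str
    datestamp: str
    set_specs: List[str]
    deleted: bool


def parse_header(record: ET.Element) -> Header:
    """Extract the header parts of one record element (hypothetical helper)."""
    header = record.find(OAI + "header")
    return Header(
        identifier=header.findtext(OAI + "identifier"),
        datestamp=header.findtext(OAI + "datestamp"),
        # setSpec is optional and repeatable, so collect zero or more values.
        set_specs=[s.text for s in header.findall(OAI + "setSpec")],
        deleted=header.get("status") == "deleted",
    )


headers = [parse_header(r) for r in ET.fromstring(RECORDS).findall(OAI + "record")]
for h in headers:
    print(h)
```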

Protocol Details: Datestamps
- A datestamp is the date of last modification of a metadata record.
- A datestamp is a mandatory characteristic of every item.
- It has two possible levels of granularity:
  - YYYY-MM-DD
  - YYYY-MM-DDThh:mm:ssZ
- The function of the datestamp is to enable selective harvesting using the from and until arguments.
- Its applications are in incremental update mechanisms.
- It gives the date of creation, last modification, or deletion.
- Deletion is covered with three support levels: no, persistent, transient.

Protocol Details: Metadata schema
- OAI-PMH supports dissemination of multiple metadata formats from a repository.
- The properties of metadata formats are:
  - id string to specify the format (metadataPrefix)
  - metadata schema URL (XML schema to test validity)
  - XML namespace URI (global identifier for metadata format)
- Repositories must be able to disseminate unqualified Dublin Core.
- The Dublin Core Metadata Element Set contains 15 elements. All elements are optional, and all elements may be repeated.
- Further arbitrary metadata formats can be defined and transported via the OAI-PMH.
- Any returned metadata must comply with an XML namespace specification.
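The three properties of a metadata format can be read straight out of a ListMetadataFormats response. The following sketch assumes a hand-written sample response and a hypothetical function name; the schema URL and namespace URI shown are those of the oai_dc format.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Hand-written sample response advertising a single format.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def metadata_formats(xml_text: str) -> dict:
    """Map each metadataPrefix to its schema URL and namespace URI."""
    root = ET.fromstring(xml_text)
    formats = {}
    for fmt in root.iter(OAI + "metadataFormat"):
        prefix = fmt.findtext(OAI + "metadataPrefix")
        formats[prefix] = {
            "schema": fmt.findtext(OAI + "schema"),
            "namespace": fmt.findtext(OAI + "metadataNamespace"),
        }
    return formats


formats = metadata_formats(SAMPLE)
print(formats)
```

A harvester might use such a mapping to confirm that a repository offers oai_dc before issuing ListRecords.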

Protocol Details: Sets
- Sets enable a logical partitioning of repositories.
- They are optional: archives do not have to define sets.
- There are no recommendations for the implementation of sets.
- Sets are not necessarily exhaustive of the content of a repository.
- They are not necessarily strictly hierarchical.
- Communities need negotiated agreements that define the sets useful to them.
- Function: selective harvesting (set parameter)
- Applications: subject gateways, dissertation search engines, and others
- Examples:
  - publication types (thesis, article, ...)
  - document types (text, audio, image, ...)
  - content sets, according to DNB (medicine, biology, ...)

Protocol Details: Request format
- Requests must be submitted using the GET or POST methods of HTTP, and repositories must support both methods.
- At least one key=value pair must be provided: verb=RequestType, where RequestType is a request type such as ListRecords.
- Additional key=value pairs depend on the request type.
- Example of a GET request: http://archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc
- The encoding of special characters must be supported; for example, ":" (host port separator) becomes "%3A".
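A GET request of the kind described above can be assembled with Python's standard library. This is a minimal sketch; build_request is a hypothetical helper, and the identifier oai:example.org:1 is made up for the example. urlencode percent-encodes special characters, so ":" becomes "%3A".

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def build_request(base_url: str, verb: str,
                  arguments: Optional[Dict[str, str]] = None) -> str:
    """Build an OAI-PMH GET URL: the verb first, then any other arguments."""
    params = {"verb": verb}
    params.update(arguments or {})
    # urlencode escapes reserved characters in keys and values.
    return base_url + "?" + urlencode(params)


list_url = build_request(
    "http://archive.org/oai", "ListRecords", {"metadataPrefix": "oai_dc"}
)
get_url = build_request(
    "http://archive.org/oai",
    "GetRecord",
    {"identifier": "oai:example.org:1", "metadataPrefix": "oai_dc"},
)
print(list_url)
print(get_url)
```

Arguments are passed as a dictionary rather than keyword arguments because from is a reserved word in Python.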

Protocol Details: Response
- Responses are formatted as HTTP responses.
- The content type must be text/xml.
- HTTP status codes, as distinguished from OAI-PMH errors, may be returned, such as 302 (redirect) and 503 (service not available).
- Compression encodings are optional in OAI-PMH; only identity encoding is mandatory.
- The response format must be well-formed XML with markup as follows:
  1. XML declaration (<?xml version="1.0" encoding="utf-8"?>)
  2. root element named OAI-PMH with three attributes (xmlns, xmlns:xsi, xsi:schemaLocation)
  3. three child elements:
     1. responseDate (UTC datetime)
     2. request (the request that generated this response)
     3. either a) error (in case of an error or exception condition) or b) an element with the name of the OAI-PMH request

Protocol Details: Flow control
- Four of the request types return a list of entries. Three of them may reply with "large" lists.
- OAI-PMH supports partitioning. Those managing a repository decide whether to partition and how.
- The response to a request includes:
  - an incomplete list
  - a resumption token
  - expiration date, size of complete list, cursor (optional)
- A new request with the same request type carries:
  - the resumption token as parameter
  - no other parameters
- The response includes the next (which may be the last) section of the list and a resumption token. That resumption token is empty if the response contains the last section of the list.
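The flow-control steps above amount to a loop that keeps requesting until an empty resumption token arrives. The sketch below keeps the network out of the picture by taking a fetch callable; harvest_identifiers, fake_fetch and the two canned pages are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_identifiers(fetch: Callable[[Dict[str, str]], str],
                        metadata_prefix: str) -> Iterator[str]:
    """Yield identifiers from every partial list of a ListIdentifiers harvest."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(params))
        for header in root.iter(OAI + "header"):
            yield header.findtext(OAI + "identifier")
        token = root.find(".//" + OAI + "resumptionToken")
        # A missing or empty token means the list is complete.
        if token is None or not (token.text or "").strip():
            break
        # Follow-up requests carry the verb and the token, nothing else.
        params = {"verb": "ListIdentifiers",
                  "resumptionToken": token.text.strip()}


# Two canned pages standing in for a repository (hypothetical data).
PAGE_TEMPLATE = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
    "<header><identifier>{ident}</identifier>"
    "<datestamp>2006-01-01</datestamp></header>"
    "{token}</ListIdentifiers></OAI-PMH>"
)
PAGES = {
    None: PAGE_TEMPLATE.format(
        ident="oai:example.org:1",
        token='<resumptionToken cursor="0" completeListSize="2">page2</resumptionToken>',
    ),
    "page2": PAGE_TEMPLATE.format(
        ident="oai:example.org:2",
        token='<resumptionToken cursor="1" completeListSize="2"/>',
    ),
}


def fake_fetch(params: Dict[str, str]) -> str:
    """Return the canned page matching the request's resumption token."""
    return PAGES[params.get("resumptionToken")]


harvested = list(harvest_identifiers(fake_fetch, "oai_dc"))
print(harvested)
```

In a real harvester the fetch callable would issue the HTTP request and would also need to handle expired tokens and HTTP 503 retries, which this sketch leaves out.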

Protocol Details: Flow control
[figure]

Protocol Details: Errors and exceptions
Repositories must indicate OAI-PMH errors by the inclusion of one or more error elements. The defined error identifiers are:
- badArgument
- badResumptionToken
- badVerb
- cannotDisseminateFormat
- idDoesNotExist
- noRecordsMatch
- noMetadataFormats
- noSetHierarchy
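A harvester typically inspects each response for error elements before reading anything else. The sketch below shows one way to do that, assuming a hand-written error response and a hypothetical find_errors function; the error text is invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# The eight error identifiers defined by the protocol.
ERROR_CODES = {
    "badArgument", "badResumptionToken", "badVerb",
    "cannotDisseminateFormat", "idDoesNotExist", "noRecordsMatch",
    "noMetadataFormats", "noSetHierarchy",
}

# Hand-written sample of a response to a request with an unknown verb.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2006-12-11T12:00:00Z</responseDate>
  <request>http://example.org/oai</request>
  <error code="badVerb">Illegal OAI verb</error>
</OAI-PMH>"""


def find_errors(xml_text: str) -> List[Tuple[str, str]]:
    """Return (code, message) pairs for every error element in a response."""
    root = ET.fromstring(xml_text)
    return [
        (e.get("code"), (e.text or "").strip())
        for e in root.findall(OAI + "error")
    ]


errors = find_errors(SAMPLE)
print(errors)
print(all(code in ERROR_CODES for code, _ in errors))
```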

Request types
There are six different request types:
- Identify
- ListMetadataFormats
- ListSets
- ListIdentifiers
- ListRecords
- GetRecord
A harvester is not required to use all types. A repository must implement all types. There are required and optional arguments, depending on the request type.

Request types: Identify
- Function: description of an archive
- Example: archive.org/oai-script?verb=Identify
- Parameters: none
- Errors / exceptions: badArgument (e.g. archive.org/oai-script?verb=Identify&set=biology)
- Response format: listed on the following slide

Request types: Identify
Response format

Element             Example                               Ordinality
repositoryName      My Archive                            1
baseURL             http://archive.org/oai                1
protocolVersion     2.0                                   1
earliestDatestamp   1999-01-01                            1
deletedRecord       no, transient, persistent             1
granularity         YYYY-MM-DD, YYYY-MM-DDThh:mm:ssZ      1
adminEmail          oai-admin@archive.org                 +
compression         deflate, compress                     *
description         oai-identifier, eprints, friends, ... *

Ordinality: 1 = mandatory, 1 only; + = mandatory, 1 or more; * = optional, 0 or more
Online example: http://www.dlese.org/oai/provider?verb=Identify

Request types: ListMetadataFormats
- Function: retrieve available metadata formats from an archive
- Example: archive.org/oai-script?verb=ListMetadataFormats&identifier=oai:huberlin.de:3000218
- Parameters: identifier (optional)
- Errors / exceptions:
  - badArgument
  - idDoesNotExist (e.g. archive.org/oai-script?verb=ListMetadataFormats&identifier=really-wrong-identifier)
  - noMetadataFormats
- Online examples:
  - http://www.dlese.org/oai/provider?verb=ListMetadataFormats
  - http://oai.bn.pt/servlet/oaihandler?verb=ListMetadataFormats&identifier=oai:oai.bn.pt:cienciasartes/2108

Request types: ListSets
- Function: retrieve the set structure of a repository
- Example: archive.org/oai-script?verb=ListSets
- Parameters: resumptionToken (exclusive)
- Errors / exceptions:
  - badArgument
  - badResumptionToken (e.g. archive.org/oai-script?verb=ListSets&resumptionToken=any-wrong-token)
  - noSetHierarchy
- Online examples:
  - http://www.dlese.org/oai/provider?verb=ListSets
  - http://oai.bn.pt/servlet/oaihandler?verb=ListSets

Request types: ListIdentifiers
- Function: abbreviated form of ListRecords, retrieving only headers
- Example: archive.org/oai-script?verb=ListIdentifiers&metadataPrefix=oai_dc&from=2002-12-01
- Parameters:
  - from (optional)
  - until (optional)
  - metadataPrefix (required)
  - set (optional)
  - resumptionToken (exclusive)
- Errors / exceptions:
  - badArgument (e.g. ...&from=2002-12-01-13:45:00)
  - badResumptionToken
  - cannotDisseminateFormat
  - noRecordsMatch
  - noSetHierarchy
- Online example: http://www.dlese.org/oai/provider?verb=ListIdentifiers&metadataPrefix=adn
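The parameter rules for ListIdentifiers (required, optional, exclusive) and the two datestamp granularities can be checked before a request is sent. The sketch below is an assumption about how a client might do this, not part of the protocol; check_list_arguments and its messages are hypothetical, and the checks are not exhaustive.

```python
import re
from typing import Dict, List

# The two datestamp granularities: YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ.
DATESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}Z)?")
ALLOWED = {"from", "until", "metadataPrefix", "set", "resumptionToken"}


def check_list_arguments(args: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems = [f"unknown argument: {name}"
                for name in sorted(set(args) - ALLOWED)]
    if "resumptionToken" in args:
        # Exclusive: no other argument may accompany the token.
        if len(args) > 1:
            problems.append("resumptionToken is exclusive")
        return problems
    if "metadataPrefix" not in args:
        problems.append("metadataPrefix is required")
    for name in ("from", "until"):
        if name in args and not DATESTAMP.fullmatch(args[name]):
            problems.append(f"{name} is not a valid datestamp")
    return problems


print(check_list_arguments({"metadataPrefix": "oai_dc", "from": "2002-12-01"}))
print(check_list_arguments(
    {"metadataPrefix": "oai_dc", "from": "2002-12-01-13:45:00"}))
print(check_list_arguments({"resumptionToken": "abc", "set": "biology"}))
```

A repository would report each of these conditions as badArgument; a client-side check simply avoids the round trip.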

Request types: ListRecords
- Function: harvest records from a repository
- Example: archive.org/oai-script?verb=ListRecords&metadataPrefix=oai_dc&set=biology
- Parameters:
  - from (optional)
  - until (optional)
  - metadataPrefix (required)
  - set (optional)
  - resumptionToken (exclusive)
- Errors / exceptions:
  - badArgument
  - badResumptionToken
  - cannotDisseminateFormat
  - noRecordsMatch
  - noSetHierarchy
- Online examples:
  - http://www.dlese.org/oai/provider?verb=ListRecords&metadataPrefix=oai_dc
  - http://www.dlese.org/oai/provider?verb=ListRecords&metadataPrefix=adn
  - http://oai.bn.pt/servlet/oaihandler?verb=ListRecords&from=2006-01-01&until=2006-01-30&set=bnd&metadataPrefix=tel

Request types: GetRecord
- Function: retrieve an individual metadata record from a repository
- Example: archive.org/oai-script?verb=GetRecord&identifier=oai:huberlin.de:3000218&metadataPrefix=oai_dc
- Parameters:
  - identifier (required)
  - metadataPrefix (required)
- Errors / exceptions:
  - badArgument
  - cannotDisseminateFormat
  - idDoesNotExist
- Online examples:
  - http://oai.bn.pt/servlet/oaihandler?verb=GetRecord&identifier=oai:oai.bn.pt:bnd/porbase619&metadataPrefix=tel
  - http://www.dlese.org/oai/provider?verb=GetRecord&identifier=oai%3adlese.org%3adlese-000-000-000-002&metadataPrefix=adn

Turn key systems and modules
- CWIS: http://scout.wisc.edu/projects/cwis/
- ContentDM: http://contentdm.com/
- Digitool: http://www.exlibrisgroup.com/digitool.htm
- DSpace: http://www.dspace.org/
- EPrints: http://software.eprints.org/
- DLXS: http://www.dlxs.org/
- OAICat: http://www.oclc.org/research/software/oai/cat.htm
- XMLFile: http://www.dlib.vt.edu/projects/oai/software/xmlfile/xmlfile.html
- DLESE OAI software: http://dlese.org/oai/index.jsp
- More tools at: http://www.openarchives.org/tools/tools.html

References
1. Building Interoperable Digital Libraries: A Practical Guide to creating Open Archives, Hussein Suleman, JCDL 2001 Tutorial.
2. A Framework for Building Open Digital Libraries, Hussein Suleman and Edward A. Fox, in D-Lib Magazine, December 2001. http://www.dlib.org/dlib/december01/suleman/12suleman.html
3. The Open Archives Initiative. http://www.openarchives.org
4. DLF/NSDL best practices for OAI and shareable metadata. http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?tableofcontents
5. Open Archives Forum. http://www.oaforum.org