Version 2 of the OAI-PMH & some other stuff

Similar documents
Introduction to the OAI Protocol for Metadata Harvesting Version 2.0. Hussein Suleman Virginia Tech DLRL 17 June 2002

Using metadata for interoperability. CS 431 February 28, 2007 Carl Lagoze Cornell University

Metadata Harvesting Framework

OAI-PMH repositories: Quality issues regarding metadata and protocol compliance

arxiv, the OAI, and peer review

OAI-PMH. DRTC Indian Statistical Institute Bangalore

RVOT: A Tool For Making Collections OAI-PMH Compliant

Problem: Solution: No Library contains all the documents in the world. Networking the Libraries

Publishing Based on Data Provider

Open Archives Initiative protocol development and implementation at arxiv

Building Interoperable and Accessible ETD Collections: A Practical Guide to Creating Open Archives

Building Interoperable Digital Libraries: A Practical Guide to creating Open Archives

The Open Archives Initiative Protocol for Metadata Harvesting

Exposing and Harvesting Metadata Using the OAI Metadata Harvesting Protocol: A Tutorial

Tutorial. Open Archive Initiative

OAI Static Repositories (work area F)

Interoperability and Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)

OAI-PMH implementation and tools guidelines

The Open Archives Initiative Protocol for Metadata Harvesting: An Introduction

CodeSharing: a simple API for disseminating our TEI encoding. Martin Holmes

Open Archives Initiative Object Reuse and Exchange Technical Committee Meeting, May 29, Edited by: Carl Lagoze & Herbert Van de Sompel

Harvesting Metadata Using OAI-PMH


A Novel Architecture of Agent based Crawling for OAI Resources

IMu OAI-PMH Web Service

The Open Archives Initiative and the Sheet Music Consortium

Integrating Access to Digital Content

An introduction to OAI-PMH

Creating a National Federation of Archives using OAI-PMH

IVOA Registry Interfaces Version 0.1

Open Archives Initiative Object Reuse & Exchange. Resource Map Discovery

Metadata aggregation for digital libraries

The multi-faceted use of the OAI-PMH in the LANL Repository

Applying SOAP to OAI-PMH

Increasing access to OA material through metadata aggregation

Better interoperability through the Open Archives Initiative

Interoperability, Metadata, and Complex Objects

Network Information System. NESCent Dryad Subcontract (Year 1) Metacat OAI-PMH Project Plan 25 February Mark Servilla

Digital Library Curriculum Development Module 5-d: Protocols (Last Updated: )

CORE: Improving access and enabling re-use of open access content using aggregations

Open Archives Initiative Object Reuse & Exchange. Resource Map Discovery

The OAIS Reference Model: current implementations

Harvesting Statistical Metadata from an Online Repository for Data Analysis and Visualization

ResourceSync - An Introduction

Lessons Learned with Arc, an OAI-PMH Service Provider

Chuck Cartledge, PhD. 25 February 2018

Joining the BRICKS Network - A Piece of Cake

Outline of the course

Orbis Cascade Alliance Content Creation & Dissemination Program Digital Collections Service. Enabling OAI & Mapping Fields in Digital Commons

mod_oai: An Apache Module for Metadata Harvesting

Harvester Service Technical and User Guide 5 June 2008

ORCA-Registry v2.4.1 Documentation

A Dublin Core Application Profile for Scholarly Works (eprints)

Search Interoperability, OAI, and Metadata

OpenAIRE Guidelines Promoting Repositories Interoperability and Supporting Open Access Funder Mandates

OAI-ORE. A non-technical introduction to: (

An overview of the OAIS and Representation Information

Comparing Open Source Digital Library Software

Initial Experiences Re-exporting Duplicate and Similarity Computations with an OAI-PMH Aggregator

Expected and Unexpected Synergies

Flexible Design for Simple Digital Library Tools and Services

Archon - A Digital Library that Federates Physics Collections

EPrints: Repositories for Grassroots Preservation. Les Carr,

An Architecture to Share Metadata among Geographically Distributed Archives

GMA-PSMH: A Semantic Metadata Publish-Harvest Protocol for Dynamic Metadata Management Under Grid Environment

Phase 1 RDRDS Metadata

A Case Study in Metadata Harvesting: the NSDL

adore: a modular, standards-based Digital Object Repository

OAI-Publishers in Repository Infrastructures

NEEO TECHNICAL GUIDELINES FOR THE

Package rdryad. June 18, 2018

Open Archive Solutions to Traditional Archive/Library Cooperation By DONATELLA CASTELLI

Digital Objects, Data Models, and Surrogates. Carl Lagoze Computing and Information Science Cornell University

EXTENDING OAI-PMH PROTOCOL WITH DYNAMIC SETS DEFINITIONS USING CQL LANGUAGE

Current JISC initiatives for Repositories

Open Archives Initiatives Protocol for Metadata Harvesting Practices for the cultural heritage sector

BIBLID (2004) 93:1 pp (2004.6) 209. NBINet NBINet 92

Building Virtual Collections

Getting Started with the Digital Commonwealth. Robin L. Dale Director of Digital & Preservation Services LYRASIS

Corso di Biblioteche Digitali

DLF Update on Metadata Services

RN Workshop Series on Innovations in Scholarly Communication: plementing the Benefits of OAI (OAI3)

Developing an Institutional Repository Service in Chinese Academy of Sciences

Exposing Cross-Domain Resources for Researchers and Learners

Networking European Digital Repositories

ResourceSync Towards a Web-Based Approach for Resource Synchronization

Networking European Digital Repositories

Engaging and Connecting Faculty:

EUDAT-B2FIND A FAIR and Interdisciplinary Discovery Portal for Research Data

OAI AND AMF FOR ACADEMIC SELF-DOCUMENTATION

How to contribute information to AGRIS

The Scottish Collections Network: landscaping the Scottish common information environment. Gordon Dunsire

Repository Interoperability

Taking D2D Services to the Users with OpenURL, RSS, and OAI-PMH. Chuck Koscher Technology Director, CrossRef

The OAI2LOD Server: Exposing OAI-PMH Metadata as Linked Data

Digitisation Standards

Citation Services for Institutional Repositories: Citebase Search. Tim Brody Intelligence, Agents, Multimedia Group University of Southampton

GNU EPrints 2 Overview

Publications Repository Based on OAI-PMH 2.0 Using Google App Engine

Citation Services for Institutional Repositories: Citebase Search. Tim Brody Intelligence, Agents, Multimedia Group University of Southampton

Transcription:

Version 2 of the OAI-PMH & some other stuff 2 nd Workshop on the OAI, CERN Geneva, October 17 th 2002 Herbert Van de Sompel Los Alamos National Laboratory Carl Lagoze Cornell University

about OAI-PMH v.2.0 measures of success future?

releasing OAI-PMH v.2.0

creation of OAI-tech revision phase alpha testing phase beta phase release of OAI-PMH v.2.0

charge: US representatives creation of OAI-tech (06/01) review functionality and nature of OAI-PMH v.1.0 investigate extensions release stable version of OAI-PMH by 05/02 Thomas Krichel (Long Island U) - Jeff Young (OCLC) - Tim Cole - (U of Illinois at Urbana Champaign) - Hussein Suleman (Virginia Tech) - Simeon Warner (Cornell U) - Michael Nelson (NASA) - Caroline Arms (LoC) - Mohammad Zubair (Old Dominion U) - Steven Bird (U Penn.) European representatives Andy Powell (Bath U. & UKOLN) - Mogens Sandfaer (DTV) - Thomas Baron (CERN) - Les Carr (U of Southampton)

revision phase [09/01 02/02] review process by OAI-tech [09/01 01/02] identification of issues discussion of issues proposals for resolution by OAI Exec drafting of revised protocol document [02/02] Lagoze, Van de Sompel, Nelson, Warner

alpha testing phase [03/02 05/02] extension of OAI-tech with alpha testers continuous feedback from their implementations ongoing revision of protocol document

OAI-PMH 2.0 alpha testers The British Library Cornell U. -- NSDL project & e-print arxiv Ex Libris FS Consulting Inc -- harvester for my.oai Humboldt-Universität zu Berlin InQuirion Pty Ltd, RMIT University Library of Congress NASA OCLC Old Dominion U. -- ARC, DP9 U. of Illinois at Urbana-Champaign U. Of Southampton -- OAIA, CiteBase, eprints.org UCLA, John Hopkins U., Indiana U., NYU UKOLN, U. of Bath RDN Virginia Tech -- repository explorer

beta phase [05/02-06/02] beta release on May 1st 2002 to: registered data providers and service providers interested parties general public fine tuning of protocol document preparation for the release of 2.0 conformant tools by alpha testers release June 14th 2002

what s new in OAI-PMH v.2.0

quick recap general changes to improve solidity of protocol corrections new functionality

overview of OAI Verbs metadata about the repository harvesting verbs Verb Identify ListMetadataFormats ListSets ListIdentifiers ListRecords GetRecord Function description of repository metadata formats supported by repository sets defined by repository OAI unique ids contained in repository listing of N records listing of a single record most verbs take arguments: datestamps, sets, ids, metadata formats and resumption token (for flow control)

general changes

protocol vs periphery clear distinction between protocol and periphery fixed protocol document extensible implementation guidelines: e.g. sample metadata formats, description containers, about containers allows for OAI guidelines and community guidelines

OAI-PMH vs HTTP clear separation of OAI-PMH and HTTP OAI-PMH error handling all OK at HTTP level? => 200 OK something wrong at OAI-PMH level? => OAI-PMH error (e.g. badverb) http codes 302, 503, etc. still available to implementers, but no longer represent OAI-PMH events

resource item - record set-membership is item-level property resource item ~ identifier all available metadata about David item Dublin Core metadata MARC metadata SPECTRUM metadata records record ~ identifier + metadata format + datestamp

resource about item oai:ab.org:1234 identifier O oai_dc xxx metadataprefix A I metadata records datestamp1 oai_dc metadata datestamp2 xxx metadata datestamp

other general changes better definitions of harvester, repository, item, unique identifier, record, set, selective harvesting oai_dc schema builds on DCMI XML Schema for unqualified Dublin Core usage of must, must not etc. as in RFC2119 wording on response compression

other general changes all protocol responses can be validated with a single XML Schema easier for data providers no redundancy in type definitions SOAP-ready clean for error handling

<?xml version="1.0" encoding="utf-8"?> <OAI-PMH> <responsedate>2002-0208t08:55:46z</responsedate> <request verb= GetRecord >http://arxiv.org/oai2</request> <GetRecord> <record> <header> <identifier>oai:arxiv:cs/0112017</identifier> <datestamp>2001-12-14</datestamp> <setspec>cs</setspec> <setspec>math</setspec> </header> <metadata>.. </metadata> </record> </GetRecord> </OAI-PMH> response no errors no URL encoding of the OAI-PMH request

response with error <?xml version="1.0" encoding="utf-8"?> <OAI-PMH> <responsedate>2002-0208t08:55:46z</responsedate> <request>http://arxiv.org/oai2</request> <error code= badverb >ShowMe is not a valid OAI-PMH verb</error> </OAI-PMH> with errors, only the correct attributes are echoed in <request>

corrections

dates/times all dates/times are UTC, encoded in ISO8601, Z-notation 1957-03-20T20:30:00Z

resumptiontoken idempotency of resumptiontoken: return same incomplete list when rt is reissued while no changes occur in the repo: strict while changes occur in the repo: all items with unchanged datestamp new, optional attributes for the resumptiontoken: expirationdate completelistsize cursor

norecordsmatch 1.x - if no records match, an empty list was returned

norecordsmatch 2.0 - if no records match, the exception condition norecordsmatch is returned -- not an empty list

new functionality

harvesting granularity harvesting granularity mandatory support of YYYY-MM-DD optional support of YYYY-MM-DDThh:mm:ssZ granularity of from and until must be the same

Identify Identify more expressive <Identify> <repositoryname>library of Congress 1</repositoryName> <baseurl>http://memory.loc.gov/cgi-bin/oai</baseurl> <protocolversion>2.0</protocolversion> <adminemail>joe@here.org</adminemail> <adminemail>jane@there.edu</adminemail> <deletedrecord>transient</deletedrecord> <earliestdatestamp>1990-02-01t00:00:00z</earliestdatestamp> <granularity>yyyy-mm-ddthh:mm:ssz</granularity> <compression>deflate</compression>

header header contains set membership of item <record> <header> <identifier>oai:arxiv:cs/0112017</identifier> <datestamp>2001-12-14</datestamp> <setspec>cs</setspec> <setspec>math</setspec> </header> <metadata>.. </metadata> </record> eliminates the need for the double harvest 1.x required to get all records and all set information

ListIdentifiers ListIdentifiers returns headers <?xml version="1.0" encoding="utf-8"?> <OAI-PMH> <responsedate>2002-0208t08:55:46z</responsedate> <request verb= >http://arxiv.org/oai2</request> <ListIdentifiers> <header> <identifier>oai:arxiv:hep-th/9801001</identifier> <datestamp>1999-02-23</datestamp> <setspec>physic:hep</setspec> </header> <header> <identifier>oai:arxiv:hep-th/9801002</identifier> <datestamp>1999-03-20</datestamp> <setspec>physic:hep</setspec> <setspec>physic:exp</setspec> </header>

ListIdentifiers ListIdentifiers mandates metadataprefix as argument http://www.perseus.tufts.edu/cgi-bin/pdataprov? verb=listidentifiers &metadataprefix=olac &from=2001-01-01 &until=2001-01-01 &set=perseus:collection:persinfo

ListIdentifiers the changes to ListIdentifiers are subtle, and reflect a change in the OAI-PMH data model Could have been named ListHeaders or reduced to an option for ListRecords ListIdentifiers kept for lexigraphical consistency

metadataprefix character set for metadataprefix and setspec extended to URL-safe characters A-Z a-z 0-9 _! $ ( ) + -. *

in the periphery

provenance introduction of provenance container to facilitate tracing of harvesting history <about> <provenance> <origindescription> <baseurl>http://an.oa.org</baseurl> <identifier>oai:r1:plog/9801001</identifier> <datestamp>2001-08-13t13:00:02z</datestamp> <metadataprefix>oai_dc</metadataprefix> <harvestdate>2001-08-15t12:01:30z</harvestdate> </origindescription> </provenance> </about> please use it

friends introduction of friends container to facilitate web-style discovery of repositories <description> <friends> <baseurl>http://cav2001.library.caltech.edu/perl/oai</baseurl> <baseurl>http://formations2.ulst.ac.uk/perl/oai</baseurl> <baseurl>http://cogprints.soton.ac.uk/perl/oai</baseurl> <baseurl>http://wave.ldc.upenn.edu/olac/dp/aps.php4</baseurl> </friends> </description> please please please please please please use it

branding introduction of branding container for DPs to suggest rendering & association hints <branding xmlns="http://www.openarchives.org/oai/2.0/branding/" xmlns:xsi="http://www.w3.org/2001/xmlschema-instance" xsi:schemalocation="http://www.openarchives.org/oai/2.0/branding/ http://www.openarchives.org/oai/2.0/branding.xsd"> <collectionicon> <url>http://my.site/icon.png</url> <link>http://my.site/homepage.html</link> <title>mysite(tm)</title> <width>88</width> <height>31</height> </collectionicon> <metadatarendering metadatanamespace="http://www.openarchives.org/oai/2.0/oai_dc/" mimetype="text/xsl">http://some.where/dcrender.xsl</metadatarendering> <metadatarendering metadatanamespace="http://another.place/marc" mimetype="text/css">http://another.place/marcrender.css</metadatarendering> </branding>

oai-identifier revision of oai-identifier <description> <oai-identifier xmlns="http://www.openarchives.org/oai/2.0/oaiidentifier" xmlns:xsi="http://www.w3.org/2001/xmlschema-instance" xsi:schemalocation="http://www.openarchives.org/oai/2.0/oaiidentifier http://www.openarchives.org/oai/2.0/oai-identifier.xsd"> <scheme>oai</scheme> <repositoryidentifier>oai-stuff.foo.org</repositoryidentifier> <delimiter>:</delimiter> <sampleidentifier>oai:oai-stuff.foo.org:5324</sampleidentifier> </oai-identifier> </description> domain based repository names

oai_dc OAI 1.x: oai_dc Schema defined by OAI OAI 2.0: oai_dc Schema imports from DCMI Schema for unqualified DC elements

MARC21 OAI 1.x:oai_marc OAI 2.0: LoC marxml, oai_marc http://www.loc.gov/standards/marcxml/

measures of success

registered data providers acceptance as fundamental infrastructure for research and implementation

120 100 80 60 40 20 0 7/15/2002 6/15/2002 registered data providers 3/15/2001 4/15/2001 5/15/2001 6/15/2001 7/15/2001 8/15/2001 9/15/2001 10/15/2001 11/15/2001 12/15/2001 1/15/2002 2/15/2002 3/15/2002 4/15/2002 5/15/2002 1/15/2001 2/15/2001 Total # Registered Sites

data providers highlights OCLC XtCat ~ thesis and dissertation Institute of Physics Publishing

acceptance as fundamental infrastructure NSDL Open Language Archives Community The European Library Belgian Union Catalogue Illinois State Union Catalogue CIMI JISC FAIR awards Mellon OAI-PMH service provider projects LOCKSS SPARC institutional repository paper Budapest Open Access Initiative JCDL & ECDL sessions on OAI-PMH DCMI 2002

future?

unanswered questions OAI plans

unanswered questions Is OAI-PMH really low-barrier infrastructure? NSDL experience indicates that significant barriers remain OAI work on low-entry specs and tools Utility of core metadata (unqualified DC) NSDL and other experience raises doubts Utility beyond resource discovery certification, usage logs, citation data, etc.

OAI plans return to eprints mission : work on OAI-PMH eprints profile awareness certification rewarding A interoperable grid R registration archiving

OAI plans return to eprints mission : work on OAI-PMH eprints profile e.g. Specification for the exchange of references Exploration of problem domain of exchange of usage log data Exchange of certification metadata Rights metadata Others? => come to our discussion group

Interest from DLF and Mellon to fund the OAI to pursue this path Interest from NSF in the exploration of research problems related to general interoperability between eprint repositories Creation of OAI eprints core group: Lagoze, Van de Sompel, Nelson, Warner Compile list of priorities OAI plans return to eprints mission : work on OAI-PMH eprints profile Invite relevant partners to collaborate on specific selected topics Keep close contact with parties working on eprint interoperability issues related to OAI-PMH (e.g. RomEO)

questions http://www.openarchives.org openarchives@openarchives.org