Persistent Identifiers for Earth Science Provenance

Similar documents
Unique Identifiers Assessment: Results. R. Duerr

Digital Preservation. Unique Identifiers

PIDs for CLARIN. Daan Broeder CLARIN / Max-Planck Institute for Psycholinguistics

Internet Engineering Task Force (IETF) Obsoletes: 7302 September 2016 Category: Informational ISSN:

Persistent Identifiers for Digital Resources

Persistent Identifiers

Persistent and unique Identifiers

ITTC Science of Communication Networks The University of Kansas EECS 784 Identifiers, Names, and Addressing

Chapter 3: Uniform Resource Identifiers References:

doi> Digital Object Identifier

Robin Wilson Director. Digital Identifiers Metadata Services

Request for Comments: Xerox Corporation December Functional Requirements for Uniform Resource Names

Linking datasets with user commentary, annotations and publications: the CHARMe project

Managing Web Resources for Persistent Access

Dutch View on URN:NBN and Related PID Services

and Registration Authorities

5JSC/ACOC/1/Rev 7 August Joint Steering Committee for the Revision of AACR

Afsaneh Teymourikhani 1 Saeedeh Akbari-Daryan 2

State of the Art in Data Citation

ResolutionDefinition - PILIN Team Wiki - Trac. Resolve. Retrieve. Reveal Association. Facets. Indirection. Association data. Retrieval Key.

Data Citation and Scholarship

3. Uniform Resource Identifiers 3-1

1 General Uses of XRI... 5

ISO/IEC INTERNATIONAL STANDARD. Information technology Multimedia framework (MPEG-21) Part 21: Media Contract Ontology

Data Citation Then and Now

The Identity Web An Overview of XNS and the OASIS XRI TC

+ Page Page 21 + I Want to Hold Your Hand(le)

Design & Manage Persistent URIs

NISO STS (Standards Tag Suite) Differences Between ISO STS 1.1 and NISO STS 1.0. Version 1 October 2017

Introduction to INEXDA s Metadata Schema

Some important concepts relating to identifiers are uniqueness, resolution, interoperability, and persistence.

Information Centric Networking for Delivering Big Data with Persistent Identifiers

Provenance Tracking in an Earth Science Data Processing System

ISO CTS2 and Value Set Binding. Harold Solbrig Mayo Clinic

GEOSS Data Management Principles: Importance and Implementation

ISO INTERNATIONAL STANDARD

DRAFT SCIT/SDWG/11 Ad Agenda item 5 UNIFORM RESOURCE IDENTIFIERS FOR INDUSTRIAL PROPERTY RESOURCES

Creating our Digital Cultural Heritage. Characteristics of Digital Information

Report on the European Resolution Discovery Service (ERDS) Meeting (Feb 17/18, 2010)

The DOI Identifier. Drexel University. From the SelectedWorks of James Gross. James Gross, Drexel University. June 4, 2012

ISO/IEC INTERNATIONAL STANDARD

Bengkel Kelestarian Jurnal Pusat Sitasi Malaysia. Digital Object Identifier Way Forward. 12 Januari 2017

Handles at LC as of July 1999

Core DOI Specification

Information Network I Web 3.0. Youki Kadobayashi NAIST

ISO/IEC INTERNATIONAL STANDARD

Persistent identifiers: jnbn, a JEE application for the management of a national NBN infrastructure

PDS, DOIs, and the Literature. Anne Raugh, University of Maryland Edwin Henneken, Harvard-Smithsonian Center for Astrophysics

Subject Repositories for Research Data

Intended status: Informational September 6, 2013 Expires: April 2014

Web 3.0 Overview: Interoperability in the Web dimension (1) Web 3.0 Overview: Interoperability in the Web dimension (2) Metadata

Wendy Thomas Minnesota Population Center NADDI 2014

The Dublin Core Metadata Element Set

Registry for identifiers assigned by the Swedish e- identification

Archive II. The archive. 26/May/15

ISO/IEC Information technology Multimedia framework (MPEG-21) Part 3: Digital Item Identification

Semantic Web Publishing. Dr Nicholas Gibbins 32/4037

Service Guidelines. This document describes the key services and core policies underlying California Digital Library (CDL) s EZID Service.

EECS 122: Introduction to Computer Networks DNS and WWW. Internet Names & Addresses

Resilient Linked Data. Dave Reynolds, Epimorphics

* Network Working Group. Expires: January 6, 2005 August A URN namespace for the Open Geospatial Consortium (OGC)

Category: Informational November 2000

Data Stewardship NOAA s Programs for Archive, Access, and Producing Climate Data Records

EIDR: ID FORMAT. Ver January 2012

EIDR: ID FORMAT Ver. 1.1 August 19, 2013

1. Understand what persistent identifiers are, how they work and the benefits to using them in a DSpace repository environment

Joint Steering Committee for Development of RDA. Gordon Dunsire, Chair, JSC Technical Working Group

Network Working Group. Category: Informational April A Uniform Resource Name (URN) Namespace for the Open Geospatial Consortium (OGC)

Naming. Naming. Naming versus Locating Entities. Flat Name-to-Address in a LAN

Naming and Addressing

Specific requirements on the da ra metadata schema

Registry for identifiers assigned by the Swedish e-identification board

Category: Informational. G. Bilder Crossref J. Kunze California Digital Library S. Warner Cornell University April 2019

Slide 1 & 2 Technical issues Slide 3 Technical expertise (continued...)

CSE 120: Principles of Operating Systems. Lecture 10. File Systems. February 22, Prof. Joe Pasquale

Network Working Group Request for Comments: 3937 Category: Informational October 2004

XML ELECTRONIC SIGNATURES

Lesson 14 SOA with REST (Part I)

This document is a preview generated by EVS

ISO INTERNATIONAL STANDARD. Health informatics Harmonized data types for information interchange

Applying Archival Science to Digital Curation: Advocacy for the Archivist s Role in Implementing and Managing Trusted Digital Repositories

Open Command and Control (OpenC2) Language Specification. Version 0.0.2

Identifiers and Their Role In Networked Information Applications

CS514: Intermediate Course in Computer Systems

For Attribution: Developing Data Attribution and Citation Practices and Standards

PSI Registries. Sam Oh, Sungkyunkwan University Montreal, Canada. WG3, Montreal, Sam Oh

ISO/IEC INTERNATIONAL STANDARD. Information technology ASN.1 encoding rules: Mapping W3C XML schema definitions into ASN.1

Utilizing PBCore as a Foundation for Archiving and Workflow Management

Administrative and preservation metadata

Digital Curation and Preservation: Defining the Research Agenda for the Next Decade

Expires six months from November draft-ietf-urn-resolution-services-04.txt. URI Resolution Services Necessary for URN Resolution

Workflow Exchange and Archival: The KSW File and the Kepler Object Manager. Shawn Bowers (For Chad Berkley & Matt Jones)

DigitalHub Getting started: Submitting items

Archival Information Package (AIP) E-ARK AIP version 1.0

Approach to persistent identifiers and data-service-coupling in the German Spatial Data Infrastructure

Issues in Information Systems Volume 16, Issue IV, pp , 2015

Data Citation. Mark Parsons, Ruth Duerr and the Federation of Earth Science Information Partners (ESIP)

Indiana University Research Technology and the Research Data Alliance

The Semantic Web DEFINITIONS & APPLICATIONS

Identity Crisis in Linked Data

Transcription:

Persistent Identifiers for Earth Science Provenance Curt Tilmes Curt.Tilmes@umbc.edu ebiquity Research Group Presentation February 25, 2009

Overview Background Identification Persistence Actionable Identifiers (a few) Identifier issues Earth Science Data (some) Identifier schemes: W3C: URI, URL, URN UUID: Universally Unique IDentifier OID: Object Identifier PURL: Persisent URL DOI: Digital Object Identifier XRI: Extensible Resource Identifier ARK: Archival Resource Key N2T: Name to Thing Resolver References 2 of 21

Background Historically, published scientific research includes a description of the experiment that yielded the results in sufficient detail to reproduce the experiment and get the same results. Reproducibility(among other things) -> Credibility -> Trust Modern research in earth science (and other fields) depends often involves sifting through mounds of data from a variety of sources (field sensors, satellite data, etc.) and applying various algorithms to reduce/ transform/massage that data in various ways The data is likely the result of the work of hundreds of individuals over decades. Representing the provenance of such scientific results in a manner conducive to exploration, understanding and reproducibility is one of my interests. A key to this sort of representation is identifiers. 3 of 21

What am I trying to identify? All of the artifacts involved in the provenance of a scientific result Data Algorithms Documentation Sensors/Instruments/Instrument platforms People (reputation) Organizations (reputation) Published scientific papers (add to credibility) Computer systems Abstract things like a data transformation event or a validation experiment An ephemeral execution of a web service 4 of 21

Data identification How do you identify a person? Consider Shakespeare's Romeo and Juliet Is that a good identifier? The intellectual content of the play A published book with the play A specific book (with a little jelly on page 32) A performance of the play A translation into another language Can I cite an act, scene, line, page, paragraph? ( microattribution )... The library folks have a very good handle on this for various content and media. 5 of 21

Persistence It is intended that the lifetime of a [persistent identifier] be permanent. That is, the [persistent identifier] will be globally unique forever, and may well be used as a reference to a resource well beyond the lifetime of the resource it identifies or of any naming authority involved in the assignment of its name. http://www.doi.org/doi_presentations/overview_slides_4dec2007/071205doioverview.ppt My definition I want the provenance web leading to a published component of the scientific literature to live as long as the publication is scientifically valid. (In fact, you can use a citation chain to determine when the identifier is no longer referenced in any way.) 6 of 21

Actionable Identifiers 'Actionable' Identifier = Can I click on it? What happens if the resource is no longer around? We (NASA archive) delete old, obsolete data that takes up expensive space. Even if the data is gone, I'd still like to keep the identifier around... What happens if valuable data is moved from one steward to another? (We do this all the time...) An entire archive taken over by another organization A single dataset within the archive moved from one organization to another What about data served from multiple locations? What about data served in multiple formats? 7 of 21

A few identifier issues Data itself vs. a specific representation/format of that data Content invariance: Subject to correction? Subject to revision? What is the difference? Consider a reprocessing with identical inputs and algorithms, but in a slightly different computing environment.. Does more than one identifier refer to the same resource? Can you compare two identifiers for equivalence? What happens when the resource itself moves? What happens when there is a new 'steward' for the resource? What happens if the resource physically resides in multiple places? What about produce on demand? Can be real-time or not.. How are resources cited? discovered? Identifier standards, security, scalability, compatibility Dependent on central registries or authorities? Should identifiers be opaque or meaningful (include bits of metadata with semantics)? Structured with hierarchies? Can I predict a URI? ('OpenURL') 8 of 21

Earth Science Data Consider a published research paper that concludes some fact and says the data came from NASA instrument MODIS. There are two MODIS instruments flying right now. The set of standard products have 5 different reprocessing collections. The data are too big to keep forever, usually all but the last two versions are deleted. There are dozens of different algorithms that derive products from the captured data A different calibration of the level 1 data can affect the level 2 and level 3 data 9 of 21

Earth Science Data (cont) Metadata We have a tremendous amount of metadata for the content of data (Date, Orbit number, quality flags, etc.) There is a distinct set of metadata for the content container (file size, file format, digital signature) Versions Every algorithm has strict configuration management with versions mapping to revisions (should have better documentation) What does version mean to data? Consider Algorithm X of version 1.2 is used to produce file A If we revise algorithm X and reprocess with version 1.3, the produced file A is different, we note in its metadata that it was produced with version 1.3 Now what happens if we recalibrate the instrument that produced the data that was fed to algorithm X? 10 of 21

W3C: URI,URL,URN URI Uniform Resource Identifier URN Name = What is it called? URL Locator = Where can I find it? <scheme>:<scheme specific identifier> http://example.org/something urn:<namespace>:<namespace specific string> urn:isbn:0451450523 http://www.w3.org/tr/uri-clarification/ 11 of 21 http://en.wikipedia.org/wiki/file:uri_venn_diagram.svg

UUID: Universally Unique IDentifier A scheme for distributed systems to independently create unique identifiers without central coordination A 16-byte (128-bit) number (Enough to make 1 trillion UUIDs every nanosecond for over 10 billion years) Several different versions based on MAC address, time, hashing, random numbers. Canonical representation to make them easy to recognize: 550e8400 e29b 41d4 a716 446655440000 urn:uuid:550e8400 e29b 41d4 a716 446655440000 12 of 21

OID: Object Identifier Very Formal, ISO standard hierarchical naming scheme Think a truly global, universal directory tree similar to a unix directory tree. Any organization can register and get a path in the tree and populate it how they please. Several different notations in use: {iso(1)member body(2)f(250)type org(1)ft(16)test(99)88} 1.2.250.1.16.99.88 oid://1/2/250/1/16/99/88 urn:oid:1.2.250.1.16.99.88 13 of 21

PURL: Persistent URL Very simple indirect mapping that redirects from a PURL to a URL with standard HTTP redirect Includes partial redirects to relocate whole hierarchies Multiple PURLs could map to one URL (don't do that!) What about bookmarks? <scheme>://<purl resolver>/<name> http://purl.org/mypath/mydocs/mydoc 14 of 21

DOI: Digital Object Identifer A framework for persistent identification A federation of Registration Agencies that control portions of the namespace RA pay fees based on volume to register DOIs <RA id>/<ra specific part> 10.1007/978 3 540 89965 5_23 http://dx.doi.org/10.1007/978 3 540 89965 5_23 urn:info:doi:10.1007/978 3 540 89965 5_23 15 of 21

XRI: Extensible Resource Identifier From OASIS (Organization for the Advancement of Structured Information Standards) Brokers register with XDI.ORG and handle registration of identifiers (brokers are accredited and have reputation I-numbers machine friendly identifiers (like IP addresses) for any resource I-names human friendly identifiers that resolve to an I- number Various types =Person, @trademark, +anything xri://<authority>/<path> =Curt.Tilmes http://www.xdi.org/xri and xdi explained.html 16 of 21

XRI (cont.) Hierarchical like URIs, but includes cross-referencing Fragments can be either persistent(!) or reassignable(*) Formal methods for normalization and comparison Any URI could be a global authority, not just a hostname:port xri://@example.com/something xri://(mailto:john.doe@example.com)/favorites Apparently actively opposed by (portions of) W3C. Last standard proposal was voted down. We are not satisfied that XRIs provide functionality not readily available from http: URIs. http://www.equalsdrummond.name/?p=130 17 of 21

ARK: Archival Resource Key Scheme for Long-Term, Persistent Actionable Identifiers From California Digital Library, John Kunze Organizations must make a commitment to long-term, persistent, actionable PURLs, Handles, etc. add indirection, but not (necessarily) organizational commitment Goals: Identifiers deliver you to objects (where feasible) (not a 404 ) Identifiers deliver you to object metadata Identifiers deliver you to statements of commitment 18 of 21

ARK (cont.) Name Mapping Authority Hostport (NMAH)= replaceable address that can be used to resolve the identifier (Kunze calls this the booster rocket ) Name Assigning Authority Number (NAAN) = centrally registered list of organizations Append '?' for metadata and '??' for commitment policies Human readable and structured for computers If the NMAH goes away, you can still find the object by checking central registry for new NMAH for the NAAN [<NMAH>/]ark:/<NAAN>/<Name> ark:12345/myname http://some.org/ark:12345/myname http://other.org/ark:12345/myname 19 of 21

N2T : Name to Thing Resolver http://n2t.info Very simple resolver mirrored by consortium volunteers under one hostname http://n2t.info/<global prefix>/<local part> http://n2t.info/naa/... (NAA = N2T NAA Number, eg, 12345) http://n2t.info/ark:/naa/... (NAA = ARK NAA Number, eg, 12345) http://n2t.info/urn:naa:... (NAA = URN Naming Authority, eg, nbn) http://n2t.info/hdl:naa/... (NAA = Handle Naming Authority, eg, 12345) http://n2t.info/doi:naa/... (NAA = DOI Naming Authority, eg, 10.12345) http://n2t.info/purl:/naa/... (NAA = PURL Resolver Hostport, eg, purl.org) 20 of 21

References "Cool URIs Don't Change." http://www.w3.org/provider/style/uri "Naming and Addressing: URIs, URLs,..." http://www.w3.org/addressing/ "Object Identifer (OID)" http://www.oid-info.com/ "The Digital Object Identifier (DOI) System." http://www.doi.org/ "Persistent Uniform Resource Locator" http://purl.org/ " A Universally Unique IDentifier (UUID) URN Namespace" http://www.ietf.org/rfc/rfc4122.txt "XRI (Extensible Resource Identifier)" http://www.xdi.org/xri-and-xdi-explained.html "ARK (Archival Resource Key) http://www.cdlib.org/inside/diglib/ark/arkspec.html Name-to-Thing (N2T) Resolver http://n2t.info Thanks to NASA ESDSWG and Ruth Duerr of NSIDC 21 of 21