Ways for a Machine-actionable Processing Chain for Identifier, Metadata, and Data

Similar documents
MACHINE ACTIONABLE INTEGRATION OF DATACITE AND DDI METADATA

Specific requirements on the da ra metadata schema

The DataCite Metadata Schema. Frauke Ziedorn Workshop: Metadata and Persistent Identifiers for Social and Economic Data 7th May 2012

1. General requirements

DOIs for Research Data

Research Data Repository Interoperability Primer

Using Persistent Identifiers at

EUDAT-B2FIND A FAIR and Interdisciplinary Discovery Portal for Research Data

EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal

The MEG Metadata Schemas Registry Schemas and Ontologies: building a Semantic Infrastructure for GRIDs and digital libraries Edinburgh, 16 May 2003

BIBLID (2004) 93:1 pp (2004.6) 209. NBINet NBINet 92

University of Bath. Publication date: Document Version Publisher's PDF, also known as Version of record. Link to publication

EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT

Opus: University of Bath Online Publication Store

Introduction to INEXDA s Metadata Schema

Establishing New Levels of Interoperability for Web-Based Scholarship

Core DOI Specification

ReST 2000 Roy Fielding W3C

Web Architecture Part 3

Historization and Versioning of DDI-Lifecycle Metadata Objects

INF5750. RESTful Web Services

B2SAFE metadata management

Phase 1 RDRDS Metadata

MIME and the Web. draft-masinter-mime-web-info-01.txt I-D by Larry Masinter Presented by Alexey Melnikov

Data formats for exchanging classifications UNSD

The DOI Identifier. Drexel University. From the SelectedWorks of James Gross. James Gross, Drexel University. June 4, 2012

Spatial Data on the Web

Linked Data Semantic Web Technologies 1 (2010/2011)

MIME SNIFFING ISSUES. Larry Masinter IETF 82 Taipei November 16, 2011

Implementing the RDA Data Citation Recommendations for Long Tail Research Data. Stefan Pröll

FAIR-aligned Scientific Repositories: Essential Infrastructure for Open and FAIR Data

Slide 1 & 2 Technical issues Slide 3 Technical expertise (continued...)

Unique Identifiers Assessment: Results. R. Duerr

PERSISTENT IDENTIFIERS FOR THE UK: SOCIAL AND ECONOMIC DATA

Metadata Standards and Applications

DataCite Persistent links to scientific data. Jan Brase

Survey of Existing Services in the Mathematical Digital Libraries and Repositories in the EuDML Project

Fedora. CS 431 April 17, 2006 Carl Lagoze Cornell University. Acknowledgements: Sandy Payette (Cornell)

Ontology Servers and Metadata Vocabulary Repositories

COAR Interoperability Roadmap. Uppsala, May 21, 2012 COAR General Assembly

Institutional Repository using DSpace. Yatrik Patel Scientist D (CS)

Digital Library Interoperability. Europeana

Managing Data in the long term. 11 Feb 2016

A Data Citation Roadmap for Scholarly Data Repositories

GMA-PSMH: A Semantic Metadata Publish-Harvest Protocol for Dynamic Metadata Management Under Grid Environment

LIBER Webinar: A Data Citation Roadmap for Scholarly Data Repositories

Minimal Metadata Standards and MIIDI Reports

OAI-ORE. A non-technical introduction to: (

Internet Standards for the Web: Part II

SciX Open, self organising repository for scientific information exchange. D15: Value Added Publications IST

Mercè Crosas, Ph.D. Chief Data Science and Technology Officer Institute for Quantitative Social Science (IQSS) Harvard

Paradigm shifts in Information Access - beyond classical scholarly publication

CESSDA ERIC Persistent Identifier Policy Best Practice Guidelines

Basics in good research data management (RDM) for reviewing DMPs

Semantic challenges in sharing dataset metadata and creating federated dataset catalogs

eresearch Australia The Elephant in the Room! Open Access Archiving and other Gateways to e-research Richard Levy

Metadata for Data Discovery: The NERC Data Catalogue Service. Steve Donegan

DuraSpace FAIRness and GDPR

B2FIND and Metadata Quality

OpenAIRE Guidelines Promoting Repositories Interoperability and Supporting Open Access Funder Mandates

How to work with HTTP requests and responses

doi> Digital Object Identifier

re3data.org - Making research data repositories visible and discoverable

A Repository of Metadata Crosswalks. Jean Godby, Devon Smith, Eric Childress, Jeffrey A. Young OCLC Online Computer Library Center Office of Research

First Light for DOIs at ESO

B2FIND: EUDAT Metadata Service. Daan Broeder, et al. EUDAT Metadata Task Force

Multi-agent and Semantic Web Systems: Linked Open Data

RDF and Digital Libraries

Robin Wilson Director. Digital Identifiers Metadata Services

Spatial Data on the Web

Software + Services for Data Storage, Management, Discovery, and Re-Use

Architecture domain. Leonardo Candela. DL.org Autumn School Athens, 3-8 October th October 2010

World-Wide Web Protocols CS 571 Fall Kenneth L. Calvert All rights reserved

Metadata Standards & Applications. 7. Approaches to Models of Metadata Creation, Storage, and Retrieval

> Semantic Web Use Cases and Case Studies

Digital Libraries at Virginia Tech

Persistent identifiers, long-term access and the DiVA preservation strategy

(Geo)DCAT-AP Status, Usage, Implementation Guidelines, Extensions

CONTENTdm & The Digital Collection Gateway New Looks for Discovery and Delivery

Persistent Identifiers

SESAR, IGSN, & a vision for a Repository Portal and Hosted Collection Management

Approaches to Making Dynamic Data Citeable Recommendations of the RDA Working Group Andreas Rauber

Towards a joint service catalogue for e-infrastructure services

Comparing Open Source Digital Library Software

Building a missing item in INSPIRE: The Re3gistry

SEAD Data Services. Jim Best Practices in Data Infrastructure Workshop. Cooperative agreement #OCI

DATAVERSE FOR JOURNALS

Google indexed 3,3 billion of pages. Google s index contains 8,1 billion of websites

Linking Literature to Astronomy Data with DOIs

A tutorial report for SENG Agent Based Software Engineering. Course Instructor: Dr. Behrouz H. Far. XML Tutorial.

Utilizing PBCore as a Foundation for Archiving and Workflow Management

Semantic Web: Core Concepts and Mechanisms. MMI ORR Ontology Registry and Repository

The Storage Networking Industry Association (SNIA) Data Preservation and Metadata Projects. Bob Rogers, Application Matrix

HTTP Reading: Section and COS 461: Computer Networks Spring 2013

What if annotations were reusable: a preliminary discussion

DFP Mobile Ad Network and Rich Media API

Handles at LC as of July 1999

Digitisation Standards

Metadata for Electronic Information Resources

A Formal Definition of RESTful Semantic Web Services. Antonio Garrote Hernández María N. Moreno García

Transcription:

Ways for a Machine-actionable Processing Chain for Identifier, Metadata, and Data Workshop on Metadata and Persistent Identifiers for Social and Economic Data May 7-8 2012, Berlin Joachim Wackerow GESIS Leibniz Institute for the Social Sciences

Outline Motivation What is a machine-actionable processing chain? Use cases Current possibilities Conclusion

Motivation Metadata and Persistent Identifiers are complementary ingredients in the world of digital object management, making it possible to find, reuse, reference, cite and link digital content Good metadata is necessary to identify the relevant data when searching, reusing, or linking data. A program is as likely to follow a URL as a person is. Possible machine-actionable scenarios : Harvesting metadata Processing metadata/data Better interoperability between the levels of PID, different types of metadata and the data itself. Compound object of PID, metadata, and data to make something useful.

Difference Publication/Data Data has a different nature than publication. A publication such as an article is for human consumption. A retrieved article can be immediately read. Data is usually for processing by programs. Metadata is necessary to understand the data. Programs are required for processing. Example for important metadata Type of file format, i.e. CSV, binary, program-specific Logical file structure, i.e. rectangular, Unit of record, i.e. person Meaning of columns, i.e. age

What Is a Machine-actionable Processing Chain? Processing of any components related to a PID Catalog metadata (like DataCite) Rich metadata (like DDI, SDMX) Data (in different formats)

Harvesting Metadata Building value-added services with registries/portals. Examples: Providing searching possibilities for allowing data to be found by relevant criteria bringing related/similar data together distinguishing dissimilar data Linking data and publications in an integrated way

Use Case: Specific Search Search for studies using the educational classification ISCED With catalog metadata: search for subject education Result would be broad With rich metadata: search for variable ISCED Result is very specific Prerequisite would be a registry/portal which uses catalog metadata and rich metadata

Processing Data Processing data Building subsets Merging data of different sources Analyzing data Access to data is often restricted, especially with data on people

Machine-actionable Path for Value-added Services List of PIDs possibly from OAI-PMH Catalog Metadata Persistent Identifier Resolver (DataCite, ) Details on Data (Rich Structured Metadata) (machine-actionable) Data Catalog / Registry / Portal with value-added services Processing data

Resolution - Current Status Client (Web-Browser) requesting PID Persistent Identifier (DOI, URN, ) Resolver (DataCite, ) Mapping Table PID - URL Landing Page with catalog metadata (human-readable) Data Problem Not machineactionable Details on Data (Rich Metadata) (human-readable) Details on Data (Rich Structured Metadata) (machine-actionable)

Current Machine-actionable Possibilities APIs of registry agencies to receive machine-actionable content Example: OpenURL Issues: Not scalable, APIs vary from one data provider to next Content negotiation Program can resolve a DOI through the standard proxy Desired machine-actionable content can be specified in HTTP Accept header DataCite Content Service (similar to CrossRef) Restrictions: approach cannot be used by web browsers HTML Links with direkt link for each representation Issues: Not compliant to web standards like REST

Current Machine-actionable Possibilities (continued) Semantic Web / Linked data Den Haag Manifesto (June 2011): Five steps to connecting Persistent Identifiers and Linked Open Data Most of these steps involve adopting linked data principles including support for content negotiation. DataCite2RDF Mapping DataCite Metadata Scheme Terms to ontologies Intention: enabling these metadata to be understood programmatically and integrated automatically with similar data from elsewhere. RDF representation of DataCite metadata OAI-PMH, Open Archives Initiative Protocol for Metadata Harvesting DataCite has service in beta status

GESIS DBK Example Web Page DDI Codebook/Lifecycle Data SPSS: http://info1.gesis.org/dbksearch19/download.asp?db=e&id=20706

Pangaea Example Web Page http://dx.doi.org/10.1594/pangaea.755351?format=textfile

OpenURL rfr_dat - The referrer s parameter payload Name value pairs rft_dat - The referent s parameter payload Name value pairs Reference: Parameter Passing Via The DOI Proxy

HTTP Content Negotiation HTTP Content Negotiation is a method for HTTP clients to request different representations of an Internet resource Server-driven Clients specify the desired media types (MIME type). Server responds according to request if possible, otherwise with a default response. Client-driven Server sends the client a list of available representations of the requested resource and the client application selects the one to view. Different representations are not accessible by URLs Not intendend for Web browser. Different solution required.

Content Negotation - Based on Solution of CrossRef/DataCite Client requesting PID Persistent Identifier (DOI, URN, ) Resolver (DataCite, ) Mapping Table PID - URL Web Page on Data with catalog metadata (human-readable) Details on Data (Rich Metadata) (human-readable) Different Accept Headers in addition to URL requesting different representations of PID Details on Data (Rich Structured Metadata) (machine-actionable) Data

Content Negotation Distinction of HTTP Accept Headers Data Provider Resolver Default no Accept Header Web Page (human-readable) Other Representations Registration Agency DataCite XML RDF

Content Negotiation Example http://dx.doi.org/10.5524/100005 Response: Landing page Redirected to data.datacite.org http://dx.doi.org/10.5524/100005 Accept: application/x-datacite+text Response: Text with citation information Li, J; Zhang, G; Lambert, D; Wang, J; (2011): Genomic data from the Emperor penguin (Aptenodytes forsteri); GigaScience. http://dx.doi.org/10.5524/100005 Example DOI from DataCite Content Service (data.datacite.org)

Content Negotiation Example (continued I) http://dx.doi.org/10.5524/100005 Accept: application/x-datacite+xml Response: DataCite XML <resource > <identifier identifiertype="doi">10.5524/100005</identifier> <creators> <creator> <creatorname>li, J</creatorName> </creator> <titles> <title>genomic data from the Emperor penguin (Aptenodytes forsteri)</title> </titles> <publisher>gigascience</publisher> <publicationyear>2011</publicationyear> </resource>

HTTP Accept Header Accept: MIME type [; extension(s)] Extensions are name/value pairs Extensions can be used for fine tuning. Extensions should come from a controlled vocabulary. No standard existent.

MIME Media Types MIME - Multipurpose Internet Mail Extensions extends the format of email to support such as nontext attachments MIME type could be understood as content type and/or a file format Examples Web page: text/html; charset=utf-8 PDF file: application/pdf Binary data : application/octet-stream Vendor-specific: application/vnd.string Non-standard: application/x-string

MIME Types - Limitations IANA (Internet Assigned Numbers Authority) maintains a registry Important types are missing like CSV, statistical packages Apache Group maintains another list Includes for example CSV, but nothing for statistical packages. Additional information is often required to process a file appropriately. Example: Delimiter in CSV files. File extensions are a related approach, but can be ambigious (like ".rdf")

HTML Links http://data.datacite.org/ application/x-datacite+xml /10.5524/100005 Response: DataCite XML Accessible by Web browsers. Workaround for this purpose. Purpose is just to provide a link. The form doesn t seem to conform REST principles. The core part of the URL of the resource changed. Possible alternative: http://data.datacite.org/10.5524/100005? mimetype=application/x-datacite+xml

Relationship of DataCite Metadata / Rich Metadata / Data PID/DOI Data DataCite Metadata Missing Rich Metadata like DDI

Conclusion I Techniques exist for a machine-actionable processing chain to access PID-related representations. Content negotiation seems to be a flexible approach. Rules for redirection should be clarified between registration agency and data provider MIME types are sometimes not sufficient. Values for Accept header extensions should come from controlled vocabularies Relationship from DataCite metadata to domainspecific metadata schemes need to be clarified. Is there a specific property missing like: RichMetadataIdentifier richmetadataidentifiername richmetadataidentiferscheme

Conclusion II Rich metadata seems to be important What to do if this kind of metadata is not available? There seems to be a need for best practices for data providers such as How to enable machine-actionable processing of metadata/data

Thank you joachim.wackerow@gesis.org