A Data Citation Roadmap for Scholarly Data Repositories

Similar documents
LIBER Webinar: A Data Citation Roadmap for Scholarly Data Repositories

WE HAVE SOME GREAT EARLY ADOPTERS

5/16/2018. Researcher Challenges with Data Use. AGU s position statement on data affirms that

Mercè Crosas, Ph.D. Chief Data Science and Technology Officer Institute for Quantitative Social Science (IQSS) Harvard

DOIs for Research Data

The Final Updates. Philippe Rocca-Serra Alejandra Gonzalez-Beltran, Susanna-Assunta Sansone, Oxford e-research Centre, University of Oxford, UK

First Light for DOIs at ESO

Dataverse and DataTags

Executive Committee Meeting

Susanna-Assunta Sansone, PhD. Metadata WG3 chair.

State of the Art in Data Citation

Reproducibility and FAIR Data in the Earth and Space Sciences

Making Sense of Data: What You Need to know about Persistent Identifiers, Best Practices, and Funder Requirements

Persistent Identifier the data publishing perspective. Sünje Dallmeier-Tiessen, CERN 1

Advancing code and data publication and peer review. Erika Pastrana, PhD Executive Editor, Nature Journals ALPSP_Sept 2018

Scholix Metadata Schema for Exchange of Scholarly Communication Links

Executive Committee Meeting

Core Technology Development Team Meeting

Executive Committee Meeting

DATAVERSE FOR JOURNALS

Services to Make Sense of Data. Patricia Cruse, Executive Director, DataCite Council of Science Editors San Diego May 2017

FAIR-aligned Scientific Repositories: Essential Infrastructure for Open and FAIR Data

CrossRef tools for small publishers

Science Panel Discussion presentation: "A Data Sharing Story"

Enabling Open Science: Data Discoverability, Access and Use. Jo McEntyre Head of Literature Services

ISMTE Best Practices Around Data for Journals, and How to Follow Them" Brooks Hanson Director, Publications, AGU

CODATA: Data Citation Workshop Perspectives from Editors and Publishers. Brooks Hanson Director, Publications, AGU

The UiT research data web portal. uit.no/forskningsdata

Linking data and publications the past, present, and future. Dr. Hylke Koers, Head of Content Innovation, Elsevier

Adoption of Data Citation Outcomes by BCO-DMO

Minimal Metadata Standards and MIIDI Reports

Agenda. Clarification of issues Quarter definition Steering and Executive Committee composition Dissemination and community outreach activities

The DOI Identifier. Drexel University. From the SelectedWorks of James Gross. James Gross, Drexel University. June 4, 2012

The Point of View of the Publisher

Data Citation and Scholarship

Making Data Count Promo0ng Open Data Through Usage and Impact Tracking #mdc

Core Technology Development Team Meeting

FREYA Connected Open Identifiers for Discovery, Access and Use of Research Resources

Crossref DOIs help to uniquely identify and therefore link content

SHARING YOUR RESEARCH DATA VIA

Introducing the Springer Nature Data Support Services

Data Publication Services WG. Chairs: Adrian Burton (ANDS) Hylke Koers (Elsevier) - presenter

Steering Committee Meeting

PDS, DOIs, and the Literature. Anne Raugh, University of Maryland Edwin Henneken, Harvard-Smithsonian Center for Astrophysics

OpenAIRE. Fostering the social and technical links that enable Open Science in Europe and beyond

Interoperability Framework Recommendations

Open web-based annotation: Creating a lightweight, portable knowledge layer over biomedicine

ELIXIR webinar. schema.org structured data for life sciences. Events, training materials, organizations, 16 March 2016, 14:00 GMT

Taylor & Francis Online. A User Guide.

DOIs for Scientists. Kirsten Sachs Bibliothek & Dokumentation, DESY

Update on Dataverse Dryad-Dataverse Community Meeting. Mercè Crosas, Elizabeth Quigley & Eleni Castro. Data Science > IQSS > Harvard University

COALITION ON PUBLISHING DATA IN THE EARTH AND SPACE SCIENCES: A MODEL TO ADVANCE LEADING DATA PRACTICES IN SCHOLARLY PUBLISHING. Source: NSF.

GEOSS Data Management Principles: Importance and Implementation

Web of Science. Platform Release Nina Chang Product Release Date: December 10, 2017 EXTERNAL RELEASE DOCUMENTATION

MAKING INSPIRE DATA DISCOVERABLE AND FINDABLE THROUGH POPULAR SEARCH ENGINES

UC Irvine LAUC-I and Library Staff Research

Data Citation. DataONE Community Engagement & Outreach Working Group

Steering Committee Meeting

Core Technology Development Team Meeting

Providing Semantic and Bibliographic Data for Library Discovery. Cathy Dolbear Senior Link Architect, Data Strategy Oxford University Press

Welcome to the Pure International Conference. Jill Lindmeier HR, Brand and Event Manager Oct 31, 2018

DATA SHARING FOR BETTER SCIENCE

Implementing the RDA Data Citation Recommendations for Long Tail Research Data. Stefan Pröll

Using Linked Data and taxonomies to create a quick-start smart thesaurus

Helping Journals to Upgrade Data Publications for Reusable Research

Open Science, FAIR data and effective data management

Handles at LC as of July 1999

Implementation of Open-World, Integrative, Transparent, Collaborative Research Data Platforms: the University of Things (UoT)

Using DCAT-AP for research data

Ways for a Machine-actionable Processing Chain for Identifier, Metadata, and Data

Data Management Plans

Quality Assured (QA) data

How to make your data open

Bengkel Kelestarian Jurnal Pusat Sitasi Malaysia. Digital Object Identifier Way Forward. 12 Januari 2017

Indiana University Research Technology and the Research Data Alliance

Institutional Repository using DSpace. Yatrik Patel Scientist D (CS)

A peer-reviewed version of this preprint was published in PeerJ on 27 May 2015.

The DataCite Metadata Schema. Frauke Ziedorn Workshop: Metadata and Persistent Identifiers for Social and Economic Data 7th May 2012

Open Access to Publications in H2020

Steering Committee Meeting

Executive Committee Meeting

Healthcare SEO. From Schema.org to Open Graph and Beyond

Towards a joint service catalogue for e-infrastructure services

Core Technology Development Team Meeting

DigitalHub Getting started: Submitting items

The Experimental Project of DOI Registration for Research Data at Japan Link Center (JaLC)

Linking data and publications the past, present, and future. Dr. Hylke Koers, Head of Content Innovation, Elsevier

Interoperation between Scientific Data and Literature : An overview

Dryad Curation Manual, Summer 2009

Web of Science. Platform Release Nina Chang Product Release Date: March 25, 2018 EXTERNAL RELEASE DOCUMENTATION

RA21. Resource Access in the 21 st Century

Core Technology Development Team Meeting

MACHINE ACTIONABLE INTEGRATION OF DATACITE AND DDI METADATA

HRA Open User Guide for Awardees

haplo-services.com

Steering Committee Meeting

Open Access & Open Data in H2020

Standards in Science Publishing: Persistent Person Identifiers. CSE Meeting, Montreal 5 May 2013

Introduction to INEXDA s Metadata Schema

Design & Manage Persistent URIs

Transcription:

A Data Citation Roadmap for Scholarly Data Repositories Tim Clark (Harvard Medical School & Massachusetts General Hospital) Martin Fenner (DataCite) Mercè Crosas (Institute for Quantiative Social Science, Harvard University) DataCite Webinar, February 23, 2017 2017 Massachusetts General Hospital

NIH-funded BD2K program biocaddie for data discovery8

2014 Joint Declaration of Data Citation Principles JDDCP endorsed by over 100 scholarly organizations http://force11.org/datacitation

2015 Direct deposition and citation of primary research data http://doi.org/10.7717/peerj-cs.1

Data Citation Implementation Pilot 2016

Participants And you!

Role-based Participants Publishers Elsevier, Springer Nature, PLOS, elife, Wiley, Frontiers, etc. Data Repositories EMBL, Dataverse, Dryad, Figshare, Google, etc. Informaticians (NIH BD2K, EBI, CDL, etc. ) & Authors (YOU)

Publishers Roadmap Development! Leads: Amye Kenall & Helena Cousijn! Elsevier Participants: Elsevier, SpringerNature, elife, PLoS, Frontiers, Wyley, et al.! Workshop July 22 @ SpringerNature London campus, partially funded by NPG.! SpringerNature Continuing work via Telcons.!

11 Data cita%on at Springer Nature journals key events 1998 : Accession codes required for various data types at Nature journals and marked up in ar9cles (= data referencing rather than formal cita9on) 2012: Data cita9on included in BMC style guide for all its journals hops://blogs.biomedcentral.com/bmcblog/2012/01/19/ci9ng- and- linking- data- to- publica9ons- more- journals- more- examples- more- impact/ 2012: BMC launch of GigaScience (data cita9on mandatory) Publishers are taking data citation seriously! 2012: NPG LOD portal including data cita9ons dataset 2014: NPG Signatory of Joint Declara9on of Data Cita9on Principles hop://blogs.nature.com/scien9ficdata/2014/03/24/endorsing- the- joint- declara9on- of- data- cita9on- principles/ 2014: Launch of Scien9fic Data Data cita9on mandated for every ar9cle Uses JATS 1.0 with data cita9ons list specifically tagged 2016: Data cita9on policy piloted at Nature journals Strongly encourages datasets with DOIs to be included in reference lists 2016: Springer Nature wide project to support data cita9on in all journals policies adapted with permission from a talk by Ian Hrynaszkiewicz, July 2016!

Publisher s Roadmap Approach & Status! Roadmap based on experiences of early adopter publishers.! Examples, real situations, recommended approaches.! Complete end-to-end publishing workflow.! Preprint published January 19, 2017.! Cousijn et al. 2017 biorxiv https://doi.org/10.1101/100784.!

Publishers Roadmap adoption: 1,800 Elsevier journals as of Nov. 30, 2016!

Christian Haselgrove Philipe Rocca-Serra Ian Fore Repository Metadata Expert Group! Andy Jenkinson

Leads: Martin Fenner (DataCite), Merce Crosas (Dataverse)! Christian Haselgrove Philipe Rocca-Serra Ian Fore Repository Metadata Expert Group! Andy Jenkinson

Repositories Roadmap Approach & Status! Roadmap rev 1 specifies core data citation metadata.! What must be on landing pages & how it is made machine-readable.! Preprint published December 28, 2016! >1K downloads in 42 days.! Fenner et al. 2016 biorxiv https://doi.org/10.1101/097196.! Rev 2 being developed based on community feedback, including schema.org initiative.!

Article - Landing Page - Data 1 2 3 Article Landing Page Data MeSH DATS PubMed DataMed Discovery Indexes

Article - Landing Page - Data Bibliographic Repository Data Repository Article 1 Landing Page 2 3 Data Finding citation a Data reference PID c b Human readable metadata Machine readable metadata article PID citation metadata data PID d 1 1 0 0 1 0 0 0 1 1 0 1 0 1 0 1 1 1 0 1 1 1 0 0 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 0 0 1 0 0 1 0 0 1 1 1 1 0 1 0 0 1 0 1 1 0 1 1 1 0 0 1 1 1 0 1 1 1 0 0 1 0 0 1 1 1 0 0 1 2 3 Article Landing Page Data MeSH DATS PubMed DataMed

Repositories: Required! 1. All datasets intended for citation must have a globally unique persistent identifier that can be expressed as unambiguous URL.! 2. Persistent identifiers for datasets must support multiple levels of granularity, where appropriate.! 3. This persistent identifier expressed as URL must resolve to a landing page specific for that dataset.! 4. The persistent identifier must be embedded in the landing page in machine-readable format.! 5. The repository must provide documentation and support for data citation.!

Globally Unique! Persistent Identifier! Persistent method for identification: Metadata must persist even beyond the data it describes! Machine actionable: PID resolvable as an HTTP URI! Globally Unique: Must use a prefix if ID only unique within a database! Widely used by a community: For example, in life sciences accession numbers (not DOIs) are widely used.!

! Multiple Levels of Granularity! Support citation of a specific version, as well as citation of unspecified version! In some cases, data is uniquely identified as a collection of many items (example in next slide)!

https://doi.org/10.18116/c6h02x

Persistent Identifier resolves to Landing Page! Using HTTP redirection makes it easier to maintain a stable URL for the persistent identifier! Identifiers.org, DOIs, handles and ARKs all use redirection! Expectation is that the persistent identifier resolves to a human-readable page with more information! (optional) Use content negotiation to resolve the persistent identifier URL to machine-readable metadata, or to the content itself!

Persistent Identifier embedded in the Landing Page! Bibliographic Repository Data Repository Article 1 Landing Page 2 3 Data Finding citation a Data reference PID c b Human readable metadata Machine readable metadata article PID citation metadata data PID d 1 1 0 0 1 0 0 0 1 1 0 1 0 1 0 1 1 1 0 1 1 1 0 0 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 0 0 1 0 0 1 0 0 1 1 1 1 0 1 0 0 1 0 1 1 0 1 1 1 0 0 1 1 1 0 1 1 1 0 0 1 0 0 1 1 1 0 0 Human Readable Machine Readable

Documentation and Support! The repository must provide documentation about how data should be cited, how metadata can be obtained, and who to contact for more information.! The DCIP FAQ Expert Group has generated example documentation for data repositories, which will be provided on a dedicated website.!

Repositories: Recommended! 6. The landing page should include metadata required for citation, and ideally also metadata helping with discovery, in human-readable and machine-readable format.! 7. The machine-readable metadata should use schema.org markup in JSON-LD format.! 8. Metadata should be made available via HTML meta tags to facilitate use by reference managers.!

Citation Metadata

Metadata on Landing Pages: For Humans

Metadata on Landing Pages: schema.org! Schema.org is community activity to promote structured data on the internet, started in 2011 by Google, Microsoft, Yahoo, and Yandex. Schema.org can be displayed as microdata or RDFa embedded in HTML, or via JSON-LD. JSON-LD is the preferred format for data citation metadata. Citation metadata are fully supported by schema.org (see earlier citation metadata table), several groups are extending support for more specialized metadata, including http://bioschemas.org in the life sciences. DataCite has released a command-line tool ( https://github.com/datacite/bolognese) to automatically generate schema.org/json-ld for DataCite and Crossref DOIs, making it easier for data centers to integrate schema.org in landing pages.

Schema.org Example! { "@context": "http://schema.org", "@type": "Dataset", "@id": "https://doi.org/10.18116/c6h02x", "additionaltype": "Imaging Data", "name": "Image collection 10.18116/C6H02X", "alternatename": "http://iaf.virtualbrain.org/search/reconstitute/2ccda04d", "author": [{ "@type": "Person", "givenname": "JL", "familyname": "Breeze" },...

Metadata on Landing Pages:! HTML Meta Tags!

Recommendations: Optional! 9. Content negotiation for schema.org/json-ld and other content types may be supported so that the persistent identifier expressed as URL resolves directly to machinereadable metadata.! 10. HTTP link headers may be supported to advertise content negotiation options! 11. Metadata may be made available for download in Bibtex or other standard bibliographic format.!

Content Negotiation for Machine Readable Metadata

HTTP Link Headers

Metadata in Standard Bibliographic Format

Conclusions We need to systematically cite data for improved scientific transparency, reproducibility, robustness. Persistent discoverable data archives with cited data will enhance capability for validation & re-use. DCIP promotes data citation in journals, repositories, and identifier / metadata services at scale. Rev 1 Roadmaps and specs released in biorxiv Continuing outreach, documentation and discussion.

DCIP Executive Tim Clark, Harvard Medical School & MGH (co-chair) Maryann Martone, Hypothesis & UCSD (co-chair) Carole Goble, University of Manchester & ELIXIR Jeffrey Grethe, UCSD & biocaddie Jo McEntyre, EMBL-EBI & ELIXIR Joan Starr, California Digital Library Martin Fenner, DataCite Simon Hodson, CODATA Chun-Nan Hsu, UCSD