LIBER Webinar: A Data Citation Roadmap for Scholarly Data Repositories

Similar documents
A Data Citation Roadmap for Scholarly Data Repositories

5/16/2018. Researcher Challenges with Data Use. AGU s position statement on data affirms that

Mercè Crosas, Ph.D. Chief Data Science and Technology Officer Institute for Quantitative Social Science (IQSS) Harvard

DOIs for Research Data

Scholix Metadata Schema for Exchange of Scholarly Communication Links

Dataverse and DataTags

First Light for DOIs at ESO

FAIR-aligned Scientific Repositories: Essential Infrastructure for Open and FAIR Data

Data Publication Services WG. Chairs: Adrian Burton (ANDS) Hylke Koers (Elsevier) - presenter

Persistent Identifier the data publishing perspective. Sünje Dallmeier-Tiessen, CERN 1

DATAVERSE FOR JOURNALS

Susanna-Assunta Sansone, PhD. Metadata WG3 chair.

State of the Art in Data Citation

Executive Committee Meeting

Core Technology Development Team Meeting

The Final Updates. Philippe Rocca-Serra Alejandra Gonzalez-Beltran, Susanna-Assunta Sansone, Oxford e-research Centre, University of Oxford, UK

FREYA Connected Open Identifiers for Discovery, Access and Use of Research Resources

WE HAVE SOME GREAT EARLY ADOPTERS

Services to Make Sense of Data. Patricia Cruse, Executive Director, DataCite Council of Science Editors San Diego May 2017

Data Citation. DataONE Community Engagement & Outreach Working Group

Linking data and publications the past, present, and future. Dr. Hylke Koers, Head of Content Innovation, Elsevier

Data Publishing and Data Linking Introducing SCHOLIX

Implementing the RDA Data Citation Recommendations for Long Tail Research Data. Stefan Pröll

PDS, DOIs, and the Literature. Anne Raugh, University of Maryland Edwin Henneken, Harvard-Smithsonian Center for Astrophysics

Executive Committee Meeting

Executive Committee Meeting

Open web-based annotation: Creating a lightweight, portable knowledge layer over biomedicine

Agenda. Clarification of issues Quarter definition Steering and Executive Committee composition Dissemination and community outreach activities

Helping Journals to Upgrade Data Publications for Reusable Research

Data Citation and Scholarship

Adoption of Data Citation Outcomes by BCO-DMO

Making Sense of Data: What You Need to know about Persistent Identifiers, Best Practices, and Funder Requirements

Implementation of Open-World, Integrative, Transparent, Collaborative Research Data Platforms: the University of Things (UoT)

Reproducibility and FAIR Data in the Earth and Space Sciences

Interoperability Framework Recommendations

OpenAIRE. Fostering the social and technical links that enable Open Science in Europe and beyond

Indiana University Research Technology and the Research Data Alliance

Welcome to the Pure International Conference. Jill Lindmeier HR, Brand and Event Manager Oct 31, 2018

Linking data and publications the past, present, and future. Dr. Hylke Koers, Head of Content Innovation, Elsevier

MAKING INSPIRE DATA DISCOVERABLE AND FINDABLE THROUGH POPULAR SEARCH ENGINES

Enabling Open Science: Data Discoverability, Access and Use. Jo McEntyre Head of Literature Services

Update on Dataverse Dryad-Dataverse Community Meeting. Mercè Crosas, Elizabeth Quigley & Eleni Castro. Data Science > IQSS > Harvard University

The UiT research data web portal. uit.no/forskningsdata


European Cloud Initiative: implementation status. Augusto BURGUEÑO ARJONA European Commission DG CNECT Unit C1: e-infrastructure and Science Cloud

The DOI Identifier. Drexel University. From the SelectedWorks of James Gross. James Gross, Drexel University. June 4, 2012

SHARING YOUR RESEARCH DATA VIA

State of the Art in Ethno/ Scientific Data Management

CrossRef tools for small publishers

Towards a joint service catalogue for e-infrastructure services

Ways for a Machine-actionable Processing Chain for Identifier, Metadata, and Data

Core Technology Development Team Meeting

Approaches to Making Dynamic Data Citeable Recommendations of the RDA Working Group Andreas Rauber

Minimal Metadata Standards and MIIDI Reports

ISMTE Best Practices Around Data for Journals, and How to Follow Them" Brooks Hanson Director, Publications, AGU

The OpenAIREplus Project

For Attribution: Developing Data Attribution and Citation Practices and Standards

Inge Van Nieuwerburgh OpenAIRE NOAD Belgium. Tools&Services. OpenAIRE EUDAT. can be reused under the CC BY license

Making Data Count Promo0ng Open Data Through Usage and Impact Tracking #mdc

Roy Lowry, Gwen Moncoiffe and Adam Leadbetter (BODC) Cathy Norton and Lisa Raymond (MBLWHOI Library) Ed Urban (SCOR) Peter Pissierssens (IODE Project

ELIXIR webinar. schema.org structured data for life sciences. Events, training materials, organizations, 16 March 2016, 14:00 GMT

TEXT MINING: THE NEXT DATA FRONTIER

EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT

Core Technology Development Team Meeting

MACHINE ACTIONABLE INTEGRATION OF DATACITE AND DDI METADATA

Research Data Repository Interoperability Primer

CODATA: Data Citation Workshop Perspectives from Editors and Publishers. Brooks Hanson Director, Publications, AGU

Science Panel Discussion presentation: "A Data Sharing Story"

DATA SHARING FOR BETTER SCIENCE

Pilot integration of an electronic lab notebook and an open source research data repository as part of a modular biomedical research data platform

Data Citation Then and Now

Steering Committee Meeting

Using DCAT-AP for research data

BPMN Processes for machine-actionable DMPs

The DataCite Metadata Schema. Frauke Ziedorn Workshop: Metadata and Persistent Identifiers for Social and Economic Data 7th May 2012

The RMap Project: Linking the Products of Research and Scholarly Communication Tim DiLauro

Healthcare SEO. From Schema.org to Open Graph and Beyond

GEOSS Data Management Principles: Importance and Implementation

Quality Assured (QA) data

Introduction to INEXDA s Metadata Schema

The Experimental Project of DOI Registration for Research Data at Japan Link Center (JaLC)

Paving the Rocky Road Toward Open and FAIR in the Field Sciences

Core Technology Development Team Meeting

COAR Interoperability Roadmap. Uppsala, May 21, 2012 COAR General Assembly

Research Elsevier

PERSISTENT IDENTIFIERS FOR THE UK: SOCIAL AND ECONOMIC DATA

Introducing the Springer Nature Data Support Services

DataCite Persistent links to scientific data. Jan Brase

Service Integration - A Web of Things Perspective W3C Workshop on Data and Services Integration

Crossref DOIs help to uniquely identify and therefore link content

Persistent Identifiers

COALITION ON PUBLISHING DATA IN THE EARTH AND SPACE SCIENCES: A MODEL TO ADVANCE LEADING DATA PRACTICES IN SCHOLARLY PUBLISHING. Source: NSF.

DOIs for Scientists. Kirsten Sachs Bibliothek & Dokumentation, DESY

Steering Committee Meeting

Core Technology Development Team Meeting

Horizon 2020 and the Open Research Data pilot. Sarah Jones Digital Curation Centre, Glasgow

Interoperation between Scientific Data and Literature : An overview

Taylor & Francis Online. A User Guide.

A peer-reviewed version of this preprint was published in PeerJ on 27 May 2015.

Open Access to Publications in H2020

Transcription:

LIBER Webinar: A Data Citation Roadmap for Scholarly Data Repositories Martin Fenner (DataCite) Mercè Crosas (Institute for Quantiative Social Science, Harvard University) May 15, 2017

2014 Joint Declaration of Data Citation Principles JDDCP endorsed by over 100 scholarly organizations http://force11.org/datacitation

2015 Direct deposition and citation of primary research data http://doi.org/10.7717/peerj-cs.1

2016 Data Citation Implementation Pilot (DCIP) Led by Tim Clark, Maryann Martone, Jeffrey Grethe

Participants And you!

Data Citation Generic Example! example of a data cita.on as it would appear in a reference list* Principle 2: Credit and A1ribu4on (e.g. authors and data repositories). *Note that the format is not intended to be defined with this example, as formats will vary across publishers and communi.es [Principle 8: Interoperability and flexibility]. Author(s), Publica.on Date, Title, Data Repository or Archive, [Dataset], Persistent Iden.fier, Version Principle 4: Unique Iden4fier (e.g. DOI, Handle). Principle 5, 6 Access, Persistence: A persistent link to a landing page with metadata and access informa4on. Principle 7: Version and granularity (e.g. a version number or 4mestamp) In addi4on, access to versions or subsets should be available from the landing page.

Roadmap for! Scientific Publishers! Participants: Elsevier, SpringerNature,! elife, PLOS, Frontiers, Wiley, and others! Elsevier SpringerNature In-person Workshop July 22, 2016 in London, hosted by SpringerNature, continuing work via Telcons.! Roadmap as a set of actions for different steps of research article: pre-submission, submission, production, and publication.! Cousijn, H., Kenall, A., Ganley, E., Harrison, M., Kernohan, D., Murphy, F., Clark, T. (2017). A Data Citation Roadmap for Scientific Publishers, BiorXiv. https://doi.org/10.1101/100784!

Publishers Roadmap! Specify how to format Data Citations!! There are several ways data can be linked from ( cited in) scholarly articles:!! reference lists,! data availability statements and! in-text mention of accession numbers.! While a globally unique, machine actionable persistent identifier is needed for all three scenarios, citation metadata (authors, title, publication date, etc.) are specifically recommended for reference lists.!! From: https://doi.org/10.1101/100784!

Leads: Martin Fenner (DataCite), Merce Crosas (Dataverse)! Christian Haselgrove Philipe Rocca-Serra Ian Fore Roadmap for Data Repositories! Andy Jenkinson

Repositories Roadmap! Approach & Status! Focus on how data repositories can provide metadata required for data citation to manuscript authors and publishers. Data repository landing page plays central role.!! In-person workshop June 22, 2016 in San Diego.!! Eleven recommendations published in December 2016. Now collecting feedback and endorsements for implementation.!! Fenner, M., Crosas, M., Grethe, J., Kennedy, D., Hermjakob, H., Rocca-Serra, P., Clark, T. (2016). A Data Citation Roadmap for Scholarly Data Repositories, BiorXiv. https:// doi.org/10.1101/097196!

Repositories Roadmap! Building Blocks! Persistent Identifier Landing Page Documentation and Support Metadata schema.org

Recommendations: Required! 1. All datasets intended for citation must have a globally unique persistent identifier that can be expressed as unambiguous URL.! 2. Persistent identifiers for datasets must support multiple levels of granularity, where appropriate.! 3. This persistent identifier expressed as URL must resolve to a landing page specific for that dataset.! 4. The persistent identifier must be embedded in the landing page in machine-readable format.! 5. The repository must provide documentation and support for data citation.!

Globally Unique! Persistent Identifier! Persistent method for identification: Metadata must persist even beyond the data it describes! Machine actionable: PID resolvable as an HTTP URI! Globally Unique: Must use a prefix if ID only unique within a database! Widely used by a community: For example, in life sciences accession numbers (not DOIs) are widely used.!

! Multiple Levels of Granularity! Support citation of a specific version, as well as citation of unspecified version! In some cases, data is uniquely identified as a collection of many items (example in next slide)!

https://doi.org/10.18116/c6h02x

Persistent Identifier resolves to! Landing Page! 1 2 3 Article Landing Page Data Expectation is that by default the persistent identifier as URL resolves to a human- and machine-readable page with more information. Use content negotiation to resolve the persistent identifier as URL to machine-readable metadata, or to the content itself.

Persistent Identifier embedded in the Landing Page! Bibliographic Repository Data Repository Article 1 Landing Page 2 3 Data Finding citation a Data reference PID c b Human readable metadata Machine readable metadata article PID citation metadata data PID d 1 1 0 0 1 0 0 0 1 1 0 1 0 1 0 1 1 1 0 1 1 1 0 0 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 0 0 1 0 0 1 0 0 1 1 1 1 0 1 0 0 1 0 1 1 0 1 1 1 0 0 1 1 1 0 1 1 1 0 0 1 0 0 1 1 1 0 0 Human Readable Machine Readable <meta name="dc.identifier" content= "https://doi.org/10.15785/sbgrid/179" />

Documentation and Support! The repository must provide documentation about how data should be cited, how metadata can be obtained, and who to contact for more information.! The DCIP FAQ Expert Group has generated example documentation for data repositories, which will be provided on a dedicated website.!

Recommendations: Recommended! 6. The landing page should include metadata required for citation and ideally also metadata helping with discovery in human-readable and machine-readable format.! 7. The machine-readable metadata should use schema.org markup in JSON-LD format.! 8. Metadata should also be made available via HTML meta tags to facilitate use by reference managers.! 9. Metadata may be made available for download in Bibtex or other standard bibliographic format.!

Citation Metadata!

Metadata on Landing Pages:! For Humans!

Metadata on Landing Pages: Machine-Readable! The landing page should include metadata required for citation and ideally also metadata helping with discovery. Important for persistent identifiers that don't provide citation metadata in a central searchable index. Less important if metadata also available elsewhere in standardized workflow, e.g. DOIs. Use by reference managers to fetch citation metadata. Used by publishers to validate data citations. Used by indexers for discovery.

Metadata on Landing Pages: schema.org! Schema.org is community activity to promote structured data on the internet, started in 2011 by Google, Microsoft, Yahoo, and Yandex. Schema.org can be displayed as microdata or RDFa embedded in HTML, or via JSON-LD. The expert group recommends JSON-LD as the preferred format. Citation metadata are fully supported by schema.org (see earlier citation metadata table), several groups are extending support for more specialized metadata, including http://bioschemas.org in the life sciences. DataCite provides citation metadata in schema.org/json-ld format via DOI content negotiation.

Schema.org Example! Available via DOI Content Negotiation curl -LH "Accept: application/vnd.schemaorg.ld+json" https:/doi.org/10.18116/ c6h02x { "@context": "http://schema.org", "@type": "Dataset", "@id": "https://doi.org/10.18116/c6h02x", "additionaltype": "Imaging Data", "name": "Image collection 10.18116/C6H02X", "url": "http://iaf.virtualbrain.org/search/reconstitute/2ccda04d", "author": [{ "@type": "Person", "givenname": "JL", "familyname": "Breeze" },...

Metadata on Landing Pages:! HTML Meta Tags!

Recommendations: Optional! 9. Content negotiation for schema.org/json-ld and other content types may be supported so that the persistent identifier expressed as URL resolves directly to machinereadable metadata.! 10. HTTP link headers may be supported to advertise content negotiation options!

Content Negotiation for Machine Readable Metadata

HTTP Link Headers

Metadata in Standard Bibliographic Format

Tracking Data Citations The Research Data Alliance (RDA) Scholarly Link Exchange Working Group (Scholix) is working on a framework for standardizing the exchange of article/ data links between scholarly infrastructure providers. The group has broad community participation from repositories, publishers and service providers. If you want to participate please contact one of the co-chairs Adrian Burton (ANDS), Wouter Haak (Elsevier), Paolo Manghi (OpenAIRE), or Martin Fenner (DataCite).

Next Steps We are in the process of collecting community feedback regarding the recommendations. We have started to work with data repositories on endorsement and implementation of the recommendations. We will publish an updated recommendations paper incorporating feedback, endorsements and sample implementations.

Thanks Martin Fenner & Mercè Crosas DCIP Executive Team Tim Clark, Harvard Medical School & MGH (co-chair), Maryann Martone, Hypothesis & UCSD (co-chair), Carole Goble, University of Manchester & ELIXIR, Jeffrey Grethe, UCSD & biocaddie, Jo McEntyre, EMBL-EBI & ELIXIR, Joan Starr, California Digital Library, Martin Fenner, DataCite, Simon Hodson, CODATA, Chun-Nan Hsu, UCSD Special thanks to Tim Clark for his contributions to these slides