Enabling Open Science: Data Discoverability, Access and Use. Jo McEntyre Head of Literature Services

Similar documents
Persistent Identifier the data publishing perspective. Sünje Dallmeier-Tiessen, CERN 1

Services to Make Sense of Data. Patricia Cruse, Executive Director, DataCite Council of Science Editors San Diego May 2017

TEXT MINING: THE NEXT DATA FRONTIER

European Cloud Initiative: implementation status. Augusto BURGUEÑO ARJONA European Commission DG CNECT Unit C1: e-infrastructure and Science Cloud

How to make your data open

EMBL-EBI Patent Services

PDS, DOIs, and the Literature. Anne Raugh, University of Maryland Edwin Henneken, Harvard-Smithsonian Center for Astrophysics

Data Discovery - Introduction

OpenAIRE Open Knowledge Infrastructure for Europe

Making Sense of Data: What You Need to know about Persistent Identifiers, Best Practices, and Funder Requirements

DOIs for Research Data

How to share research data

New generation of patent sequence databases Information Sources in Biotechnology Japan

The Science International Accord on Open Data in a Big Data World and the IUCr's response

Reproducibility and FAIR Data in the Earth and Space Sciences

EUDAT- Towards a Global Collaborative Data Infrastructure

Research Elsevier

Ideas to help making your research visible

FAIR-aligned Scientific Repositories: Essential Infrastructure for Open and FAIR Data

Developing a Research Data Policy

I data set della ricerca ed il progetto EUDAT

What is Text Mining? Sophia Ananiadou National Centre for Text Mining University of Manchester

Web of Science. Platform Release Nina Chang Product Release Date: March 25, 2018 EXTERNAL RELEASE DOCUMENTATION

A Data Citation Roadmap for Scholarly Data Repositories

Mercè Crosas, Ph.D. Chief Data Science and Technology Officer Institute for Quantitative Social Science (IQSS) Harvard

Facilitating Semantic Alignment of EBI Resources

EBI services. Jennifer McDowall EMBL-EBI

Welcome to the Pure International Conference. Jill Lindmeier HR, Brand and Event Manager Oct 31, 2018

Inge Van Nieuwerburgh OpenAIRE NOAD Belgium. Tools&Services. OpenAIRE EUDAT. can be reused under the CC BY license

Introducing the Springer Nature Data Support Services

The ELIXIR of Linked Data

FREYA Connected Open Identifiers for Discovery, Access and Use of Research Resources

Science Europe Consultation on Research Data Management

Data Curation Handbook Steps

Data Curation Profile Cornell University, Biophysics

EUDAT. Towards a pan-european Collaborative Data Infrastructure

EBI patent related services

Assessing the FAIRness of Datasets in Trustworthy Digital Repositories: a 5 star scale

GEOSS Data Management Principles: Importance and Implementation

Horizon 2020 and the Open Research Data pilot. Sarah Jones Digital Curation Centre, Glasgow

Linda Strick Fraunhofer FOKUS. EOSC Summit - Rules of Participation Workshop, Brussels 11th June 2018

The PDB and experimental data

Data Curation Profile Human Genomics

Open Access, RIMS and persistent identifiers

State of the Art in Data Citation

The Materials Data Facility

Making data publication a first class research output

Trust and Certification: the case for Trustworthy Digital Repositories. RDA Europe webinar, 14 February 2017 Ingrid Dillo, DANS, The Netherlands

Data Curation Profile Botany / Plant Taxonomy

SHARING YOUR RESEARCH DATA VIA

Coupled Computing and Data Analytics to support Science EGI Viewpoint Yannick Legré, EGI.eu Director

CrossRef developments and initiatives: an update on services for the scholarly publishing community from CrossRef

OpenAIRE. Fostering the social and technical links that enable Open Science in Europe and beyond

Data Curation Profile Movement of Proteins

Digital repositories as research infrastructure: a UK perspective

The European Variation Archive

Data management Backgrounds and steps to implementation; A pragmatic approach.

RADAR A Repository for Long Tail Data

Robin Wilson Director. Digital Identifiers Metadata Services

Integrated Access to Biological Data. A use case

Open Science Commons: A Participatory Model for the Open Science Cloud

National Centre for Text Mining NaCTeM. e-science and data mining workshop

Data Citation and Scholarship

DATA SHARING FOR BETTER SCIENCE

Discovery Net : A UK e-science Pilot Project for Grid-based Knowledge Discovery Services. Patrick Wendel Imperial College, London

Integrating Data with Publications: Greater Interactivity and Challenges for Long-Term Preservation of the Scientific Record

The NIH Big Data to Knowledge Initiative: Raising the Prominence of Data

OPEN DATA. Dr Arthur Smith Dr Marta Teperek 15/06/2015. Slides: University of Cambridge

Managing your data. Niclas Jareborg, NBIS

Indiana University Research Technology and the Research Data Alliance

For Attribution: Developing Data Attribution and Citation Practices and Standards

Towards FAIRness: some reflections from an Earth Science perspective

Dataverse and DataTags

How to store and visualize RNA-seq data

LIBER Webinar: A Data Citation Roadmap for Scholarly Data Repositories

Standards in Science Publishing: Persistent Person Identifiers. CSE Meeting, Montreal 5 May 2013

Data Citation Technical Issues - Identification

Part 2: Current State of OAR Interoperability. Towards Repository Interoperability Berlin 10 Workshop 6 November 2012

Certification as a means of providing trust: the European framework. Ingrid Dillo Data Archiving and Networked Services

National Materials Data Initiatives

EUDAT. Towards a pan-european Collaborative Data Infrastructure

Research Data Preserve, Share, Reuse, Publish, or Perish Mark Gahegan Director, Centre for eresearch 24 Symonds St

INTEGRATION OPTIONS FOR PERSISTENT IDENTIFIERS IN OSGEO PROJECT REPOSITORIES

EUDAT. Towards a pan-european Collaborative Data Infrastructure

NRF Open Access Statement

National Data Sharing and Accessibility Policy-2012 (NDSAP-2012)

PROJECT FINAL REPORT. Tel: Fax:

Perspectives on Open Data in Science Open Data in Science: Challenges & Opportunities for Europe

Open Science, FAIR data and effective data management

Web of Science. Platform Release Nina Chang Product Release Date: December 10, 2017 EXTERNAL RELEASE DOCUMENTATION

Text mining tools for semantically enriching the scientific literature

Building on to the Digital Preservation Foundation at Harvard Library. Andrea Goethals ABCD-Library Meeting June 27, 2016

Welcome - webinar instructions

ISMTE Best Practices Around Data for Journals, and How to Follow Them" Brooks Hanson Director, Publications, AGU

Advancing code and data publication and peer review. Erika Pastrana, PhD Executive Editor, Nature Journals ALPSP_Sept 2018

2) NCBI BLAST tutorial This is a users guide written by the education department at NCBI.

Deliverable D4.3 Release of pilot version of data warehouse

I. Background and objectives

Guidelines for Depositors

Making the most of metadata with Metadata 2020

Transcription:

Enabling Open Science: Data Discoverability, Access and Use Jo McEntyre Head of Literature Services www.ebi.ac.uk

About EMBL-EBI Part of the European Molecular Biology Laboratory International, non-profit research institute Europe s hub for biological data services and research

A stable, freely available, shared repository Europe PMC Abstracts: 30 million Full-text articles: 3 million Article citation counts Grants ORCIDs Semantic annotation Data citations Data integration Europe PMC is a member of the PMC International Collaboration. Funded by 26 European funders of life science research

Degrees of Data Unstructured/semistructured A picture of a graph A spreadsheet of my results A narrative with citations, pictures and attachments Article Metadata Structured A record in a DNA sequence database A graphical display of a genome Added Value

The scent of information retaining files and being prepared to share them accessible data It's healthy to remember that users are selfish, lazy, and ruthless in applying their cost-benefit analyses Information Foraging: by JAKOB NIELSEN, June 30, 2003 http://www.nngroup.com/articles/information-scent/ Data placement: where people will find it Scent alone is not scientific

1. Discoverability through accessibility Deposit in a public/open database Where possible, structured archive (e.g. PDB, ENA) >> unstructured archive (e.g. Zenodo, Figshare) Uniquely identify it: PID, Accession number, DOI, ROI Give it context: metadata (and more) All of the above = citable =

Data Citation Principles Data as legitimate, citable products of research Attribution and credit Cited as evidence for a claim Unique identification Access Persistence Specificity and verifiability (provenance) Interoperability and flexibility https://www.force11.org/datacitation

2. Discoverability through structured data structured data is one of the true enablers of life science - Discovery of homology between genes across species - Predicting function based on protein folds Structured data can be cross-analysed, compared by algorithm, and encourages development of new products and tools

Discoverability through structured data

Metadata critical to discoverability Generic: title, submitters, date, file format, version. citation basic search Wagner F.F., 23-APR-2002, TPA: Homo sapiens SMP1 gene, RHD gene and RHCE gene, INSDC, 14-NOV-2006 (Rel. 89, Last updated, Version 7). BN000065 Specific: organism, tissue, assay, page number deep search analysis computation

Structured data is good value for money Annual cost of generating new protein structure data in labs around the world Annual cost of maintaining it in a central database

3. Discoverable through added value In-built QA: otherwise you couldn t do this!

Literature-Data Integration

Quality control and trustworthiness Pre publication QA: a range of activities from is this file valid to is it described adequately/richly by metadata? Post publication Data are immutable: time and repetition required to show outliers. Openness: enabling assessment & reuse Provenance & connectivity with related data, methods, code, tools, articles

Text and Data Mining (TDM) The more structure there is the more reusable data becomes TDM builds bridges between unstructured (including articles) and structured data worlds Such analyses can drive adoption of more structure. Open Access infrastructure is a critical enabler Graphic kindly supplied by the National Centre for Text Mining (NaCTeM), UK

BioStudy database for unstructured data EBI BioStudy Metadata Study Data files Ontologies Other DBs Publications Other DBs

Making data discoverable provide tools to help researchers use it Labs around the world deposit data and we Analyse, add value and integrate it Share it with other data providers A collaborative enterprise Archive it Classify it

Elixir: An international distributed infrastructure for Data Standards Tools Compute Training Industry

Some of many Open Questions What can stakeholders do to capitalize on the research investment? Incentives and credit for good data management Can all data outputs to be peer reviewed (like articles)? How far does the article model apply to data? Negative results? What are we prepared to spend: costs benefits? How does this apply to big data? What is the public interest in discoverable data?