Similar documents
Making research data repositories visible and discoverable. Robert Ulrich Karlsruhe Institute of Technology

Data Citation and Scholarship

RADAR A Repository for Long Tail Data

Reproducibility and FAIR Data in the Earth and Space Sciences

re3data.org - Making research data repositories visible and discoverable

Mercè Crosas, Ph.D. Chief Data Science and Technology Officer Institute for Quantitative Social Science (IQSS) Harvard

OpenAIRE Guidelines Promoting Repositories Interoperability and Supporting Open Access Funder Mandates

EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT

FREYA Connected Open Identifiers for Discovery, Access and Use of Research Resources

Linking data and publications the past, present, and future. Dr. Hylke Koers, Head of Content Innovation, Elsevier

Horizon 2020 and the Open Research Data pilot. Sarah Jones Digital Curation Centre, Glasgow

Striving for efficiency

Research Data Repository Interoperability Primer

Using DCAT-AP for research data

Paving the Rocky Road Toward Open and FAIR in the Field Sciences

European Open Science Cloud Implementation roadmap: translating the vision into practice. September 2018

Science Europe Consultation on Research Data Management

OpenAIRE. Fostering the social and technical links that enable Open Science in Europe and beyond

Update on Dataverse Dryad-Dataverse Community Meeting. Mercè Crosas, Elizabeth Quigley & Eleni Castro. Data Science > IQSS > Harvard University

LIBER Webinar: A Data Citation Roadmap for Scholarly Data Repositories

FAIR-aligned Scientific Repositories: Essential Infrastructure for Open and FAIR Data

OpenAIRE Open Knowledge Infrastructure for Europe

Towards repository interoperability

How to share research data

European Cloud Initiative: implementation status. Augusto BURGUEÑO ARJONA European Commission DG CNECT Unit C1: e-infrastructure and Science Cloud

Linking data and publications the past, present, and future. Dr. Hylke Koers, Head of Content Innovation, Elsevier

Registry Interchange Format: Collections and Services (RIF-CS) explained

Global Data Sharing The Research Data Alliance

Inge Van Nieuwerburgh OpenAIRE NOAD Belgium. Tools&Services. OpenAIRE EUDAT. can be reused under the CC BY license

Towards FAIRness: some reflections from an Earth Science perspective

How to make your data open

Implementing the RDA Data Citation Recommendations for Long Tail Research Data. Stefan Pröll

For Attribution: Developing Data Attribution and Citation Practices and Standards

RDDS: Metadata Development

Persistent Identifier the data publishing perspective. Sünje Dallmeier-Tiessen, CERN 1

Data Management at NIST

Web of Science. Platform Release Nina Chang Product Release Date: March 25, 2018 EXTERNAL RELEASE DOCUMENTATION

Data Publishing and Data Linking Introducing SCHOLIX

DOIs for Research Data

RADAR. Establishing a generic Research Data Repository: RESEARCH DATA REPOSITORY. Dr. Angelina Kraft

Basics in good research data management (RDM) for reviewing DMPs

Digital repositories as research infrastructure: a UK perspective

Services to Make Sense of Data. Patricia Cruse, Executive Director, DataCite Council of Science Editors San Diego May 2017

The Virtual Observatory and the IVOA

Introduction to INSPIRE. Network Services

COAR Interoperability Roadmap. Uppsala, May 21, 2012 COAR General Assembly

Persistent identifiers, long-term access and the DiVA preservation strategy

Making Sense of Data: What You Need to know about Persistent Identifiers, Best Practices, and Funder Requirements

Reducing Consumer Uncertainty

Indiana University Research Technology and the Research Data Alliance

The European Cloud Initiative and High Performance Computing (HPC) Teratec 2016

Dataverse and DataTags

Edinburgh DataShare: Tackling research data in a DSpace institutional repository

Metadata Workshop 3 March 2006 Part 1

EUDAT-B2FIND A FAIR and Interdisciplinary Discovery Portal for Research Data

Bengkel Kelestarian Jurnal Pusat Sitasi Malaysia. Digital Object Identifier Way Forward. 12 Januari 2017

Horizon 2020 Open Research Data Pilot: What is required? Sarah Jones Digital Curation Centre

Institutional Repository using DSpace. Yatrik Patel Scientist D (CS)

OpenAIRE From Pilot to Service The Open Knowledge Infrastructure for Europe

INSPIRE & Environment Data in the EU

A Data Citation Roadmap for Scholarly Data Repositories

Basic Requirements for Research Infrastructures in Europe

Show me the data. The pilot UK Research Data Registry. 26 February 2014

From Open Data to Data- Intensive Science through CERIF

EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal

Putting Open Access into Practice

RADAR Introduction and Basic Concepts. Matthias Razum

COAR Interoperability project and Usage Statistcs

EUDAT. Towards a pan-european Collaborative Data Infrastructure

Data Curation: Technical Challenges Facing Repositories. Brianna Marshall Jan. 9, 2014

Horizon2020/EURO Coordination and Support Actions. SOcietal Needs analysis and Emerging Technologies in the public Sector

A Dublin Core Application Profile for Scholarly Works (eprints)

Conferenza GARR Moving Forward to deliver an «EOSC in Practice» by 2018

National Materials Data Initiatives

Certification as a means of providing trust: the European framework. Ingrid Dillo Data Archiving and Networked Services

Horizon Societies of Symbiotic Robot-Plant Bio-Hybrids as Social Architectural Artifacts. Deliverable D4.1

Specific requirements on the da ra metadata schema

Roy Lowry, Gwen Moncoiffe and Adam Leadbetter (BODC) Cathy Norton and Lisa Raymond (MBLWHOI Library) Ed Urban (SCOR) Peter Pissierssens (IODE Project

Coupled Computing and Data Analytics to support Science EGI Viewpoint Yannick Legré, EGI.eu Director

ESFRI WORKSHOP ON RIs AND EOSC

Tools for Data Management. Research Data Management : Session 3 9 th June 2015

Review of OSI Report Developing the UK s e-infrastructure for Science and Innovation

DuraSpace FAIRness and GDPR

NRF Open Access Statement

Big Data infrastructure and tools in libraries

Deliverable 6.4. Initial Data Management Plan. RINGO (GA no ) PUBLIC; R. Readiness of ICOS for Necessities of integrated Global Observations

Metadata for Data Discovery: The NERC Data Catalogue Service. Steve Donegan

Building on Existing Communities: the Virtual Astronomical Observatory (and NIST)

DRI: Preservation Planning Case Study Getting Started in Digital Preservation Digital Preservation Coalition November 2013 Dublin, Ireland

Supporting European e-infrastructure service providers to join the European Open Science Cloud

The European Commission s science and knowledge service. Joint Research Centre

VI-SEEM Data Repository. Presented by: Panayiotis Charalambous

Contribution of OCLC, LC and IFLA

Metadata Issues in Long-term Management of Data and Metadata

Enabling Open Science: Data Discoverability, Access and Use. Jo McEntyre Head of Literature Services

CORE: Improving access and enabling re-use of open access content using aggregations

Metadata Models for Experimental Science Data Management

January 16, Re: Request for Comment: Data Access and Data Sharing Policy. Dear Dr. Selby:

Using Linked Data and taxonomies to create a quick-start smart thesaurus

Building a Europe of Knowledge. Towards the Seventh Framework Programme

Transcription:

http://cdn.zmescience.com/wp-content/uploads/2009/08/jellyfish_nom.jpg

The mission I reveiced Present main directions on Open Science along with main issues or areas that may need policy actions. E-IRG: http://e-irg.eu/workshop-2015-06-programme

Take Home Message Open Science needs to be on the ground and in the people, in 10.000s of labs around the world, in 100.000s of projects, performed by 1.000.000s of researchers. Big Data is only one and maybe even a misleading approach to implement Open Science The key to optimally exploit the capacity of Open Science lies in the Long Tail of Research

EC DG RTD Burgelman on Science 2.0... The Long Tail, i.e. FigShare, Dryad, My Experiment etc as the basis of Open Science

... different meanings of Long-Tail This slide denotes Long-Tail as the negilible rest of scientific disciplines

Some conceptual Questions Does infrastructure investment correlate with excellence in research? Does Big Data equal Big Science? Can you show me a Nobel Price based on Big Data?

Big data is all the rage

Where is Big Data now?

Where is Big Data now? Internet of Things Gartner Hype Cycle Technologies 2014 http://www.gartner.com/newsroom/id/2819918 Data Science Big Data Content Analytics

Data are representations of observations, objects, or other entities used as evidence of phenomena for the purposes of research or scholarship. (Christine Borgmann 2014)

(Pepe et al. 2009)

Beyond Big Data...beyond the trendy discussion of big data to focus on the real issue: data the very concept of which differs among scholarly communities... J.L. King, University of Michigan on the book: Christine L. Borgman. Big Data, Little Data, No Data: Scholarship in the Networked World. Cambridge, MA: MIT Press, 2015.... the issue of sharing of the richly diverse and heterogeneous small data sets produced by individual neuroscientists, so-called long-tail data Adam R. Ferguson, Jessica L. Nielson, Melissa H. Cragin, Anita E. Bandrowski, Maryann E. Martone: Big data from small data: data-sharing in the 'long tail' of neuroscience Nat Neurosci, Vol. 17, No. 11. (28 November 2014), pp. 1442-1447, doi:10.1038/nn.3838

Long-Tail as in Economics Chris Anderson (Editor in Chief), Wired, Issue 12.10, October 2004 http://www.wired.com/wired/images.html?issue=12.10&topic=tail&img=2

Long-Tail as in Research Data P. Bryan Heidorn (LIS U Arizona) in Library Trends 57/2, Fall 2008 While great care is frequently devoted to the collection, preservation and reuse of data on very large projects, relatively little attention is given to the data that is being generated by the majority of scientists. There may only be a few scientists worldwide that would want to see a particular boutique data set but there are many thousands of these data sets. The long tail is a breeding ground for new ideas and never before attempted science. The challenge for science policy is to develop institutions and practices such as institutional repositories, which make this data useful for society. http://muse.jhu.edu/journals/library_trends/v057/57.2.heidorn.pdf

Larger parts of research use small data The 2011 survey by Science, found that 48.3% of respondents were working with datasets that were less than 1GB in size and over half of those polled store their data only in their laboratories. Science 11 February 2011: Vol. 331 no. 6018 pp. 692-693 DOI: 10.1126/science.331.6018.692 Because there is only a tiny fraction of large projects and a loooooooooooooong tail of small projects Can you see the curve? http://muse.jhu.edu/journals/library_trends/v057/57.2.heidorn.pdf

Big Data, Long-Tail Data No. Head Tail 1 Homogeneous Heterogeneous 2 Large Small 3 Common standards Unique standards 4 Regulated Not Regulated 5 Central curation Individual curation 6 Disciplinary repositories Institutional, general or no repositories Adapted from: Shedding Light on the Dark Data in the Long Tail of Science by P. Bryan Heidorn. 2008 Disks in your drawer; server in lab basement Long Tail Data exist across all disciplines

Heterogeneity A review undertaken by Cornell University of over 200 data packages (files related to arxiv papers) deposited into the Cornell Data Conservancy with there were 42 different file extensions for 1837 files across six disciplines. http://blogs.cornell.edu/dsps/2013/06/14/arxiv-data-conservancy-pilot/ The Dryad Repository, which is a curated, general-purpose repository that collects and provides access to data underlying scientific publications reports a huge diversity of formats including excel, CVS, images, video, audio, html, xml, as well as many uncommon and annoying formats. The average size of the data package which they collect is ~50 MB. http://wiki.datadryad.org/wg/dryad/images/b/b7/2013mayvision.pdf According to the European Commission (EC) document, Research Data e- Infrastructures: Framework for Action in H2020, diversity is likely to remain a dominant feature of research data diversity of formats, types, vocabularies, and computational requirements but also of the people and communities that generate and use the data. http://cordis.europa.eu/fp7/ict/e-infrastructure/docs/frameworkfor-action-in-h2020_en.pdf

Institutional, domain or no repositories Science 11 February 2011: Vol. 331 no. 6018 pp. 692-693 DOI: 10.1126/science.331.6018.692

Some of the challenges Data quality - appraise and show data as scientific / institutional /societal asset - push standards for metadata and technology across disciplines Discoverability - increase discoverability in diverse repositories Incentives - show researchers how easy and beneficial it is to deposit data - ask funders and institutions about policies Business case - show problems of irreproducibility, double research & innovation loss

Long Tail of Research Data Interest Group Accepted as an RDA Interest Group in Summer 2013 150 members from around the world Objectives To better understand the long tail To address challenges involved in managing diverse datasets To share and develop practices for managing diverse data To work towards greater interoperability across repositories Thanks for the slides, Kathleen! Kathleen Sheerer, COAR Executive Director and Co- Chair of the RDA IG

Long Tail of Research Data Interest Group Activities-to-date Survey of discovery metadata Discussion of strategies for improving discoverability of datasets All information is available on the interest group s website Future activities evidence to incentivize researchers to deposit make it easier for researchers to deposit their data sharing practices about discovery interoperability across repositories (WG!) preservation planning

RDA Work, e.g. Discoverability Better understand practices with discovery metadata Respondents: any repository collecting long tail data Undertaken from February 15 to March 7, 2014; Recruited respondents via RDA mailing list and other research data list serves; Over 60 responses, but only 30 full responses OBVIOUSLY not representative but indicative

What are the descriptive metadata standards used? Repositories using a single schema Dublin Core (9) DataCite (3) DDI Study-level metadata cf supra. ISO19115 (Geographic Information Metadata) MARC21 MODS metadata RIF-CS Repositories using more than one schema DataCite and Dublin Core (3) Dublin Core, Darwin Core, Prism Dublin Core, EDM, ESE, QDC Dublin Core, MARC21 dc, dcterms, geo/wgs84, FOAF, own extension ontology MODS & DataCite Metadata Schema Organic.Edunet IEEE LOM

In your opinion, is the metadata used in the repository sufficient to ensure discoverability of the datasets? 88% said yes, but Broadly speaking, and at a very high level, yes. If someone is looking for the data that supports a specific study, it is likely they will find it. However, if someone is looking for data with specific collection characteristics or other particularities then the metadata requires further enhancement. We aim to index metadata to aid discovery only. Metadata required to explore / reuse data will be stored with the data as a (non-indexed) object or stored in a separate, searchable database which links to the individual data objects in the repository (which may be at a sub-collection level). Data will also be found as the DOI will be included in publications related to the dataset.

In your opinion, is the metadata used in the repository sufficient to ensure discoverability of the datasets? 88% said yes, but Data are discoverable within the repository because of limited repository scale, but once harvested and made available to search alongside tens of thousands of other datasets, the metadata are insufficient Precision is low because natural language metadata queries tend to entrain marginally relevant data sets due to weak associations in project descriptions and other broad fields. Fine for basic discoverability - richer discipline metadata would be nice but probably not feasible at this point

And we know, most most people use Google as their discovery tool

Improving access to datasets Open licenses, funders and institutional policies Link data to publications, e.g. Force11, OpenAIRE Persistent Identifiers, e.g. DataCite-DOIs Discovery layer, e.g. landing pages, LOD, Schema.org Enable machine readability, e.g. APIs Dataset and repository registries, e.g. re3data.org

Some Long-Tail Data Activities RDA Research Data Alliance, Long-Tail Regular Meetings > WWW LIBER Assoc. Europ. Reseach Libraries 10 recommendations and case studies > WWW COAR Confederation of Open Access Repositories Repository Interoperability Roadmap > WWW DRIVER/OpenAIRE EU projects linking Literature to Data > WWW LERU League of Europ. Research Universities Roadmap for Research Data > WWW

Policy Actions for Long-Tail Data No. Long-Tail Characteristic Do s Don t s 1 Heterogeneous Let a 1000 Flowers bloom and embrace diversity Restrict innovation capacity by prescriptive normalisation 2 Small Provide scale-to-size solutions Enforce Big Data potpourri 3 Unique standards Increase awareness and allow smart standard customization Ignore the context-specific expertise of data-management 4 Not regulated Stress open licenses in publishing & career incentives Limit scientific and economic exploitation or personal rights 5 Individual curation Support personalization and ownership transfer pipelines Weaken the responsibility of data producers or researchers 6 Institutional, general or no repositories Support interoperable institutional repositories in a global research data network Fight distributed approaches and assume that only my data centre knows how to manage data

Conclusions The Long Tail of Research as I mean it is the where the majority of innovation, citation, and data is generated Simplification to Big Data and Big Science bears high risks of not using Europe s research capacity Diversity is the main challenge but also the way research in organizing itself The institutional perspective needs to be fostered

--- and do not forget that the innovation is in the people THANK YOU