Dataverse and DataTags


NFAIS Open Data Fostering Open Science, June 20, 2016
Dataverse and DataTags
Mercè Crosas, Ph.D., Chief Data Science and Technology Officer, Institute for Quantitative Social Science, Harvard University (@mercecrosas)

Research data publishing is the release of research data, associated metadata, accompanying documentation, and software code (in cases where the raw data have been processed or manipulated) for re-use and analysis in such a manner that they can be discovered on the Web and referred to in a unique and persistent way. (RDA Data Publishing Workflows Working Group; doi:10.5281/zenodo.34542)

Data publishing is sharing data that are Findable, Accessible, Interoperable, and Reusable (FAIR).

Why publish data?
- Researchers: get credit for their data
- Publishers and journals: verify published work
- Federal funding agencies: make public assets public
- Science: validate, reuse, and extend previous work

Ways of publishing data:
- A scholarly article, with the data deposited in a repository as required by the journal's data policy
- A data descriptor or data paper, with the data deposited in a repository
- A published dataset in a repository, linked to a scholarly article

Dataverse: a data repository system for sharing and archiving research data, and a solution for publishing FAIR research data (Findable, Accessible, Interoperable, Reusable).

http://dataverse.org: created and developed at Harvard's Institute for Quantitative Social Science.
Harvard Dataverse: a generic data repository open to researchers worldwide, http://dataverse.harvard.edu

Dataverse Today: A Growing Community
Dataverse Project:
- Dataverse installations: 19, serving more than 200 universities
- User community group: 294 members
- Open-source software: 29 contributors
- Dataverse Community Meeting (July 2016): 107 registered so far
- Twitter: 2,940 followers
Harvard Dataverse Repository:
- Registered users: 13,795 (300 new per month)
- Dataverses: 1,677 (50 new per month)
- Journal dataverses: 91
- Datasets: 61,781 (400 new per month)
- Data files: 330,462 (3,000 new per month)

Dataverses contain datasets or other dataverses. Datasets contain metadata and data files.
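This containment rule can be pictured as a recursive tree. The following is a minimal Python sketch; the class names `Dataverse`, `Dataset`, and `DataFile` and their fields are hypothetical illustrations, not Dataverse's actual data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class DataFile:
    name: str


@dataclass
class Dataset:
    # A dataset holds metadata plus data files (hypothetical fields).
    title: str
    metadata: dict[str, str] = field(default_factory=dict)
    files: list[DataFile] = field(default_factory=list)


@dataclass
class Dataverse:
    # A dataverse holds datasets and/or nested dataverses.
    name: str
    children: list[Union[Dataverse, Dataset]] = field(default_factory=list)

    def datasets(self) -> list[Dataset]:
        """Collect every dataset in this dataverse and its sub-dataverses."""
        found: list[Dataset] = []
        for child in self.children:
            if isinstance(child, Dataset):
                found.append(child)
            else:
                found.extend(child.datasets())
        return found


# Usage: a root dataverse with one dataset and one nested journal dataverse.
survey = Dataset("Survey 2015", {"author": "Doe, Jane"}, [DataFile("responses.tab")])
replication = Dataset("Replication data", files=[DataFile("analysis.R")])
journal = Dataverse("Journal dataverse", [replication])
root = Dataverse("Example root dataverse", [survey, journal])

print([d.title for d in root.datasets()])  # ['Survey 2015', 'Replication data']
```

The recursion in `datasets()` is what lets journal or department dataverses nest inside an institutional one while still being searchable as a whole.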

Dataverse follows best practices for FAIR Data Publishing

Best practice | Purpose
Data citation | Reference, locate, and attribute
Metadata | Discover and reuse
Access control and rules | Access while protecting privacy
APIs and standards | Interoperate

Data citation in Dataverse includes: authors, year published, dataset title, global persistent identifier, repository (= data publisher), and version (or time range).
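A citation assembled from these elements might be formatted as below. This is a sketch: the element order follows the slide, while the punctuation, the `format_citation` name, and the DOI (built on DataCite's 10.5072 test prefix) are illustrative assumptions rather than Dataverse's exact output.

```python
def format_citation(authors, year, title, pid, publisher, version):
    """Join the citation elements into one string (hypothetical format)."""
    author_part = "; ".join(authors)
    # version may also be a time range string, e.g. for a time-series dataset.
    return f'{author_part}, {year}, "{title}", {pid}, {publisher}, {version}'


citation = format_citation(
    authors=["Doe, Jane", "Roe, Richard"],
    year=2016,
    title="Replication Data for: An Example Study",
    pid="doi:10.5072/EXAMPLE",
    publisher="Harvard Dataverse",
    version="V1",
)
print(citation)
```

Keeping the repository as the publisher and the version in the string is what lets a reader resolve exactly the dataset state that was cited.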

Data citation basics: the dataset landing page is kept accessible and guaranteed by the repository (the data publisher), even when the data are restricted or deaccessioned (Force11, Joint Declaration of Data Citation Principles, 2014; Starr et al., 2015).
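The landing-page guarantee can be read as a rule: the identifier always resolves to a page carrying the citation, and only file access depends on the dataset's state. A minimal sketch, assuming a hypothetical `landing_page` function and three illustrative states; it is not how Dataverse implements identifier resolution.

```python
from enum import Enum


class State(Enum):
    OPEN = "open"
    RESTRICTED = "restricted"
    DEACCESSIONED = "deaccessioned"


def landing_page(citation: str, state: State) -> dict:
    """Return what resolving the persistent identifier shows for a dataset."""
    # The citation and status are always shown, whatever the dataset's state.
    page = {"citation": citation, "status": state.value}
    # Only direct file download depends on the state.
    page["files_downloadable"] = state is State.OPEN
    return page


for state in State:
    print(landing_page("Doe, Jane, 2016, Example Dataset", state))
```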

Metadata in Dataverse
Level | Fields | Standards
Citation metadata | author, title, repository, year published, version, etc. | Dublin Core, DataCite
Domain-specific metadata | data collection information (methods, organism, observation, survey, experiment, etc.) | DDI (social sciences), ISA-Tab, BioCaddie (biomed), Virtual Observatory (astro), plus custom metadata blocks
File-level metadata | metadata inside the data file (variables, instrument details, geospatial information, etc.) | DDI (for variables), more to be determined
Across all levels: Dataverse JSON Schema
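To illustrate how citation metadata can be exposed through one of the listed standards, here is a sketch of a crosswalk from citation fields to Dublin Core elements (creator, title, publisher, date, identifier). The field names on the left and the mapping itself are assumptions made for this example; Dataverse's own exporters define their own mappings.

```python
# Hypothetical mapping from citation-level fields to Dublin Core element names.
CITATION_TO_DC = {
    "author": "creator",
    "title": "title",
    "repository": "publisher",
    "year_published": "date",
    "persistent_id": "identifier",
}


def to_dublin_core(citation_metadata: dict) -> dict:
    """Rename known citation fields to Dublin Core elements; drop the rest."""
    return {
        CITATION_TO_DC[key]: value
        for key, value in citation_metadata.items()
        if key in CITATION_TO_DC
    }


record = {
    "author": "Doe, Jane",
    "title": "An Example Survey",
    "repository": "Harvard Dataverse",
    "year_published": "2016",
    "version": "1.0",  # not in this mapping, so the sketch drops it
}
print(to_dublin_core(record))
```

Fields without a counterpart in the mapping, such as the version here, are simply dropped by this sketch; richer schemas like DataCite or DDI have room for more of the native metadata.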

Tiered access
Access tier | Metadata | Files | How to access
Open (default: CC0) | Open | Open | Click to download
Guestbook | Open | Open | Fill in guestbook before download
Terms of use | Open | Open | Click through terms of use before download
Restricted data | Open | Restricted | Request access via click-through
Restricted data | Open | Restricted | Request access via application
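The tiers can be read as a decision rule for downloading files, with metadata open in every case. The sketch below encodes them in Python; the tier names, the `Request` fields, and the `may_download` function are hypothetical, chosen to mirror the tiers rather than Dataverse's permission system.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Tier(Enum):
    OPEN = auto()          # CC0, click to download
    GUESTBOOK = auto()     # fill in guestbook first
    TERMS_OF_USE = auto()  # click through terms of use first
    RESTRICTED = auto()    # access must be requested and granted


@dataclass
class Request:
    filled_guestbook: bool = False
    accepted_terms: bool = False
    access_granted: bool = False


def may_download(tier: Tier, req: Request) -> bool:
    """Metadata is always open; this only decides access to the data files."""
    if tier is Tier.OPEN:
        return True
    if tier is Tier.GUESTBOOK:
        return req.filled_guestbook
    if tier is Tier.TERMS_OF_USE:
        return req.accepted_terms
    # Restricted: the request (click-through or application) must be approved.
    return req.access_granted


print(may_download(Tier.GUESTBOOK, Request()))                       # False
print(may_download(Tier.GUESTBOOK, Request(filled_guestbook=True)))  # True
```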

Data publishing workflows
1. Create dataset (landing page restricted)
2. Review (by collaborators or anonymous reviewers)
3. Publish v1
4. Minor change (metadata only): publish v1.1
5. Major change (might include a new data file): publish v2
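The version numbers in this workflow follow a simple rule: a metadata-only change bumps the minor number, while a change that may add or replace data files bumps the major number and resets the minor one. A minimal sketch of that rule; the `next_version` function is hypothetical.

```python
from typing import Optional, Tuple

Version = Tuple[int, int]


def next_version(current: Optional[Version], major_change: bool) -> Version:
    """Return the version number assigned when a draft is published."""
    if current is None:
        return (1, 0)  # first publication
    major, minor = current
    if major_change:
        return (major + 1, 0)  # e.g. a new data file
    return (major, minor + 1)  # metadata-only change


v1 = next_version(None, major_change=False)
v1_1 = next_version(v1, major_change=False)
v2 = next_version(v1_1, major_change=True)
print(v1, v1_1, v2)  # (1, 0) (1, 1) (2, 0)
```

Because every published version keeps its own number, a citation that includes the version points to a fixed state of the dataset.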

Learn more in the guides at dataverse.org.

Current research grants:
- Privacy tools to share sensitive data
- Data provenance
- Social science big data
- Journal articles connected to data
- Data privacy
- Biomedical large-scale data

How can we maximize data publishing of sensitive data while being mindful of privacy?

The DataTags System Sweeney L, Crosas M, Bar-Sinai M. Sharing Sensitive Data with Confidence: The DataTags System. Technology Science. 2015101601. October 16, 2015. http://techscience.org/a/2015101601

A datatag is a set of security features and access requirements for file handling. A datatags repository is one that stores and shares data files in accordance with standardized, ordered levels of security and access requirements.

DataTags levels
Tag | Description | Security features | Access requirements
Blue | Public | Clear storage, clear transmission | Open
Green | Controlled public | Clear storage, clear transmission | Email- or OAuth-verified registration
Yellow | Accountable | Clear storage, encrypted transmission | Password, registered, approval, click-through DUA
Orange | More accountable | Encrypted storage, encrypted transmission | Password, registered, approval, signed DUA
Red | Fully accountable | Encrypted storage, encrypted transmission | Two-factor authentication, approval, signed DUA
Crimson | Maximally restricted | Multi-encrypted storage, encrypted transmission | Two-factor authentication, approval, signed DUA
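Because the levels are ordered, they map naturally onto an integer enumeration. The sketch below mirrors the six levels; `REQUIREMENTS` and `stricter` are hypothetical helpers, not the DataTags project's code, and the rule that combined requirements resolve to the most restrictive tag is an assumption used here to show why the ordering is useful.

```python
from enum import IntEnum


class DataTag(IntEnum):
    # Ordered from least to most restrictive.
    BLUE = 1
    GREEN = 2
    YELLOW = 3
    ORANGE = 4
    RED = 5
    CRIMSON = 6


# (storage, transmission, access requirements) per tag.
REQUIREMENTS = {
    DataTag.BLUE: ("clear", "clear", "open"),
    DataTag.GREEN: ("clear", "clear", "email/OAuth-verified registration"),
    DataTag.YELLOW: ("clear", "encrypted", "password, registered, approval, click-through DUA"),
    DataTag.ORANGE: ("encrypted", "encrypted", "password, registered, approval, signed DUA"),
    DataTag.RED: ("encrypted", "encrypted", "two-factor authentication, approval, signed DUA"),
    DataTag.CRIMSON: ("multi-encrypted", "encrypted", "two-factor authentication, approval, signed DUA"),
}


def stricter(*tags: DataTag) -> DataTag:
    """Assumed rule: when several tags apply, the most restrictive one wins."""
    return max(tags)


tag = stricter(DataTag.GREEN, DataTag.ORANGE)
storage, transmission, access = REQUIREMENTS[tag]
print(tag.name, storage, transmission, access)
```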

DataTags workflow in a Dataverse repository (under development). Components: data file ingestion, an automatic interview, review board approval, and the resulting sensitive dataset, which is reached either through direct access (two-factor authentication; signed DUA) or through privacy-preserving access. See http://datatags.org and http://privacytools.seas.harvard.edu.

Example of a DataTags interview: a sequence of questions from an expert system.

Example of a DataTags interview: the final datatag, a human-readable and machine-actionable policy.
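An expert-system interview of this kind can be sketched as a walk through yes/no questions, where each "yes" may raise the required tag. The questions below are hypothetical simplifications for illustration only; the real DataTags questionnaire encodes specific regulations and is far more detailed. The `policy` function (also hypothetical) renders the result both as human-readable text and as machine-actionable JSON.

```python
import json
from enum import IntEnum


class DataTag(IntEnum):
    BLUE = 1
    GREEN = 2
    YELLOW = 3
    ORANGE = 4
    RED = 5
    CRIMSON = 6


# Hypothetical questions: (question text, tag required if the answer is "yes").
QUESTIONS = [
    ("Does the data describe living people?", DataTag.GREEN),
    ("Could a person be re-identified from it?", DataTag.YELLOW),
    ("Does it include health or education records?", DataTag.RED),
]


def interview(answers: list[bool]) -> DataTag:
    """Start at the least restrictive tag and raise it for each 'yes' answer."""
    tag = DataTag.BLUE
    for (_question, required), answer in zip(QUESTIONS, answers):
        if answer:
            tag = max(tag, required)
    return tag


def policy(tag: DataTag) -> tuple[str, str]:
    """Render the result as human-readable text and as machine-actionable JSON."""
    text = f"This dataset is tagged {tag.name.title()}."
    machine = json.dumps({"datatag": tag.name.lower(), "level": int(tag)})
    return text, machine


tag = interview([True, True, False])
text, machine = policy(tag)
print(text)     # This dataset is tagged Yellow.
print(machine)  # {"datatag": "yellow", "level": 3}
```

Emitting the same decision in two forms is the point of the final step: a person can read the policy, and a repository can enforce it.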

Summary
- Data sharing is good for researchers, journals, funding agencies, and science.
- Dataverse is open-source software for building data repositories to share research data.
- Data citation and rich metadata support are key to Dataverse and enable FAIR data publishing.
- Dataverse also supports tiered access to data, as well as review and versioning workflows for data publishing.
- DataTags generates human-readable and machine-actionable policies to support sensitive datasets in data repositories.

Join us at this year's Dataverse Community Meeting.

References
@mercecrosas and http://scholar.harvard.edu/mercecrosas
http://dataverse.org
http://dataverse.harvard.edu
http://datatags.org
Wilkinson et al., 2016. The FAIR Guiding Principles for Scientific Data Management and Stewardship. Scientific Data.
Altman, Borgman, Crosas, Martone, 2015. An Introduction to the Joint Data Citation Principles. Bulletin of the Association for Information Science and Technology.
Starr et al., 2015. Achieving Human and Machine Accessibility of Cited Data in Scholarly Publications. PeerJ Computer Science.
Meyer et al., 2016. Data Publication with the Structural Biology Grid Supports Live Analysis. Nature Communications.
Sweeney, Crosas, Bar-Sinai, 2015. Sharing Sensitive Data with Confidence: The DataTags System. Technology Science.