Research Data Management: lessons learned - and still to learn

Similar documents
Developing a Research Data Policy

DIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM

Lessons Learned. Implementing Rosetta in the Harold B. Lee Library

Digital The Harold B. Lee Library

The e-depot in practice. Barbara Sierman Digital Preservation Officer Madrid,

Igitur Archive: Institutional Repository Utrecht University. May , Martin Slabbertje

Assessment of product against OAIS compliance requirements

Assessment of product against OAIS compliance requirements

Edward M. Corrado and Sandy Card: ELUNA 2011

Susan Thomas, Project Manager. An overview of the project. Wellcome Library, 10 October

Long-term digital preservation of UNSWorks

NRF Open Access Statement

Queen s University Library. Research Data Management (RDM) Workflow

Data Management Checklist

Research partnerships, user participation, extended outreach some of ETH Library s steps beyond digitization

Persistent identifiers, long-term access and the DiVA preservation strategy

Institutional Repository using DSpace. Yatrik Patel Scientist D (CS)

Copyright 2008, Paul Conway.

Digital preservation activities at the German National Library nestor / kopal

User Stories : Digital Archiving of UNHCR EDRMS Content. Prepared for UNHCR Open Preservation Foundation, May 2017 Version 0.5

Deposit guide for Nottingham eprints

Comparing Open Source Digital Library Software

University of British Columbia Library. Persistent Digital Collections Implementation Plan. Final project report Summary version

The OAIS Reference Model: current implementations

Navigating the Universe of ETDs: Streamlining for an Efficient and Sustainable Workflow at the University of North Florida Library

Ex Libris Rosetta A Complete Digital Asset Management and Preservation System

Electronic Submission to UMI using FTP

Protecting Future Access Now Models for Preserving Locally Created Content

Building for the Future

Data Curation Profile Movement of Proteins

Increasing access to OA material through metadata aggregation

What do you do when your file formats become obsolete? Lydia T. Motyka Florida Center for Library Automation USETDA 2011

DIGITAL ARCHIVES & PRESERVATION SYSTEMS

Preservation Standards (& Specifications) (&& Best Practices)

Records management workflows

Survey of Existing Services in the Mathematical Digital Libraries and Repositories in the EuDML Project

DAITSS Demo Virtual Machine Quick Start Guide

Session Two: OAIS Model & Digital Curation Lifecycle Model

Data management Backgrounds and steps to implementation; A pragmatic approach.

The Ohio State University's Knowledge Bank: An Institutional Repository in Practice

Demos: DMP Assistant and Dataverse

Callicott, Burton B, Scherer, David, Wesolek, Andrew. Published by Purdue University Press. For additional information about this book

Working with Islandora

PRESERVING DIGITAL OBJECTS

Introducing the Springer Nature Data Support Services

PubMan Workshop. This work is licensed under a Creative Commons Attribution 3.0 Germany License

Collection Policy. Policy Number: PP1 April 2015

Questionnaire for effective exchange of metadata current status of publishing houses

Dryad Curation Manual, Summer 2009

RESOURCE DISCOVERY PAST WORK AND FUTURE PLANS

Open Access & Open Data in H2020

FDA Affiliate s Guide to the FDA User Interface

Evolving the digital library for digital scholarship enablement

DALA Project: Digital Archive System for Long Term Access

Breakout session Save The Data Research Data Management How it s done at Saxion University of Applied Sciences

The Necessity of a New Culture of Electronic Publishing C A S L I N

Adding Research Datasets to the UWA Research Repository

Introduction to. Digital Curation Workshop. March 14, 2013 SFU Wosk Centre for Dialogue Vancouver, BC

How to contribute information to AGRIS

The Data Curation Profiles Toolkit: Interview Worksheet

Interoperability & Archives in the European Commission

Preservation and Access of Digital Audiovisual Assets at the Guggenheim

DMPonline / DaMaRO Workshop. Rewley House, Oxford 28 June DataStage. - a simple file management system. David Shotton

Advancing code and data publication and peer review. Erika Pastrana, PhD Executive Editor, Nature Journals ALPSP_Sept 2018

RADAR Introduction and Basic Concepts. Matthias Razum

Metadata and Encoding Standards for Digital Initiatives: An Introduction

Research Data Edinburgh: MANTRA & Edinburgh DataShare. Stuart Macdonald EDINA & Data Library University of Edinburgh

User guide. Created by Ilse A. Rasmussen & Allan Leck Jensen. 27 August You ll find Organic Eprints here:

Its All About The Metadata

RADAR A Repository for Long Tail Data

From Individual Solutions to Generic Tools Digitization at the Max Planck Society. Digitization Day 2012, Geneva Andrea Kulas

Archival Information Package (AIP) E-ARK AIP version 1.0

NDSA Web Archiving Survey

DRS Policy Guide. Management of DRS operations is the responsibility of staff in Library Technology Services (LTS).

Writing a Data Management Plan A guide for the perplexed

Getting Bits off Disks: Using open source tools to stabilize and prepare born-digital materials for long-term preservation

Article begins on next page

Digital Preservation Efforts at UNLV Libraries

Implementation of the Data Seal of Approval

Promoting Open Standards for Digital Repository. case study examples and challenges

Implementation of the Data Seal of Approval

Digital Preservation DMFUG 2017

Your Open Science and Research Publishing Platform. 1st SciShops Summer School

ISO Self-Assessment at the British Library. Caylin Smith Repository

The Knowledge Portal, or, the Vision of Easy Access to Information

A Model for Managing Digital Pictures of the National Archives of Iran Based on the Open Archival Information System Reference Model

NEW YORK PUBLIC LIBRARY

Optional Thesis Deposit

Agenda. Bibliography

Using DSpace for Digitized Collections. Lisa Spiro, Marie Wise, Sidney Byrd & Geneva Henry Rice University. Open Repositories 2007 January 23, 2007

ECP-2008-DILI EuropeanaConnect. D5.7.1 EOD Connector Documentation and Final Prototype. PP (Restricted to other programme participants)

Online Catalogue and Repository Interoperability Study (OCRIS)

Feed the Future Innovation Lab for Peanut (Peanut Innovation Lab) Data Management Plan Version:

ScholarOne Manuscripts. Production Center User Guide

Roy Lowry, Gwen Moncoiffe and Adam Leadbetter (BODC) Cathy Norton and Lisa Raymond (MBLWHOI Library) Ed Urban (SCOR) Peter Pissierssens (IODE Project

Horizon Societies of Symbiotic Robot-Plant Bio-Hybrids as Social Architectural Artifacts. Deliverable D4.1

Common approaches to management. Presented at the annual conference of the Archives Association of British Columbia, Victoria, B.C.

PASIG Directions & Issues

ProQuest Dissertations and Theses Overview. Austin McLean and Marlene Coles CGS Summer Workshop, July 2017

UWDCC Case Study. Supplementing DPOE Workshop 6 June Steven Dast Digital Asset Librarian UW Digital Collections Center

Transcription:

Research Data Management: lessons learned - and still to learn SWITCH Research Data Management (RDM) Workshop, 15. Dezember 2014 Dr., ETH-Bibliothek, ETH Zürich 15.12.2014 1

Overview Digital Curation Office Background Survey Research Data Aims Services ETH Data Archive Workflow «Small Data» with docuteam packer 15.12.2014 2

Digital Curation Office Team within Customer Services Department 2.7 FTE (+1 per 2015) plus 1.8 FTE in Library IT Services Initiated as a project in 2010/2011 after years of observation Productive operation since 2014 (some services earlier) Original approach rooted in digital preservation and first studies focused on library content (master files from digitization, but also e- journals) Today much more comprehensive understanding of services in digital curation and data management and focus is on research data 15.12.2014 3

Vision for diverse data types 15.12.2014 4

Currently not within scope Management of «active» research data which is being worked on «Big Data» in the sense of large unstructured heaps of data which need to be processed regularly Ensuring Post Cancellation Access and Preservation of licensed content (e-journals and e-books) Reason: content is not unique to ETH Zurich and can therefore be preserved in collaborative initiatives ETH-Bibliothek is member of both Portico and Global LOCKSS Network Membership is managed by colleagues from licensing Selection of titles in LOCKSS is managed by colleagues from perodicals 15.12.2014 5

Idealised life-cycle of research data Hypothesis / Research ques5on Re- use Data capture or - collec5on Access and verifica5on Digital Research Data Analysis and interpreta5on Publica5on Synthesis 15.12.2014 6

Required Retention Periods? 2. Part - How long a period do you or your research group have in mind for storing data? PHYS BIOL INFK ITET MAVT MTEC MATL BAUG BSSE MATH CHAB ERDW GESS ARCH UWIS AGRL 0 20 40 60 80 100 % >10 years 10 years Survey and diagrams: S. Scheid 15.12.2014 7

Formats in use? 15.12.2014 8

Type of data? 15.12.2014 9

Aims Facilitate re-use Facilitate the verifiability of results Support the publishing of research data Support guidelines for good scientific practice Offer DOI-registration as part of the service Keep published data permanently available Where possible and reasonable: plan active preservation measures (format migration) Offer low-barrier solution(s) for safeguarding data for limited periods of time Work closely with IT Services on coordination and delimitation Support data management Gain know-how and establish ETH-Bibliothek as service provider in this context 15.12.2014 10

Our services Advice on data management (under construction) ETH Data Archive (Ex Libris Rosetta) Safeguarding data for limited periods of time (at least 10 years) Permanent safeguarding Preservation measures (format migration) Mass processes and individual objects docuteam packer Viewer and editor for locally created file structures with metadata Preparation of data for submission to ETH Data Archive 15.12.2014 11

Access Rights and Retention Periods Access Rights: Open Access Retention Periods: Unlimited Restricted Access User ID Limited, with «automatic» deletion User Group IP Range After a defined date or relative to a date field in the metadaten Embargo Period depending on a defined date or relative to a date field in the metadata 15.12.2014 12

The (very) abstract view «Well defined data sources» Ingest ETH Data Archive (Rosetta) Ingest Manage Preserve Publish Deliver 15.12.2014 13

Rosetta as part of the systems landscape Producers Pre-Ingest Consumers DOI-Registrierung Forschende Manueller Upload Hochschularchiv Docuteam Packer / Feeder ETH Data Archive (Rosetta) Wissensportal Archivdatenbank Repositorien Submission Application Repositorien Bibliotheksinhalte Storage (Illustration: Franziska Geisser) 15.12.2014 14

Methods for Deposit and Upload Manually: Web dialogue for upload and metadata capture Semi-automatically: Batch-Upload of files with existing metadata (CSV) Automatically: Adapted Submission Application packages structured files with existing metadata in XML-format Submission Application can also be implemented as interface with an existing source application Automatically after manual preparation in docuteam packer (local viewer and editor for file-structure and metadata) 15.12.2014 15

Submission Application E-rara Submission App. Rosetta SIP e-rara ZIP-Kapsel Aleph Record PDF File E-Collection Dissertation E-Collection Submission App. Files Docuteam Packer Workspace Docuteam Feeder IE.xml (Illustration: Franziska Geisser) 15.12.2014 16

Submission Application e-rara ZIP-Kapsel Aleph Record PDF File E-Collection Dissertation Docuteam Packer Workspace Submission Application Packages files from a source system Processes source metadata (mapping to Dublin Core) Generates the directory structure required by Rosetta (> Collections) Generates a Rosetta Ingest- METS Rosetta SIP Files IE.xml (Illustration: Franziska Geisser) 15.12.2014 17

Publishing und Delivery «Discovery» is performed outside of Rosetta: OPAC / Primo Archival database Other Content Management System «Publishing» via OAI-PMH, SRU Request or Web Service «Delivery» from Rosetta: depending on defined Access Rights out of the box or additional viewers (plug-ins) 15.12.2014 18

Access via Knowledge portal (Primo) 15.12.2014 19

Link into ETH Data Archive (Rosetta) 15.12.2014 20

«View» in Rosetta: Images 15.12.2014 21

Typical customer requests «I want to submit a manuscript for an article. The editor demands that raw data is deposited in a repository. What can I do?» «Up to now we have been archiving data to our doctoral theses on CD-ROM and we would like to find a better solution. How can we do this? -- Oh, and our professor is retiring in n months.» «We want to link from an article research data which we would like to be publicly available. We have analysed the data with our own methods and now other groups might want to look into them with their methods. How can we do this?» 15.12.2014 22

Productive Use Cases Dark Archive for ETH E-Collection (ETH-Bibliothek): Periodic ingest into ETH Data Archive Data cited within a publication: Data is openly available (Example from D-ITET: http://dx.doi.org/10.5905/ethz-iis-1) «Professors Archives» (Dark Archive): Retention of data from CD-ROMs from doctoral theses for 10 or 15 years «Software Disclosure» (ETH transfer (Technology Transfer Office)): Code-packages of software which was developed at ETH ETH transfer publishes instructions for upload on its web pages ( https://www.ethz.ch/content/associates/intranet/en/research-and-technology-transfer/inventions-patents-licenses/computerprograms/software-disclosure.html) 15.12.2014 23

Use cases under preparation Research data archive of research group, prepared with docuteam packer Workflow for deposit and assessment for University Archives of ETH (docuteam packer, CMI Star) «Dark Archive» of Master files from digitisation (e-rara, e- manuscripta, e-periodica, e-pics) ETH-internal customers: Archives of Contemporary History (AfZ) Institute for the History and Theory of Architecture (gta) 15.12.2014 24

Workflow «Small Data» with docuteam packer Selection and documentation of context Curation and usability Bitstream preservation Researchers Library IT-Services docuteam packer Server Structuring and DOI-generation* Metadata capture Point of time X: «Submit» a SIP Submission docuteam feeder ETH Data Archive Storage Network Filesystem (networked, local) Selection for archiving Delivery via DOI Access via Knowledge Portal 15.12.2014 25

What is docuteam packer? For users Viewer and editor for local preparation of archival packages for transfer to ETH Data Archive Create and edit folder structure, as it should be reflected in ETH Data Archive Enter and edit metadata DOI-creation (Digital Object Identifier; to be registered by ETH Data Archive) Assign access rights and retention periods to be enforced by ETH Data Archive In the background Create a Submission Information Package (SIP) or Archival Information Package (AIP) of metadata + structure (METS-format, Metadata Encoding and Transmission Standard) and data 15.12.2014 26

The GUI and its elements (1) Statistics per element Technical metadata Tree view of folders and files Events Event details 15.12.2014 27

The GUI and its elements (2) Preview functions Tree view of folders and files Descriptive metadata 15.12.2014 28

Example Use Cases Research groups Data belonging to a manuscript are collected, submitted to the long term archive and made accessible via DOI for reviewers and readers Research group has a structured filing without metadata; it should be edited and submitted into the long term archive PhD students of a group are presented with a filing structure they should follow when managing their data Administrative staff within ETH Delivers structured data to ETH Zurichs university archives archives staff appraises and selects content and adds metadata 15.12.2014 29

What docuteam packer is not! No comprehensive data management solution No records management solution No collaboration platform No data repository No long term archive - but a tool to prepare for and submit to archive No solution for local rights management Not tied to use with Rosetta as the only long term archive Ø Consider alternative approaches where these are more appropriate Ø Be careful with using the tool without submitting to an archive 15.12.2014 30

Role within the overall concept There will be no obligation to use docuteam packer Other routes exist for ingesting content into ETH Data Archive Web-Upload to Rosetta Automatic batch processes based on existing metadata (e.g. exported from an application as METS-XML or CSV plus respective objects. Decide on best route to use after discussion with customer First customers going live now, including University Archives of ETH 15.12.2014 31

Perspective on Swiss level Programme «Scientific Information» of the Swiss University Conference (SUC) Proposal Data Lifecycle Management (new proposal under preparation for February 2015) Partners: EPFL, ETHZ (SIS), HEG, Unis Basel, Genève, Lausanne, Zürich, SWITCH 15.12.2014 32

Questions? www.library.ethz.ch/digital-curation data-archive@library.ethz.ch Dr. Head Digital Curation ETH-Bibliothek Rämistrasse 101 8092 Zurich 044 632 60 32 matthias.toewe@library.ethz.ch 15.12.2014 33