Research Data Management & Preservation: A Library Perspective


Research Data Management & Preservation: A Library Perspective
Brian Owen, Associate University Librarian
Library Technology Services & Special Collections, Simon Fraser University Library

LIBRARIES & BIG DATA
A Historical Perspective

Big Data Infrastructure circa 1925

Big Data Access Devices circa 1925

Big Data circa 1975

Big Data in Libraries circa 1975

Big Data circa 2016
1000 Bytes = 1 Kilobyte
1000 Kilobytes = 1 Megabyte
1000 Megabytes = 1 Gigabyte
1000 Gigabytes = 1 Terabyte
1000 Terabytes = 1 Petabyte
1000 Petabytes = 1 Exabyte
1000 Exabytes = 1 Zettabyte
1000 Zettabytes = 1 Yottabyte
1000 Yottabytes = 1 Brontobyte
1000 Brontobytes = 1 Geopbyte
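The ladder above is plain powers-of-1000 arithmetic, which a few lines of Python can make concrete. This is a minimal sketch, not part of the original slides: the names `UNITS`, `to_bytes`, and `humanize` are hypothetical, and the list stops at the yottabyte because the last two names on the slide (brontobyte, geopbyte) are informal rather than SI units.

```python
# Decimal storage units from the slide, each 1000 times the previous one.
# Assumption: stops at Yottabyte; Brontobyte and Geopbyte are informal names.
UNITS = [
    "Byte", "Kilobyte", "Megabyte", "Gigabyte", "Terabyte",
    "Petabyte", "Exabyte", "Zettabyte", "Yottabyte",
]


def to_bytes(value, unit):
    """Convert a quantity in the given unit to a number of bytes."""
    return value * 1000 ** UNITS.index(unit)


def humanize(num_bytes):
    """Return (value, unit) using the largest unit that keeps value >= 1."""
    index = 0
    while index < len(UNITS) - 1 and num_bytes >= 1000 ** (index + 1):
        index += 1
    return num_bytes / 1000 ** index, UNITS[index]


if __name__ == "__main__":
    print(to_bytes(1, "Exabyte"))                   # bytes in one exabyte
    print(humanize(to_bytes(1000, "Petabyte")))     # 1000 PB expressed in EB
```

Integer exponentiation keeps `to_bytes` exact even for the largest units; only the final division in `humanize` produces a float.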

Big Data circa 2020?
"Trends project a demand for 1-3 million Haswell-equivalent cores by 2020, and more than an exabyte of persistent storage. These projections may turn out to be underestimates, since some existing disciplines making extensive use of Compute Canada resources today anticipate needing over 1 million cores or 1 exabyte of data just for their own projects by 2020." (Compute Canada Technology Briefing, November 12, 2015)

Library of Congress 2016

Library of Congress Holdings
32 million cataloged books and other print materials in 470 languages
61 million manuscripts
over 1 million U.S. government publications
1 million issues of world newspapers
5.3 million maps
6 million works of sheet music
3 million sound recordings
4.7 million prints and photographic images, including fine and popular art pieces and architectural drawings

Library of Congress = Unit of Measurement
"Every Six Hours, the NSA Gathers as Much Data as Is Stored in the Entire Library of Congress"
"There are 25 petabytes (1 petabyte = 10^15 bytes) created every day and thrown into the internet. This is 70 times larger than the Library of Congress."
"Facebook's photo collection has a staggering 140 billion photos; that's over 10,000 times larger than the Library of Congress."
"Walmart handles 1 million plus customer transactions every hour, which is imported into databases estimated to contain more than 2.5 petabytes of data - the equivalent of 167 times the information contained in all the books in the Library of Congress."

RESEARCH DATA MANAGEMENT & PRESERVATION
A Library Perspective

Stakeholders
Compute Canada, CANARIE → Research Data Canada
CFI, CIHR, NSERC, SSHRC
CANFAR, CBRAIN
U15, CUCCIO, CASRAI
Institutional: VP Research, Research Ethics, IT Services
CARL - Canadian Association of Research Libraries
And don't forget researchers

RDM Spectrum
Canadian Genome Centres
  Currently 24 petabytes
  Estimate 219 petabytes by 2020
Canadian Astronomy Data Centre
  1.2 petabytes currently on WestGrid
  Estimate 30-100 petabytes eventually needed
Ocean Networks Canada (NEPTUNE, VENUS)
  300+ terabytes of data archived
  170 gigabytes of data collected every day
  230 gigabytes of data distributed every day
Pacific Herring Project
  26 narrative videos totalling about 5 megabytes
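To put the daily and projected figures above on a common footing, the arithmetic can be sketched as follows. This is an illustration only: the function names are hypothetical, a 365-day year is assumed, and the four-year span for the genome estimate assumes the "currently" figure dates from 2016, which the slides do not state explicitly.

```python
DAYS_PER_YEAR = 365  # assumption: non-leap year, constant daily rate


def annual_volume_tb(gb_per_day):
    """Terabytes accumulated in a year at a constant daily rate in gigabytes."""
    return gb_per_day * DAYS_PER_YEAR / 1000


def implied_annual_growth(start_pb, end_pb, years):
    """Constant yearly growth factor that takes start_pb to end_pb."""
    return (end_pb / start_pb) ** (1 / years)


if __name__ == "__main__":
    # Ocean Networks Canada: 170 GB collected and 230 GB distributed per day.
    print(f"collected:   {annual_volume_tb(170):.2f} TB/year")
    print(f"distributed: {annual_volume_tb(230):.2f} TB/year")

    # Canadian Genome Centres: 24 PB now, 219 PB estimated by 2020.
    # Assumption: "now" is 2016, so the span is 4 years.
    factor = implied_annual_growth(24, 219, years=4)
    print(f"implied growth factor: {factor:.2f}x per year")
```

Under these assumptions, the genome estimate implies holdings growing by roughly three quarters every year, while the ocean observatory adds tens of terabytes annually.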

Data Curation
Digital curation is maintaining and adding value to a trusted body of digital research data for current and future use; it encompasses the active management of data throughout the research lifecycle.

Research Data Lifecycle

Research Data Management Issues
data integrity
preservation
discovery, access and authentication
re-use
policies and procedures (local, national, funding agencies; also intellectual property)
interoperability and participation in larger national and international initiatives
standards and metadata

Canadian Association of Research Libraries (CARL)
Developing a national research data culture
Fostering a community of practice for research data
Building national research data services and infrastructure

Data Management Plans
DMP Assistant is a bilingual tool for preparing data management plans (DMPs). The tool follows best practices in data stewardship and walks researchers step-by-step through key questions about data management.
https://assistant.portagenetwork.ca/

Discovery and Preservation
Library-based projects (2012-15)
  SFU: Research data repository & preservation
  UBC: Research data repository & preservation
  Univ. of Alberta: Canadian Polar Data Network
  OCUL & Scholars Portal: Repository & Cloud Storage
Outcomes
  Prototypes and limited production systems
  Experience

Discovery and Preservation
Research Data Canada Federated Pilot (2014-15)
  Participants: Compute Canada, CANARIE, CARL
  Software Platforms:
    Dataverse/Islandora (Discovery)
    Archivematica (Preservation)
    Globus (Replication & Discussion)
  Libraries: repository and preservation tools, metadata standards, researcher needs
  Compute Canada: replication tools, robustness and scalability assessment
Outcomes
  More experience

CARL & COMPUTE CANADA MOU (2016)
CARL/Portage:
  research data management expertise, metadata and workflow design / implementation
  contributing to requirement specifications and design
  obtaining research data for test purposes
  liaising with researchers, repository services, and preservation software developers
Compute Canada:
  software development and computational resources
  liaising with Globus
  costs of conducting these activities

Conclusions
State of churn
Storage and infrastructure requirements
RDM platforms, tools and workflows
Policies and procedures
Roles and responsibilities
Costs to sustain

Thank You
Questions?
Brian Owen
brian_owen@sfu.ca