Scientific Data Curation and the Grid

Similar documents
Long-term preservation for INSPIRE: a metadata framework and geo-portal implementation

Data Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013

I data set della ricerca ed il progetto EUDAT

Survey of Research Data Management Practices at the University of Pretoria

EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT

Survey of research data management practices at the University of Pretoria, South Africa: October 2009 March 2010

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Damien Lecarpentier CSC-IT Center for Science, Finland EUDAT User Forum, Barcelona

Data Management Dr Evelyn Flanagan

e-infrastructures in FP7 INFO DAY - Paris

THE NATIONAL DATA SERVICE(S) & NDS CONSORTIUM A Call to Action for Accelerating Discovery Through Data Services we can Build Ed Seidel

Distributed Data Management with Storage Resource Broker in the UK

EUDAT. Towards a pan-european Collaborative Data Infrastructure

Metadata Workshop 3 March 2006 Part 1

An overview of the OAIS and Representation Information

DCC Disciplinary Metadata

EUDAT. Towards a pan-european Collaborative Data Infrastructure

TM Interfaces. Interface Management Workshop June 2013

e-infrastructure: objectives and strategy in FP7

EUDAT Data Services & Tools for Researchers and Communities. Dr. Per Öster Director, Research Infrastructures CSC IT Center for Science Ltd

A Digital Preservation Roadmap for Public Media Institutions

Metadata Models for Experimental Science Data Management

Metadata Overview: digital repositories

The Virtual Observatory and the IVOA

Scientific Research Data Management Policy

EUDAT. Towards a pan-european Collaborative Data Infrastructure

Susan Thomas, Project Manager. An overview of the project. Wellcome Library, 10 October

The role of e-infrastructure for the preservation of cultural heritage. Federico Ruggieri Head of Distributed Computing and Storage Deparment of GARR

The OAIS Reference Model: current implementations

Conference of Directors of National Libraries in Asia and Oceania. Hanoi, 20 April 2009

Welcome to the Pure International Conference. Jill Lindmeier HR, Brand and Event Manager Oct 31, 2018

The International Journal of Digital Curation Issue 1, Volume

Metadata for Data Discovery: The NERC Data Catalogue Service. Steve Donegan

Description Cross-domain Task Force Research Design Statement

Towards FAIRness: some reflections from an Earth Science perspective

NSF Data Management Plan Template Duke University Libraries Data and GIS Services

Robin Dale RLG

Science 2.0 VU Big Science, e-science and E- Infrastructures + Bibliometric Network Analysis

EUDAT. Towards a pan-european Collaborative Data Infrastructure - A Nordic Perspective? -

Data Partnerships to Improve Health Frequently Asked Questions. Glossary...9

Google indexed 3,3 billion of pages. Google s index contains 8,1 billion of websites

Dagmar Triebel, Peter Grobe, Anton Güntsch, Gregor Hagedorn, Joachim Holstein, Carola Söhngen, Claus Weiland, Tanja Weibulat.

Simplifying Collaboration in the Cloud

EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture. Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members

The CEDA Archive: Data, Services and Infrastructure

Basic Requirements for Research Infrastructures in Europe

The UK Marine Environmental Data and Information Network MEDIN

CU Boulder Research Cyberinfrastructure plan 1

Agenda. Bibliography

Managing very large Multimedia Archives and their Integration into Federations

The National Fusion Collaboratory

Case Study: CyberSKA - A Collaborative Platform for Data Intensive Radio Astronomy

Building on Existing Communities: the Virtual Astronomical Observatory (and NIST)

DIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM

PID System for eresearch

Outline. Infrastructure and operations architecture. Operations. Services Monitoring and management tools

High Performance Computing Data Management. Philippe Trautmann BDM High Performance Computing Global Research

Adaptable and Adaptive Web Information Systems. Lecture 1: Introduction

Fundamentals of Data Infrastructures

Navigational Data Management. Joshua Stillerman, Martin Greenwald, John Wright MIT Plasma Science and Fusion Center

A Simplified Access to Grid Resources for Virtual Research Communities

DRS Policy Guide. Management of DRS operations is the responsibility of staff in Library Technology Services (LTS).

DEVELOPING, ENABLING, AND SUPPORTING DATA AND REPOSITORY CERTIFICATION

Context-based Roles and Competencies of Data Curators in Supporting Data Lifecycle: Multi-Case Study in China

Designing the Future Data Management Environment for [Radio] Astronomy. JJ Kavelaars Canadian Astronomy Data Centre

How can CLARIN archive and curate my resources?

Getting Started with GeoQuery

ACCI Recommendations on Long Term Cyberinfrastructure Issues: Building Future Development

Inge Van Nieuwerburgh OpenAIRE NOAD Belgium. Tools&Services. OpenAIRE EUDAT. can be reused under the CC BY license

DRIVER Step One towards a Pan-European Digital Repository Infrastructure

Ensuring Proper Storage for Earth Science Data: The USGS Process to Certify Trusted Digital Repositories

The National Digital Library Finna Among Digital Research Infrastructures in Finland

A VO-friendly, Community-based Authorization Framework

The Data Management Plan: Putting policy into practice Suzanne Clarke Director, Information Resources

Data governance and data quality: is it on your agenda or lurking in the shadows?

Data Reuse and Transparency in the Data Lifecycle. Steven Worley Doug Schuster Bob Dattore National Center for Atmospheric Research Boulder, CO USA

Reducing Consumer Uncertainty Towards a Vocabulary for User-centric Geospatial Metadata

Reducing Consumer Uncertainty

National Documentation Centre Open access in Cultural Heritage digital content

Developing the ICSU World Data System (WDS)

Certification. F. Genova (thanks to I. Dillo and Hervé L Hours)

Scalable, Reliable Marshalling and Organization of Distributed Large Scale Data Onto Enterprise Storage Environments *

A structured workflow for implementing digital archiving standards in an organisation

The CASPAR Finding Aids

Terminologies, Knowledge Organization Systems, Ontologies

The Scottish Collections Network: landscaping the Scottish common information environment. Gordon Dunsire

XML information Packaging Standards for Archives

Digital repositories as research infrastructure: a UK perspective

UNCLASSIFIED. R-1 ITEM NOMENCLATURE PE D8Z: Data to Decisions Advanced Technology FY 2012 OCO

Requirements for data catalogues within facilities

Description Cross Domain - Metadata Schema Registry Presentation to ISO Working Group Sydney, 2 November 2004

Presented by Dr Joanne Evans, Centre for Organisational and Social informatics Faculty of IT, Monash University Designing for interoperability

Promoting Open Standards for Digital Repository. case study examples and challenges

A Simple Mass Storage System for the SRB Data Grid

GRIDS INTRODUCTION TO GRID INFRASTRUCTURES. Fabrizio Gagliardi

The IEEE Metadata Standard for Supporting Big Data Management

Fedora Commons: Taking on the Challenge of the Next Generation of Scholarly Communication

<Insert Picture Here> Oracle s Medical Imaging Technology Oracle Multimedia DICOM: Next Generation Platform

DCH-RP Trust-Building Report

einfrastructures Concertation Event

Data Curation Handbook Steps

Transcription:

Scientific Data Curation and the Grid David Boyd CLRC e-science Centre http://www.e-science.clrc.ac.uk/ d.r.s.boyd@rl.ac.uk 19 October 2001 Digital Curation Seminar 1

Outline Some perspectives on scientific data Requirements of scientific data curation How the Grid can help Some current work at CLRC 19 October 2001 Digital Curation Seminar 2

The problem is growing... we have a rapidly increasing capability to generate data in many different formats in the physical and life sciences environmental science, astronomy, particle physics, genomics, proteomics, clinical trials,... in all these areas data volumes are tending towards petabytes the facilities used to generate data are increasingly expensive satellites, synchrotrons, supercomputers, telescopes,... some data is irreplaceable measurements, surveys,... much future potential will lie in analyses which combine data across traditional disciplinary boundaries access to data will increasingly be needed on a global scale 19 October 2001 Digital Curation Seminar 3

Where is the value in data? in the information it conveys, not just in the bits in context and relationships as well as content metadata provides a key to this information - enhances value this needs to be recorded and preserved with the data if not captured digitally in (near) real time then it may never be recovered - need skills to do this need ontologies and thesauri to define and relate concepts well documented data formats help to maintain value use of globally accepted standards aids accessibility value also in models and theories needed to interpret data these are encoded in software how to preserve software as well as data? 19 October 2001 Digital Curation Seminar 4

OAIS Reference Model... it is harder to preserve working software than to preserve information in digital or hardcopy forms 19 October 2001 Digital Curation Seminar 5

Whose data is it anyway? data ownership often (usually?) unclear with ownership goes responsibility so responsibility for data curation usually unclear so it often doesn t get done one problem is the initial and ongoing cost of curating data though often less than the cost of (re)generating data but may not be funded as part of a grant another problem is lack of a culture of data curation benefit not necessarily to the original producer needs a fundamental change of attitude by sponsors recognition that data is a significant long term asset willingness to fund long term data curation activities 19 October 2001 Digital Curation Seminar 6

Requirements of scientific data curation traditionally keeping a collection management security - keeping data safe from threats integrity - ensuring authenticity and completeness of the data preservation - over time and technological change acceptance of responsibility for data accessibility exercising access control where necessary providing knowledge of existence enabling exploration of related information (metadata) enabling retrieval of content (data) preserving ability to read (physically) and understand (logically) 19 October 2001 Digital Curation Seminar 7

How the Grid can help offers a mechanism for controlling access authentication and authorisation makes existence and location of data resources more visible hierarchical directory service, global visibility provides easier access to data single sign-on, location transparency, controlled replication integrates processes and data to generate enhanced value data recalibration, filtering, transforming, post-processing... moves data to application or application to data facilitates distributed collaborative working by sharing data... 19 October 2001 Digital Curation Seminar 8

Some current work at CLRC Developing a Web/Grid portal for accessing scientific data common top level metadata schema for scientific data links to existing application specific metadata ability to search distributed metadata catalogues transparent access to distributed data resources GUI access for people and API access for software/agents Operating a large data archive (~PB scale) Grid-accessible primary data storage resource provides data security, integrity, preservation offers data archiving service to other primary data resources Participating in global standards activities Global Grid Forum, WWW Consortium, ISO, etc,... 19 October 2001 Digital Curation Seminar 9