Data Discovery - Introduction

Similar documents
Data Staging and Data Movement with EUDAT. Course Introduction Helsinki 10 th -12 th September, Course Timetable TODAY

EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture. Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members

Data management and discovery

EUDAT- Towards a Global Collaborative Data Infrastructure

Inge Van Nieuwerburgh OpenAIRE NOAD Belgium. Tools&Services. OpenAIRE EUDAT. can be reused under the CC BY license

EUDAT. Towards a pan-european Collaborative Data Infrastructure

The EUDAT Collaborative Data Infrastructure

EUDAT and Cloud Services

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Damien Lecarpentier CSC-IT Center for Science, Finland EUDAT User Forum, Barcelona

EUDAT Data Services & Tools for Researchers and Communities. Dr. Per Öster Director, Research Infrastructures CSC IT Center for Science Ltd

EUDAT. Towards a pan-european Collaborative Data Infrastructure

EUDAT & SeaDataCloud

EUDAT. Towards a Collaborative Data Infrastructure. Ari Lukkarinen CSC-IT Center for Science, Finland NORDUnet 2012 Oslo, 18 August 2012

irods workflows for the data management in the EUDAT pan-european infrastructure

EUDAT - Open Data Services for Research

EUDAT-B2FIND A FAIR and Interdisciplinary Discovery Portal for Research Data

Data Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013

EUDAT. Towards a pan-european Collaborative Data Infrastructure - A Nordic Perspective? -

EUDAT. Towards a pan-european Collaborative Data Infrastructure

Using EUDAT services to replicate, store, share, and find cultural heritage data

I data set della ricerca ed il progetto EUDAT

EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT

EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal

Fundamentals of Data Infrastructures

EUDAT Common data infrastructure

European Collaborative Data Infrastructure EUDAT - Training on EUDAT Principles -

Remote Workflow Enactment using Docker and the Generic Execution Framework in EUDAT

EOSC Services & Architecture: the EOSC-hub approach Tiziana Ferrari, Project Coordinator, EGI Founda?on

EUDAT & AAI. Daan Broeder MPI for Psycholinguistics

Coupled Computing and Data Analytics to support Science EGI Viewpoint Yannick Legré, EGI.eu Director

EGI federated e-infrastructure, a building block for the Open Science Commons

EUDAT Towards a Collaborative Data Infrastructure

Striving for efficiency

Towards FAIRness: some reflections from an Earth Science perspective

USE CASES IN SEISMOLOGY. Alberto Michelini INGV

Trust and Certification: the case for Trustworthy Digital Repositories. RDA Europe webinar, 14 February 2017 Ingrid Dillo, DANS, The Netherlands

Enabling Open Science: Data Discoverability, Access and Use. Jo McEntyre Head of Literature Services

Towards FAIRness in research data. Per Öster, 3 October 2018

Towards a joint service catalogue for e-infrastructure services

Graph-based Data Integration in EUDAT Data Infrastructure

Persistent Identifier the data publishing perspective. Sünje Dallmeier-Tiessen, CERN 1

Webinar Annotate data in the EUDAT CDI

How to share research data

Deliverable 6.4. Initial Data Management Plan. RINGO (GA no ) PUBLIC; R. Readiness of ICOS for Necessities of integrated Global Observations

Indiana University Research Technology and the Research Data Alliance

Horizon 2020 and the Open Research Data pilot. Sarah Jones Digital Curation Centre, Glasgow

Big Data infrastructure and tools in libraries

Welcome to the Pure International Conference. Jill Lindmeier HR, Brand and Event Manager Oct 31, 2018

B2SAFE metadata management

THE NATIONAL DATA SERVICE(S) & NDS CONSORTIUM A Call to Action for Accelerating Discovery Through Data Services we can Build Ed Seidel

Checklist and guidance for a Data Management Plan, v1.0

Science Europe Consultation on Research Data Management

ISMTE Best Practices Around Data for Journals, and How to Follow Them" Brooks Hanson Director, Publications, AGU

Research Elsevier

Paving the Rocky Road Toward Open and FAIR in the Field Sciences

Research Data Preserve, Share, Reuse, Publish, or Perish Mark Gahegan Director, Centre for eresearch 24 Symonds St

The iplant Data Commons

Progress towards the EOSC

Research Data Management

EUDAT. Towards a pan-european Collaborative Data Infrastructure. KNMI Workshop, Utrecht, Netherlands

NorStore. a national infrastructure for scientific data. Andreas O Jaunsen UNINETT Sigma as

GEOSS Data Management Principles: Importance and Implementation

Sharing research data globally and supporting re-use - Playing your part Leif Laaksonen /National Archive of Finland

ODC and future EIDA/ EPOS-S plans within EUDAT2020. Luca Trani and the EIDA Team Acknowledgements to SURFsara and the B2SAFE team

Reproducibility and FAIR Data in the Earth and Space Sciences

Federated Identity Management for Research Collaborations. Bob Jones IT dept CERN 29 October 2013

B2FIND and Metadata Quality

ASPECT Adopting Standards and Specifications for. Final Project Presentation By David Massart, EUN April 2011

Key Elements of Global Data Infrastructures

European Cloud Initiative: implementation status. Augusto BURGUEÑO ARJONA European Commission DG CNECT Unit C1: e-infrastructure and Science Cloud

Sharing Best Security Practices with your Peers - on an International Level

FAIR-aligned Scientific Repositories: Essential Infrastructure for Open and FAIR Data

Security in Big and Open Data - WG. Alessandra Scicchitano GÉANT - Project Development Officer Ralph Niederberger Jülich Supercomputing Center (FZJ)

RENKU - Reproduce, Reuse, Recycle Research. Rok Roškar and the SDSC Renku team

Mercè Crosas, Ph.D. Chief Data Science and Technology Officer Institute for Quantitative Social Science (IQSS) Harvard

Linda Strick Fraunhofer FOKUS. EOSC Summit - Rules of Participation Workshop, Brussels 11th June 2018

Certification. F. Genova (thanks to I. Dillo and Hervé L Hours)

Data Citation and Scholarship

DATA SHARING FOR BETTER SCIENCE

FREYA Connected Open Identifiers for Discovery, Access and Use of Research Resources

Dataverse and DataTags

Conferenza GARR Moving Forward to deliver an «EOSC in Practice» by 2018

Scientific Research Data Management Policy

Helix Nebula The Science Cloud

Poland - e-infrastructure ecosystem and relation to EOSC

Data Staging: Moving large amounts of data around, and moving it close to compute resources

Certification as a means of providing trust: the European framework. Ingrid Dillo Data Archiving and Networked Services

A national approach for storage scale-out scenarios based on irods

OpenAIRE. Fostering the social and technical links that enable Open Science in Europe and beyond

Open Science, FAIR data and effective data management

TEXT MINING: THE NEXT DATA FRONTIER

Open Smart City Open Social Innovation Infrastructure. Cezary Mazurek, Maciej Stroinski Poznan Supercomputing and Networking Center

Educating a New Breed of Data Scientists for Scientific Data Management

How to make your data open

ASTRONOMY & PARTICLE PHYSICS CLUSTER

Data Curation Handbook Steps

VI-SEEM Data Repository. Presented by: Panayiotis Charalambous

Conducting a Self-Assessment of a Long-Term Archive for Interdisciplinary Scientific Data as a Trustworthy Digital Repository

ESFRI WORKSHOP ON RIs AND EOSC

Data publication and discovery with Globus

Transcription:

Data Discovery - Introduction Why (benefits of reusing data) How EUDAT's services help with this (in general) Adam Carter

In days gone by: Design an experiment Getting Your Data Conduct the experiment In the lab Real-world observation In silica (Computational Science) Obtain (your own) data Now, more and more, there is a large amount of data that has already been collected, and stored

Source: I2S2 Idealised Scien-fic Research Ac-vity Lifecycle Model h*p://www.ukoln.ac.uk/projects/i2s2/documents/i2s2- ResearchAc=vityLifecycleModel- 110407.pdf Copyright I2S2 Partner

A (simplified) Data Lifecycle Data Created / Obtained Data Stored Data Used Data Stored Data Made Available for ReUse Data Stored

A (simplified) Data Lifecycle Data Created / Obtained Data Stored Data Used Data Stored Data Made Available for Re- Use Data Stored

Data Discovery & Sharing: Two Sides of the Same Coin Why re-use data? Avoids duplication of effort Easier/cheaper than collecting your own It may not be possible to remeasure (e.g. climate data) Validate/test previous results Why make your data re-usable? Allow others to build on your efforts and use your data in new ways Allow others to validate/test your results Credit? Reputation? Obligation to funders?

Data Discovery & Sharing: Two Sides of the Same Coin How to discover data Web Search Metadata Search Follow links from other data and publications Search popular repositories Ask your twitter followers How to make your data discoverable Give it a Persistent Identifier Link it to other data, and cite it Associate it with Metadata Put it somewhere where people can get it easily (e.g. online) Put it in a trusted repository which will look after it beyond when you d look after it

Allow me to introduce EUDAT A partnership of leading European Data Centres and Research Communities working towards a Collaborative Data Infrastructure

EUDAT: Vision & Architecture EUDAT began with the concept of the Collaborative Data Infrastructure See Riding the Wave (High Level Expert Group on Scientific Data, Final Report, 2010) This identified a handful of core Service Cases And the implementation of the Service Cases led to our current distributed Architecture See later

What is the EUDAT CDI? The EUDAT Collaborative Data Infrastructure is a pan-european, cross-disciplinary domain of research data for both big community researchers and long tail scientists where data are registered, preserved, accessible and made re-usable

What does this mean? Pan-European Fundamentally, a wide-area distributed architecture Cross-disciplinary Five core stakeholder communities, many other interested; many sources of conflicting requirements! Including simplified services to encourage the long tail to participate All implies a significant systems integration challenge!

What does this mean? (2) Registered means EUDAT data are Globally identified and discoverable (the PID Service) Preserved means EUDAT data are Stored at big European HPC and data centres Replicated for safety (B2SAFE: the Safe Replication Service) Governed by policy rules (the Policy Management Service)

What does this mean? (3) Accessible means EUDAT data are Identifiable and findable (the PID Service) Retrievable efficiently (B2STAGE: the Data Staging Service) Governed by suitable access control (the AAI Service) Re-usable means EUDAT data are Findable (the PID Service) Comprehensible (B2FIND: the Joint Metadata Service) Composable and combinable (future workflow and computational services)

What does this mean? (4) For both big communities and long tail means Stable, web-service APIs for existing tool-stacks to use (the Common Service Layer Interface) Low barriers to use (the Simple Store Service) Hence the core EUDAT service cases Identifying solutions for these cases that work with our stakeholder communities existing solutions led us to the current CDI architecture

The CDI network architecture The CDI is a connected network of European research institutions and data centres (collectively Nodes) each offering one or more common EUDAT data services to both participating research communities and independent researchers Data centre Nodes have lots of connections Research community Nodes need only one Connections have both technical & policy agreement aspects

The CDI network architecture Generic data centres Community data sites

CDI Node architecture Nodes run parts of the CDI Node software suite, depending on which services they want to offer All Nodes should offer Safe Replication and PID This is really what being in the CDI is all about Others are optional Depends on what a Node s expected user base requires (Some data centre Nodes also need to run the Operational Services suite)

Community s/w Using the CDI Joining vs Using the CDI CSLI CSLI CSLI JMD (UI) SSS (UI) PID JMD SSS (store) EUDAT core EUDAT core Community s/w EUDAT core Joining the CDI EUDAT CDI

What s Next for EUDAT? Working Groups Data Access & Re-Use Policies Dynamic Data Semantics Workflows New Services? Under Consideration B2NOTE? Annotation B2DROP? File Workspace & Synchronisation B2HOST? Hosting of (Application Specific) Data Services