Data Staging and Data Movement with EUDAT. Course Introduction Helsinki 10 th -12 th September, Course Timetable TODAY

Similar documents
EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture. Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members

Data Discovery - Introduction

EUDAT. Towards a Collaborative Data Infrastructure. Ari Lukkarinen CSC-IT Center for Science, Finland NORDUnet 2012 Oslo, 18 August 2012

EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT

Data Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013

Using EUDAT services to replicate, store, share, and find cultural heritage data

EUDAT Data Services & Tools for Researchers and Communities. Dr. Per Öster Director, Research Infrastructures CSC IT Center for Science Ltd

Data management and discovery

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Damien Lecarpentier CSC-IT Center for Science, Finland EUDAT User Forum, Barcelona

irods workflows for the data management in the EUDAT pan-european infrastructure

EUDAT- Towards a Global Collaborative Data Infrastructure

EUDAT. Towards a pan-european Collaborative Data Infrastructure

EUDAT. Towards a pan-european Collaborative Data Infrastructure. KNMI Workshop, Utrecht, Netherlands

EUDAT Towards a Collaborative Data Infrastructure

Data Staging: Moving large amounts of data around, and moving it close to compute resources

The EUDAT Collaborative Data Infrastructure

EUDAT Common data infrastructure

EUDAT-B2FIND A FAIR and Interdisciplinary Discovery Portal for Research Data

I data set della ricerca ed il progetto EUDAT

EUDAT - Open Data Services for Research

European Collaborative Data Infrastructure EUDAT - Training on EUDAT Principles -

EUDAT. Towards a pan-european Collaborative Data Infrastructure

Fundamentals of Data Infrastructures

ODC and future EIDA/ EPOS-S plans within EUDAT2020. Luca Trani and the EIDA Team Acknowledgements to SURFsara and the B2SAFE team

B2SAFE metadata management

USE CASES IN SEISMOLOGY. Alberto Michelini INGV

EUDAT. Towards a pan-european Collaborative Data Infrastructure

EUDAT & AAI. Daan Broeder MPI for Psycholinguistics

EGI federated e-infrastructure, a building block for the Open Science Commons

Inge Van Nieuwerburgh OpenAIRE NOAD Belgium. Tools&Services. OpenAIRE EUDAT. can be reused under the CC BY license

EUDAT and Cloud Services

EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal

EUDAT. Towards a pan-european Collaborative Data Infrastructure - A Nordic Perspective? -

Towards a joint service catalogue for e-infrastructure services

NorStore. a national infrastructure for scientific data. Andreas O Jaunsen UNINETT Sigma as

Data Staging: Moving large amounts of data around, and moving it close to compute resources

Designing an institutional research data management infrastructure for the life sciences

Coupled Computing and Data Analytics to support Science EGI Viewpoint Yannick Legré, EGI.eu Director

ACCI Recommendations on Long Term Cyberinfrastructure Issues: Building Future Development

EuroHPC and the European HPC Strategy HPC User Forum September 4-6, 2018 Dearborn, Michigan, USA

Science-as-a-Service

Poland - e-infrastructure ecosystem and relation to EOSC

Europe and its Open Science Cloud: the Italian perspective. Luciano Gaido Plan-E meeting, Poznan, April

Indiana University Research Technology and the Research Data Alliance

Open Smart City Open Social Innovation Infrastructure. Cezary Mazurek, Maciej Stroinski Poznan Supercomputing and Networking Center

European Open Science Cloud

Webinar Annotate data in the EUDAT CDI

THE NATIONAL DATA SERVICE(S) & NDS CONSORTIUM A Call to Action for Accelerating Discovery Through Data Services we can Build Ed Seidel

WP3: Policy and Best Practice Harmonisation

A national approach for storage scale-out scenarios based on irods

The DART-Europe E-theses Portal

Grid Computing. MCSN - N. Tonellotto - Distributed Enabling Platforms

Europeana Core Service Platform

Sharing Best Security Practices with your Peers - on an International Level

FLAT: A CLARIN-compatible repository solution based on Fedora Commons

European Cloud Initiative: implementation status. Augusto BURGUEÑO ARJONA European Commission DG CNECT Unit C1: e-infrastructure and Science Cloud

South African Science Gateways

The EuroHPC strategic initiative

Key Elements of Global Data Infrastructures

Containerised Development of a Scientific Data Management System Ben Leighton, Andrew Freebairn, Ashley Sommer, Jonathan Yu, Simon Cox LAND AND WATER

INSPIRE tools What's new?

IS4H TOOLKIT TOOL: Workshop on Developing a National ehealth Strategy (Workshop Template)

«Prompting an EOSC in Practice»

irods 4.0 and Beyond Presented at the irods & DDN User Group Meeting 2014

B2FIND: EUDAT Metadata Service. Daan Broeder, et al. EUDAT Metadata Task Force

Striving for efficiency

Giovanni Lamanna LAPP - Laboratoire d'annecy-le-vieux de Physique des Particules, Université de Savoie, CNRS/IN2P3, Annecy-le-Vieux, France

Kubernetes made easy with Docker EE. Patrick van der Bleek Sr. Solutions Engineer NEMEA

EuroHPC Bologna 23 Marzo Gabriella Scipione

e-infrastructures in Horizon 2020 e-infrastructures for data and computing

EUDAT & SeaDataCloud

INDIGO AAI An overview and status update!

Remote Workflow Enactment using Docker and the Generic Execution Framework in EUDAT

EGI-InSPIRE. Cloud Services. Steven Newhouse, EGI.eu Director. 23/05/2011 Cloud Services - ASPIRE - May EGI-InSPIRE RI

Persistent Identifiers for Audiovisual Archives and Cultural Heritage

ESFRI WORKSHOP ON RIs AND EOSC

The iplant Data Commons

Safe Replication and Data Staging

ESFRI Strategic Roadmap & RI Long-term sustainability an EC overview

The Materials Data Facility

AAI in EGI Current status

Richard Marciano Alexandra Chassanoff David Pcolar Bing Zhu Chien-Yi Hu. March 24, 2010

Open Science Commons: A Participatory Model for the Open Science Cloud

GATE: Big Data for Smart Society Dessislava Petrova-Antonova Sofia University St. Kliment Ohridski Faculty of Mathematics and Informatics

ASPECT Adopting Standards and Specifications for. Final Project Presentation By David Massart, EUN April 2011

Federated access to e-infrastructures worldwide

European Network and ICOS

PID System for eresearch

Trust and Certification: the case for Trustworthy Digital Repositories. RDA Europe webinar, 14 February 2017 Ingrid Dillo, DANS, The Netherlands

Kepler and Grid Systems -- Early Efforts --

EuroHPC: the European HPC Strategy

European Open Science Cloud Implementation roadmap: translating the vision into practice. September 2018

RENKU - Reproduce, Reuse, Recycle Research. Rok Roškar and the SDSC Renku team

SAP Agile Data Preparation Simplify the Way You Shape Data PUBLIC

First European Globus Community Forum Meeting

How to share research data

The Need for a Terminology Bridge. May 2009

GN4-1 SA8 Real Time Applications and Multimedia Management. Networks Services People 1

Helix Nebula The Science Cloud

2. HDF AAI Meeting -- Demo Slides

Transcription:

Data Staging and Data Movement with EUDAT Course Introduction Helsinki 10 th -12 th September, 2013 1400 Introduction Adam Carter Course Timetable TODAY 1430 The EUDAT Safe Replication Service Claudio Cacciari 1530 Coffee 1545 The EUDAT Data Staging Service Claudio Cacciari 1700 Close 1

Course Timetable TIMETABLES FOR DAY 2 & DAY 3 SUBJECT TO FINASATION Data Staging and Data Movement Data Staging is particularly important to PRACE users and future PRACE users, but in general it is one aspect of Data Movement in general What we mean by Data Staging Term is used to mean different things, even within our project. We ll use it generically to mean the movement of data between systems, but obviously in a PRACE context, one of these systems is usually an HPC resource 2

Speed Reliability Availability Integrity Security Ease Of Use Simplicity Familiarity Flexibility Data Movement: What End Users Care About Why isn t this easy? End points (where you re moving the data to and from are typically in different jurisdictions Security vs Performance tradeoffs Changing ecosystem: users want to be able to work in different ways. Consider, e.g., Dropbox Relationship between data location and data meaning Data is getting bigger faster than the pipes 3

Teach by example A combination of: Approach of this course A description of how we re doing things in EUDAT A demonstration of some of the tools and software that we re using to implement EUDAT and that can be used to interface with EUDAT and bridge the gap between EUDAT, PRACE and other infrastructures, services and software that you use or develop. EUDAT: Vision & Architecture EUDAT began with the concept of the Collaborative Data Infrastructure See Riding the Wave (High Level Expert Group on Scientific Data, Final Report, 2010 This identified a handful of core Service Cases See http://www.eudat.eu/services-and-technologies And the implementation of the Service Cases led to our current distributed Architecture 4

What is the EUDAT CDI? The EUDAT Collaborative Data Infrastructure is a pan-european, cross-disciplinary domain of research data for both big community researchers and long tail scientists where data are registered, preserved, accessible and made re-usable Pan-European What does this mean? Fundamentally, a wide-area distributed architecture Cross-disciplinary Five core stakeholder communities, many other interested; many sources of conflicting requirements! Including simplified services to encourage the long tail to participate All implies a significant systems integration challenge! 5

What does this mean? (2 Registered means EUDAT data are Globally identified and discoverable (the PID Service Preserved means EUDAT data are Stored at big European HPC and data centres Replicated for safety (the Safe Replication Service Governed by policy rules (the Policy Management Service What does this mean? (3 Accessible means EUDAT data are Identifiable and findable (the PID Service Retrievable efficiently (the Data Staging Service Governed by suitable access control (the AAI Service Re-usable means EUDAT data are Findable (the PID Service Comprehensible (the Joint Metadata Service Composable and combinable (future workflow and computational services 6

What does this mean? (4 For both big communities and long tail means Stable, web-service APIs for existing tool-stacks to use (the Common Service Layer Interface Low barriers to use (the Simple Store Service Hence the core EUDAT service cases Identifying solutions for these cases that work with our stakeholder communities existing solutions led us to the current CDI architecture The CDI network architecture The CDI is a connected network of European research institutions and data centres (collectively Nodes each offering one or more common EUDAT data services to both participating research communities and independent researchers Data centre Nodes have lots of connections Research community Nodes need only one Connections have both technical & policy agreement aspects 7

The CDI network architecture Generic data centres Community data sites CDI Node architecture Nodes run parts of the CDI Node software suite, depending on which services they want to offer All Nodes should offer Safe Replication and PID This is really what being in the CDI is all about Others are optional Depends on what a Node s expected user base requires (Some data centre Nodes also need to run the Operational Services suite 8

CDI service development timeline Year EUDAT Services CDI 2012 Safe Replication v1, Data Staging v1, PID v1, Operational Services v1; AAI (design V 1 2013 AAI v1, Joint Metadata v1, Simple Store Service v1, Safe Replication v2, Data Staging v2, Operational Services v2; Common Service Layer Interface (design, workflow and computation services (design 2014 AAI v2, Joint Metadata v2, Simple Store Service v2, Data Staging v3, Operational Services v3, v1, computational and workflow services v1 V 2 V 3 2015 + PID v2, Safe Replication v3, computational and workflow services v2 V 4 Data Staging (DS v1 PID v1 Handle PID registration (PID v1 SR v1, OP v1 SR v1, OP v1 Community s/w SR v1 Operational services (OP v1 Safe Replication (SR v1 CDI V1 (2012 9

+ Web (DS v2 AAI v1 JMD v1 (UI Simple Store ( v1 v1 (UI PID v1 JMD v1 v1 SR v2, OP v2 SR v2, OP v2 Community s/w SR v2 Joint Metadata (JMD v1 + Policy Management (SR v2 CDI V2 (2013 Data Staging (DS v3 + Federation AAI v2 Workflow, Compute v1 JMD v2 (UI Simple Store ( v2 v2 (UI PID v1 JMD v2 v2 SR v2, OPv3 SR v2, OPv3 Community s/w SR v2 Joint Metadata (JMD v2 CDI V3 (2014 10

Workflow Coordinatio n (WC v1 WC + Generalisation (computational v2 JMD v2 (UI v2 (UI PID v2 Enhanced Handle registration (PID v2 JMD v2 v2 SR v3 SR v3, OPv4 SR v3, OPv4 Community s/w + Real Time Data (SR v3 CDI V4 (2015+ Current Node software suite Safe Replication irods v3.x + EUDAT replication microservices + GSI (X.509 security PID EPIC/Handle system (external service + EUDAT EPIC client microservices for irods Data Staging Griffin gridftp irods client + EUDAT Data Staging client Joint Metadata CKAN + Apache Solr + OAI-PMH Simple Store Invenio + EUDAT faceted user interface layer 11

Communit y s/w Using the CDI Joining vs Using the CDI JMD (UI (UI PID JMD EUDAT core EUDAT core Community s/w EUDAT core Joining the CDI EUDAT CDI 12