EUDAT - Open Data Services for Research

Johannes Reetz, EUDAT operations, Max Planck Computing & Data Centre
Science Operations Workshop 2015, ESO, Garching, 24-27 November 2015
EUDAT receives funding from the European Union's Horizon 2020 programme, DG CONNECT e-Infrastructures. Contract No. 654065
www.eudat.eu

Research Infrastructures: Where are we going?
Research Infrastructure trends & challenges:
- Internationalisation
- Diversification
- Increasingly relying on ICT
- Data deluge
- Complexity
- Trust, Authenticity
- Citation, Credits
- Open Access
- Open Data
European RIs: around 500; 100 billion investment
[Timeline labels: Middle Ages, 19th century, 20th century, 21st century]

An e-infrastructure solution for pan-European Research Data Challenges
All research communities and RIs are facing similar data challenges:
- Where to store (big) data?
- How to keep it meaningful over time?
- How to share data? How to publish it?
- How to register data objects? How to connect them?
- How to find it? How to access it? How to transfer it?
Solutions are needed at global level; collaboration is needed.
Exploitation of synergies:
- Some services are common to many communities / research domains
- Reduce investment and operational cost
- Collaborate on optimizing standards for APIs, metadata (MD), DO, identity profiles, policies

EUDAT consortium (2011, 2015)
EUDAT offers common data services, supporting multiple research communities as well as individuals, through a geographically distributed, resilient network of 35 European organisations.
[Figure label: e-science Data Factory]

HLEG 2010: Collaborative Data Infrastructure
[Diagram labels: Data Curation; Trust; Data Generators; Users]
- User functionalities, data capture & transfer, virtual research environments
- Community Services: data discovery & navigation, workflow generation, annotation, interpretability
- Common Data Services: persistent storage, identification, authenticity, workflow execution, mining

Community-Driven
[Diagram labels: BIOMEDICAL & MEDICAL SCIENCES; MATERIALS & ANALYTICAL FACILITIES; MAPPER; PHYSICAL SCIENCES & ENGINEERING]

EUDAT collaborating with other e-infrastructures
[Diagram labels: Policy & guidelines; Data management plans; Service integration; OpenAIRE; RDA; Policy & networking; Output adoption; Test beds; GÉANT; LERU; LIBER; PRACE; Cross-infra services & ops; Common protocols, APIs; HPC/HTC/Clouds; EGI; Helix Nebula; Data Cloud]

B2 Services

B2 Services and the Research Data Pyramid

Participate by Using the CDI (1)
[Diagram labels: Community; Thematic (Community) Centre; EUDAT CDI]
- Using the CDI via standardised APIs
- Community policies independent from the CDI
- Community centre remains the main actor for community data stewardship

Participate by Using the CDI (2)
[Diagram labels: Community; Thematic (Community) Centre; EUDAT CDI]
- Using the CDI via standardised APIs
- Policies independent from the CDI
- Some responsibility for data stewardship delegated to a CDI centre (ingest node)

Participate by Joining the CDI
[Diagram labels: Community; Community Centre; EUDAT CDI]
- Community centre installs EUDAT (B2SAFE) middleware
- Common CDI policies concerning PID configuration, MD handling, security and other operational procedures apply

Metadata support within the CDI
- Data and metadata via HTTP/JSON descriptions
- Data and metadata as separate objects
- Data and metadata in defined packages
- Data with embedded metadata descriptions (e.g. NetCDF, HDF5 file formats)
[Diagram labels: D = data; MD = metadata; Package]
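
The first three arrangements on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`data`, `url`, `metadata`, `payload`), the function names and the example URL are hypothetical and are not taken from any EUDAT specification.

```python
import json


def make_description(data_url, metadata):
    """Build an HTTP/JSON-style description: a data reference plus metadata.

    The key names are hypothetical, chosen for illustration only.
    """
    return json.dumps({"data": {"url": data_url}, "metadata": metadata},
                      sort_keys=True)


def split_description(description):
    """Recover the 'separate objects' view: (data reference, metadata record)."""
    doc = json.loads(description)
    return doc["data"], doc["metadata"]


def make_package(data_bytes, metadata):
    """Bundle data and metadata into one 'defined package' (here a plain dict)."""
    return {"payload": data_bytes, "metadata": dict(metadata)}


description = make_description(
    "https://repository.example.org/objects/42",  # hypothetical location
    {"title": "Example data set", "format": "NetCDF"},
)
data_ref, metadata = split_description(description)
package = make_package(b"\x00\x01", metadata)
```

The fourth arrangement, embedded metadata, lives inside the file format itself (for example NetCDF or HDF5 attributes) and is therefore not modelled here.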

EUDAT CDI API Abstraction Layer
[Diagram labels: Community Services (e.g. specific discovery services); EUDAT API library; Abstraction Layer]
GridFTP, DO support:
- Separate relevant metadata and data objects
- As packages
- Embedded metadata
HTTP, DO support:
- HTTP/JSON descriptions
- Separate metadata and data objects
- As packages
- Embedded metadata

Metadata Description Support
[Diagram labels: Defined Templates; Interpretable for EUDAT; Uninterpretable for EUDAT]

Production Infrastructure: Operational Services
Operational and support services:
- Central Registry (Sites & Services): creg.eudat.eu
- Monitoring: cmon.eudat.eu
- Data Project Resource Coordination: rct.eudat.eu
- Helpdesk: helpdesk.eudat.eu
Participants:
- Community
- Community Centre/Repository, Data Provider
- General data centre (many HTC/HPC service providers)
- PID provider, providing PIDs (most of them are ePIC partners and can issue Handle prefixes)

IDM Integration: https://b2access.eudat.eu
Primary identities: X.509 IdP (PKI), SAML IdP (SAML), Social IdP
Identity sources shown: Google, Facebook, LinkedIn, eduGAIN, RIs (e.g. CLARIN), PRACE, EGI, WLCG, ORCID, ResearchID, Scopus, OpenID, OpenID IdPs from RIs (e.g. ESGF, ENES)
B2ACCESS AAI functions (multi-protocol identity management with LoA support, powered by Unity IDM):
- B2ACCESS IdP
- User Profile
- OAuth 2 authorization server
- EUDAT CA
- EUDAT federation database
Credentials: Access Token, X.509, SAML
EUDAT Service Endpoints:
- B2SHARE (OAuth 2)
- B2SAFE (X.509)
- B2STAGE (X.509)
- B2DROP (SAML)
- B2HANDLE (SAML)
- Data Project Coordination Portal
- Helpdesk TTS
- Site & Service Registry

B2ACCESS: Production Oct 15
Primary identities: PKI, SAML IdP (SAML), Social IdP
Identity sources shown: Google, eduGAIN, OpenID
B2ACCESS AAI functions (multi-protocol identity management, powered by Unity IDM):
- B2ACCESS IdP
- User Profile
- OAuth 2 authorization server
- EUDAT CA
- EUDAT federation database
Credentials: Access Token, X.509, SAML
EUDAT Service Endpoints:
- B2SHARE (OAuth 2)
- B2SAFE (X.509)
- B2STAGE (X.509)
- B2DROP (SAML)
- B2HANDLE (SAML)
- Data Project Coordination Portal
- Helpdesk TTS
- Site/Repository & Service Registry
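
The service endpoints on this slide each accept one credential type from B2ACCESS. A minimal sketch of that pairing as a lookup table follows; the service-to-credential pairs are copied from the slide, while the function names are illustrative and not part of any EUDAT API.

```python
# Credential type each EUDAT service endpoint accepts, as listed on the slide.
SERVICE_CREDENTIALS = {
    "B2SHARE": "OAuth 2",
    "B2SAFE": "X.509",
    "B2STAGE": "X.509",
    "B2DROP": "SAML",
    "B2HANDLE": "SAML",
}


def credential_for(service):
    """Return the credential type a service expects (hypothetical helper)."""
    try:
        return SERVICE_CREDENTIALS[service.upper()]
    except KeyError:
        raise ValueError(f"unknown service: {service!r}") from None


def services_accepting(credential):
    """List, alphabetically, the services that accept a given credential type."""
    return sorted(name for name, cred in SERVICE_CREDENTIALS.items()
                  if cred == credential)
```

For example, `services_accepting("X.509")` yields the two services that rely on certificates issued via the EUDAT CA.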

B2ACCESS: EUDAT IDM, http://b2access.eudat.eu


Example: Safe Replication
eudat.eu/b2safe
The ideal solution to:
- replicate research data into secure data stores
- archive and preserve research data in the long term
- bring data permanently close to powerful compute resources
- co-locate data with different communities
- benefit from economies of scale
Features:
- large-scale storage
- robust and highly available
- permanent PIDs

B2SAFE Use Case from CLARIN ERIC: Replication of Linguistic Data
[Diagram labels: PID, PID; iRODS, SAM-QFS, iRODS, iRODS, GPFS, dCache, HPSS, DMF]

Who?
- Groups, communities and centres who want to make their data referenceable in a stable way
What?
- Follows policies to register data and make it referable and citable in the long term
- Reliability through mutual PID mirroring; Handle Prefix Registrars from ePIC or other DONA MPAs are partners of EUDAT
- Provides the abstraction layer between a globally unique persistent identifier and the physical location of data objects
- Machine readable via HTTP RESTful API
Why Handles?
- Stable, globally unique IDs; stable cross-links
- Technology agnostic
- Simple integration
Development activity:
- Develop policies for the B2HANDLE service (e.g. PID namespace management)
- Consolidate the PID record profile for the CDI
- Define PID Information Types for data, metadata, collection records
- Integrate with Data Type Registry service
- Consolidate B2HANDLE API library with EUDAT API library
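
The "abstraction layer" between a persistent identifier and a physical location can be sketched as follows. The sketch assumes the JSON interface of the public Handle proxy at hdl.handle.net; the B2HANDLE or ePIC interface meant on the slide may differ. The handle `12345/example-object`, the sample record and the repository URL are hypothetical, and no network request is made.

```python
import json
from urllib.parse import quote

# Assumption: the public Handle proxy's JSON endpoint, not an EUDAT-specific API.
PROXY_API = "https://hdl.handle.net/api/handles/"


def resolution_url(handle):
    """Return the URL at which the proxy serves the handle record as JSON."""
    return PROXY_API + quote(handle, safe="/")


def locations(record):
    """Extract the physical locations (values of type URL) from a handle record."""
    return [value["data"]["value"]
            for value in record.get("values", [])
            if value.get("type") == "URL"]


# Hypothetical record, shaped like a proxy response, used instead of a live call.
SAMPLE_RECORD = json.loads("""
{
  "responseCode": 1,
  "handle": "12345/example-object",
  "values": [
    {"index": 1, "type": "URL",
     "data": {"format": "string",
              "value": "https://repository.example.org/objects/42"}}
  ]
}
""")

url = resolution_url("12345/example-object")
where = locations(SAMPLE_RECORD)
```

If the data move, only the URL value in the record changes; the handle that users cite stays the same.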

Worldwide PID system, a "DNS for Data", being built
DONA (dona.net):
- Representatives from all continents
- Stewards of the Handle System
- Worldwide MPAs with RAs: IDF, ePIC, CNRI, etc.
A worldwide system to register digital objects via Handles, similar to the assignment of FQDNs to compute resources that have IP addresses.
There is a need to identify data and to test its integrity and authenticity, and its relations to metadata, apps, services, etc. The Handle System offers a powerful solution.
DONA is a foundation under Swiss law, set up to make the Handle System independent from CNRI.
A federation of Multi-primary Prefix Administrators (MPA) and Prefix Registration Authorities is being established.
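
The DNS analogy can be made concrete with the structure of a handle itself: a prefix assigned by a registration authority, a slash, and a suffix chosen locally. The sketch below splits a handle at the first slash; the function names and the example handle are hypothetical.

```python
def split_handle(handle):
    """Split a handle into (prefix, suffix) at the first slash.

    The prefix identifies the naming authority, much as a domain identifies
    a zone in DNS; the suffix is the local name and may itself contain slashes.
    """
    prefix, separator, suffix = handle.partition("/")
    if not separator or not prefix or not suffix:
        raise ValueError(f"not a valid handle: {handle!r}")
    return prefix, suffix


def authority_levels(prefix):
    """Return the dot-separated levels of a prefix, e.g. '21.T12345'."""
    return prefix.split(".")


prefix, suffix = split_handle("21.T12345/collection/item-7")  # hypothetical handle
levels = authority_levels(prefix)
```

Only the prefix has to be registered globally; everything after the first slash is managed by the prefix holder.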

EUDAT Production Environment
[Diagram labels: Data Management; Project Enabling; Helpdesk & Support; Network, Configuration; Compute Resources; Service Hosting, Service on Demand; Service Deployment; Storage, Storage Services; Security Team; Service and Resource Provisioning & Coordination; Operational & Central Services]
14 generic centres, 15 PB committed, 5-10 Gb/s per site (potential of > 1000 PB aggregated)

Configuration of Handle PIDs as linked list A1