EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture. Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members

Similar documents
Data Staging and Data Movement with EUDAT. Course Introduction Helsinki 10 th -12 th September, Course Timetable TODAY

EUDAT Common data infrastructure

Data Discovery - Introduction

Data Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013

EUDAT Data Services & Tools for Researchers and Communities. Dr. Per Öster Director, Research Infrastructures CSC IT Center for Science Ltd

Data management and discovery

EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT

EUDAT. Towards a Collaborative Data Infrastructure. Ari Lukkarinen CSC-IT Center for Science, Finland NORDUnet 2012 Oslo, 18 August 2012

Using EUDAT services to replicate, store, share, and find cultural heritage data

I data set della ricerca ed il progetto EUDAT

EUDAT. Towards a pan-european Collaborative Data Infrastructure

EUDAT. Towards a pan-european Collaborative Data Infrastructure

EUDAT & AAI. Daan Broeder MPI for Psycholinguistics

EUDAT- Towards a Global Collaborative Data Infrastructure

EUDAT - Open Data Services for Research

EUDAT Towards a Collaborative Data Infrastructure

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Damien Lecarpentier CSC-IT Center for Science, Finland EUDAT User Forum, Barcelona

EUDAT. Towards a pan-european Collaborative Data Infrastructure

EUDAT. Towards a pan-european Collaborative Data Infrastructure - A Nordic Perspective? -

EUDAT. Towards a pan-european Collaborative Data Infrastructure. KNMI Workshop, Utrecht, Netherlands

European Collaborative Data Infrastructure EUDAT - Training on EUDAT Principles -

Safe Replication and Data Staging

The EUDAT Collaborative Data Infrastructure

irods workflows for the data management in the EUDAT pan-european infrastructure

Inge Van Nieuwerburgh OpenAIRE NOAD Belgium. Tools&Services. OpenAIRE EUDAT. can be reused under the CC BY license

Data Staging: Moving large amounts of data around, and moving it close to compute resources

EUDAT-B2FIND A FAIR and Interdisciplinary Discovery Portal for Research Data

Data Staging: Moving large amounts of data around, and moving it close to compute resources

EUDAT and Cloud Services

B2FIND: EUDAT Metadata Service. Daan Broeder, et al. EUDAT Metadata Task Force

USE CASES IN SEISMOLOGY. Alberto Michelini INGV

EUDAT & SeaDataCloud

ODC and future EIDA/ EPOS-S plans within EUDAT2020. Luca Trani and the EIDA Team Acknowledgements to SURFsara and the B2SAFE team

EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal

B2SAFE metadata management

Europeana Core Service Platform

Open Science Commons: A Participatory Model for the Open Science Cloud

EGI federated e-infrastructure, a building block for the Open Science Commons

NorStore. a national infrastructure for scientific data. Andreas O Jaunsen UNINETT Sigma as

Striving for efficiency

European Cloud Initiative: implementation status. Augusto BURGUEÑO ARJONA European Commission DG CNECT Unit C1: e-infrastructure and Science Cloud

Webinar Annotate data in the EUDAT CDI

«Prompting an EOSC in Practice»

CLARIN s central infrastructure. Dieter Van Uytvanck CLARIN-PLUS Tools & Services Workshop 2 June 2016 Vienna

Richard Marciano Alexandra Chassanoff David Pcolar Bing Zhu Chien-Yi Hu. March 24, 2010

Fundamentals of Data Infrastructures

European Open Science Cloud

Coupled Computing and Data Analytics to support Science EGI Viewpoint Yannick Legré, EGI.eu Director

EuroHPC and the European HPC Strategy HPC User Forum September 4-6, 2018 Dearborn, Michigan, USA

Persistent Identifiers for Audiovisual Archives and Cultural Heritage

Remote Workflow Enactment using Docker and the Generic Execution Framework in EUDAT

Grid Computing. MCSN - N. Tonellotto - Distributed Enabling Platforms

Indiana University Research Technology and the Research Data Alliance

dcache: challenges and opportunities when growing into new communities Paul Millar on behalf of the dcache team

Data publication and discovery with Globus

Digital Library Interoperability. Europeana

Key Elements of Global Data Infrastructures

FLAT: A CLARIN-compatible repository solution based on Fedora Commons

Trust and Certification: the case for Trustworthy Digital Repositories. RDA Europe webinar, 14 February 2017 Ingrid Dillo, DANS, The Netherlands

Europe and its Open Science Cloud: the Italian perspective. Luciano Gaido Plan-E meeting, Poznan, April

Introduction

Fedora Commons: Taking on the Challenge of the Next Generation of Scholarly Communication

PIDs for CLARIN. Daan Broeder CLARIN / Max-Planck Institute for Psycholinguistics

The Materials Data Facility

Ways to improve Global Data Sharing and Re-use

Overview of the European Open Science Cloud. Jan Wiebelitz e-irg support programme

European Open Science Cloud Implementation roadmap: translating the vision into practice. September 2018

EuroHPC Bologna 23 Marzo Gabriella Scipione

Scientific Data Curation and the Grid

B2FIND and Metadata Quality

Europeana DSI 2 Access to Digital Resources of European Heritage

The DART-Europe E-theses Portal

Towards a Long Term Research Agenda for Digital Library Research. Yannis Ioannidis University of Athens

PID System for eresearch

THE NATIONAL DATA SERVICE(S) & NDS CONSORTIUM A Call to Action for Accelerating Discovery Through Data Services we can Build Ed Seidel

Google indexed 3,3 billion of pages. Google s index contains 8,1 billion of websites

irods - An Overview Jason Executive Director, irods Consortium CS Department of Computer Science, AGH Kraków, Poland

The National Digital Library Finna Among Digital Research Infrastructures in Finland

Conferenza GARR Moving Forward to deliver an «EOSC in Practice» by 2018

Developing a social science data platform. Ron Dekker Director CESSDA

A national approach for storage scale-out scenarios based on irods

The Design of a DLS for the Management of Very Large Collections of Archival Objects

Federated Identity Management for Research Collaborations. Bob Jones IT dept CERN 29 October 2013

Conducting a Self-Assessment of a Long-Term Archive for Interdisciplinary Scientific Data as a Trustworthy Digital Repository

ESOC s Successes, Complications and Opportunities in using Cloud Computing and Big Data Technology

CLARIN for Linguists Portal & Searching for Resources. Jan Odijk LOT Summerschool Nijmegen,

DCH-RP Trust-Building Report

On the way to Language Resources sharing: principles, challenges, solutions

bwsync&share: A cloud solution for academia in the state of Baden-Württemberg

Digital Curation and Preservation: Defining the Research Agenda for the Next Decade

Networking European Digital Repositories

The EuroHPC strategic initiative

Designing an institutional research data management infrastructure for the life sciences

Graph-based Data Integration in EUDAT Data Infrastructure

Poland - e-infrastructure ecosystem and relation to EOSC

Robin Wilson Director. Digital Identifiers Metadata Services

Ing. José A. Mejía Villar M.Sc. Computing Center of the Alfred Wegener Institute for Polar and Marine Research

e-infrastructures in FP7 INFO DAY - Paris

Making research data repositories visible and discoverable. Robert Ulrich Karlsruhe Institute of Technology

EPOS a long term integration plan of research infrastructures for solid Earth Science in Europe

Transcription:

EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members

Agenda Background information Services Common Data Infrastructure

Setting the scene regular big data - easy to manage (but real-time streams) - lots of automatic processing - high reduction as goal irregular big data - automatically derived data - crowd sourcing changes the rules size number of objects long tail data all the same for industry, - difficult to manage government, public services, - lots of relations citizens, etc.

big scientific data > the data fabric great changes referable data citable data citable publications

EUDAT: Vision & Architecture EUDAT began with the concept of the Collaborative Data Infrastructure See Riding the Wave (High Level Expert Group on Scientific Data, Final Report, 2010) This identified a handful of core Service Cases See http://www.eudat.eu/services-and-technologies And the implementation of the Service Cases led to our current distributed Architecture See later

EUDAT s mission: common services in CDI diagram taken from EC s HLEG report Riding the Wave need experts close to the CLARIN, LifeWatch, ENES, communities knowing the EPOS, VPH, INFC etc. methods, traditions and 6 Core Infrastructures cultures about 20 infrastructures (research infrastructures) 12 EUDAT data centers and/or crossdisciplinary initiatives

What is the EUDAT CDI? The EUDAT Collaborative Data Infrastructure is a pan-european, cross-disciplinary domain of research data for both big community researchers and long tail scientists where data are registered, preserved, accessible and made re-usable

Pan-European What does this mean? Fundamentally, a wide-area distributed architecture Cross-disciplinary Five core stakeholder communities, many other interested; many sources of conflicting requirements! Including simplified services to encourage the long tail to participate All implies a significant systems integration challenge!

What does this mean? (2) Registered means EUDAT data are Globally identified and discoverable (the PID Service) Preserved means EUDAT data are Stored at big European HPC and data centres Replicated for safety (the Safe Replication Service) Governed by policy rules (the Policy Management Service)

What does this mean? (3) Accessible means EUDAT data are Identifiable and findable (the PID Service) Retrievable efficiently (the Data Staging Service) Governed by suitable access control (the AAI Service) Re-usable means EUDAT data are Findable (the PID Service) Comprehensible (the Joint Metadata Service) Composable and combinable (future workflow and computational services)

What does this mean? (4) For both big communities and long tail means Stable, web-service APIs for existing tool-stacks to use (the Common Service Layer Interface) Low barriers to use (the Simple Store Service) Hence the core EUDAT service cases Identifying solutions for these cases that work with our stakeholder communities existing solutions led us to the current CDI architecture

common services EUDAT is working on Metadata Catalogue Aggregated EUDAT metadata domain. Data inventory Safe Replication Data curation and access optimization Various flavors Data Staging Dynamic replication to HPC workspace for processing Simple Store Researcher data store (simple upload, share and access) AAI Network of trust among authentication and authorization actors PID Identity Integrity Authenticity Locations services to come EUDAT Box dropbox-like service easy sharing local synching Dynamic Data immediate handling Semantic Anno checking & referencing

Safe Replication Service Robust, safe and highly available data replication service for small- and medium- sized repositories To guard against data loss in long-term archiving and preservation To optimize access for user from different regions To bring data closer to powerful computers for compute-intensive analysis PIDs Policy rules EUDAT CDI Domain of registered data http://eudat.eu/safe-replication 2dn EUDAT Conference, Training Session eudat-safereplication@postit.csc.fi - Rome 28th October

Safe Replication Service (B2SAFE) PIDs Policy rules irods communities/departments do not have IT people can t install irods can t adapt their repository solution are partly using ready-made solutions (Fedora, D-SPACE, WikiMedia, etc.) don t know how to register PIDs etc. not as easy as thought so what to do? created/are creating various flavors FULL SR via irods policies ready Light SR via GridFTP client ready Lighter SR via HTTP client in p Packages for Fedora ready D-SPACE in p WikiMedia tbd? http://eudat.eu/safe-replication 2dn EUDAT Conference, Training Session eudat-safereplication@postit.csc.fi - Rome 28th October

Data Staging Service Support researchers in transferring large data collections from EUDAT storage to HPC facilities Reliable, efficient, and easy-to-use tools to manage data transfers Provide the means to reingest computational results back into the EUDAT infrastructure not a simple service! politics involved (access to HPC) EUDAT CDI Domain of registered data PRACE HPC HPC http://eudat.eu/datastaging 2dn EUDAT Conference, Training Session eudat-datastaging@postit.csc.fi - Rome 28th October

Simple Store Service Allow registered users to upload long tail data into the EUDAT store Enable sharing objects and collections with other researchers Utilise other EUDAT services to provide reliability Simple upload Simple metadata much competition PID registration EUDAT CDI Domain of registered data see it as complementary finally it is about trust http://eudat.eu/simplestore 2dn EUDAT Conference, Training Session eudat-simplestore@postit.csc.fi - Rome 28th October

Metadata Service Easily find collections of scientific data generated either by various communities or via EUDAT services Access those data collections through the given references in the metadata to the relevant data stores Europeana of scientific data how to offer metadata in a cross-disciplinary space? scalability issue? EUDAT CDI Domain of registered data http://eudat.eu/metadata eudat-metadata@postit.csc.fi

EUDAT Box Service some similarity to SimpleStore of course just similar to Dropbox incl. load balancing and replication there is no metadata just data how to integrate into registered domain of data? PID registration synchronization much competition EUDAT CDI Domain of registered data see it as complementary finally it is about trust

Semantic Annotation Service acts as a plugin component to be executed before uploading a resource with tags (crowd sourcing etc.) check tags against Knowledge Source & correct/refer/etc. could be used as trigger in Simple Store plugin available to everyone not center dependent check&annotate PID registration EUDAT CDI Domain of registered data

Service Targeting Replication: targeted at data managers/archivists/ projects/departments without facilities Data Staging: same plus easy access to HPC SimpleStore: place for individuals/projects/groups to store & exchange data EUBox: share data via synchronization Metadata: EUDAT data & everyone interested SemAnn: individual/projects working with massive amounts of human created data data stored in domain of registered data is not EUDAT s data!

COMMON DATA INFRASTRUCTURE

The CDI network architecture The CDI is a connected network of European research institutions and data centres (collectively Nodes) each offering one or more common EUDAT data services to both participating research communities and independent researchers Data centre Nodes have lots of connections Research community Nodes need only one Connections have both technical & policy agreement aspects

The CDI network architecture Generic data centres Community data sites

CDI Node architecture Nodes run parts of the CDI Node software suite, depending on which services they want to offer All Nodes should offer Safe Replication and PID This is really what being in the CDI is all about Others are optional Depends on what a Node s expected user base requires (Some data centre Nodes also need to run the Operational Services suite)

CDI service development timeline Year EUDAT Services CDI 2012 Safe Replication v1, Data Staging v1, PID v1, Operational Services v1; AAI (design) 2013 AAI v1, Joint Metadata v1, Simple Store Service v1, Safe Replication v2, Data Staging v2, Operational Services v2; Common Service Layer Interface (design), workflow and computation services (design) 2014 AAI v2, Joint Metadata v2, Simple Store Service v2, Data Staging v3, Operational Services v3, CSLI v1, computational and workflow services v1 V 1 V 2 V 3 2015 + PID v2, Safe Replication v3, computational and workflow services v2 V 4

Data Staging (DS v1) CSLI CSLI CSLI PID v1 Handle PID registration (PID v1) SR v1, OP v1 SR v1, OP v1 Community s/w SR v1 Operational services (OP v1) Safe Replication (SR v1) CDI V1 (2012)

+ Web (DS v2) AAI v1 Simple Store (SSS v1) CSLI CSLI CSLI JMD v1 (UI) SSS v1 (UI) PID v1 JMD v1 (store) SSS v1 (store) SR v2, OP v2 SR v2, OP v2 Community s/w SR v2 Joint Metadata (JMD v1) + Policy Management (SR v2) CDI V2 (2013)

Data Staging (DS v3) CSLI + Federation AAI v2 CSLI Workflow, Compute v1 CSLI JMD v2 (UI) Simple Store (SSS v2) SSS v2 (UI) PID v1 JMD v2 (store) SSS v2 (store) SR v2, OPv3 SR v2, OPv3 Community s/w SR v2 Joint Metadata (JMD v2) CDI V3 (2014)

Workflow Coordination (WC v1) CSLI WC CSLI + Generalisation (computational v2) CSLI JMD v2 (UI) SSS v2 (UI) PID v2 Enhanced Handle registration (PID v2) JMD v2 (store) SSS v2 (store) SR v3 SR v3, OPv4 SR v3, OPv4 Community s/w + Real Time Data (SR v3) CDI V4 (2015+)

Current Node software suite Safe Replication irods v3.x + EUDAT replication microservices + GSI (X.509) security PID EPIC/Handle system (external service) + EUDAT EPIC client microservices for irods Data Staging GridFTP irods DSI + EUDAT Data Staging script Joint Metadata CKAN + Apache Solr + OAI-PMH Simple Store Invenio + EUDAT faceted user interface layer

Community s/w Using the CDI Joining vs Using the CDI CSLI CSLI CSLI JMD (UI) SSS (UI) PID JMD SSS (store) EUDAT core EUDAT core Community s/w EUDAT core Joining the CDI EUDAT CDI

Many thanks for your attention! QUESTIONS?