EUDAT Towards a Collaborative Data Infrastructure

Transcription:

EUDAT: Towards a Collaborative Data Infrastructure
Daan Broeder, MPI for Psycholinguistics (EUDAT, CLARIN, DASISH)
Bielefeld 10th International Conference

Data
These days it is very easy to create data, but still far less easy to manage it.
- Experiment data
- Sensor-produced data
- Simulations
- Digital libraries
- The Web
How to store, administrate, find, enrich, link, process, share, reuse and publish it?
For this we need a data infrastructure, one that is efficient, sustainable and cost-effective.

Data creation cycle
[Figure: data creation cycle. Labels: raw data, temp. data, referable data, citable data, citable publication; activities: analysis & enrichment, registration & preservation.]

The current data infrastructure landscape
- Long history of data management in Europe: several existing data infrastructures deal with established and growing user communities (e.g., ESO, ESA, EBI, CERN).
- New Research Infrastructures (ESFRI roadmap) are emerging and are also trying to build data infrastructure solutions to meet their needs (CLARIN, EPOS, ELIXIR, ESS, etc.).
- However, most of these infrastructures and initiatives primarily address the needs of a specific discipline and user community.
Challenges
- Compatibility and interoperability for cross-disciplinary research.
- Data growth in volume and complexity has a strong impact on costs, threatening the sustainability of the infrastructure.
Opportunities
- Synergies do exist: although disciplines have different workflows and ambitions, they have common basic needs and requirements that can be matched with generic services supporting multiple communities.
Strategy needed at pan-European level.

Collaborative Data Infrastructure

EUDAT short fact list
Project name: EUDAT (European Data)
Start date: 1st October 2011
Duration: 36 months
Budget: EUR 16.3 M (including EUR 9.3 M from the EC)
EC call: Call 9 (INFRA-2011-1.2.2): Data infrastructure for e-science (11.2010)
Participants: 25 partners from 13 countries (national data centers, technology providers, research communities, and funding agencies)
Objectives: To deliver a cost-efficient and high-quality Collaborative Data Infrastructure (CDI) with the capacity and capability to meet researchers' needs in a flexible and sustainable way, across geographical and disciplinary boundaries.

Consortium

Research Communities

Research fields
Environmental Science: ENES, EPOS, LifeWatch, EMSO, IAGOS-ERI, ICOS, Euro-Argo
Social Sciences and Humanities: CLARIN, CESSDA, DARIAH
Biological and Medical Science: VPH, ELIXIR, BBMRI, ECRIN, DiXA
Physical Sciences and Engineering: WLCG, ISIS, PanData
Material Science: ESS
EUDAT targets all scientific disciplines (discipline neutral):
- To capture and identify cross-discipline requirements
- To involve the scientists of all the communities in shaping the infrastructure and its services

EUDAT service design activities
1. Capturing community requirements (WP4)
   - 1st round of interviews with the five initial communities (Oct. 2011 - Dec. 2012)
   - Understand how data is organised in each community
   - Collect first wishes and specific requirements for a common data service layer
   - Next phase: refine the analysis and expand it to other communities
2. Building the corresponding services (WP5)
   - Technology appraisal (ongoing)
   - What is already available at partners' sites to build the corresponding services?
   - What are the gaps and market failures that should be addressed by EUDAT?
   - Next phase: develop candidate services; adapt services to match the requirements; integrate with community and SP services; test and evaluate with communities
3. Deploying the services and operating the federated infrastructure (WP6)
   - Designing the federated infrastructure and the interfaces for cross-site operations (ongoing)
   - Next phase: integrating and coordinating resource provision, operations and support

EUDAT Core Service Areas
Community-oriented services
- Simple data access and upload
- Long-term preservation
- Shared workspaces
- Execution and workflow
- Joint metadata and data visibility
- Simple storage facility for individual scientists and small projects
Core services are building blocks of EUDAT's Common Data Infrastructure, mainly located in the bottom layer of data services.
Enabling services (making use of existing services where possible)
- Persistent identifier service (EPIC, DataCite, ...)
- Federated AAI service (NRENs, eduGAIN)
- Network services
- Monitoring and accounting

Data Management Service Cases
Safe Replication
- Replicate data between selected centers
- Based on user-specified policies
- For LTA, for easy access, ...
- Technology: iRODS
Dynamic Replication (Data staging)
- Moving data to HPC workspaces and storing the results
- Technology: iRODS + grid tools
Usable PID framework
- Facilitate administrating data replication
- Allow identifying parts of objects
- Data verifiability, ...
- Technology: HS + EPIC and DataCite
Center registry
- Listing EUDAT services, centers and their capabilities

Data Management Service Cases
Joint metadata domain
- A metadata catalogue for (all?) research data
- Interdisciplinary (re-)use of data
- Semantic interoperability: explicit semantics and flexible relations, or hard-wired mappings, ...
- Granularity: include individual resources or data sets only
- Commenting function
- Platform permitting data-set promotion
- Proper acknowledgements for data creators
- Technologies: iCAT, Mercury, OAI-PMH, XSD, RDF, ...
Simple Store
- A safe repository for all research data in need
- YouTube or Dropbox model
- (Detailed?) metadata
- Sharing
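The slide lists OAI-PMH among the candidate technologies for the joint metadata domain. As a minimal sketch of what harvesting with that protocol involves, the Python below builds a standard `ListRecords` request and pulls Dublin Core titles out of a response. The endpoint URL, the function names and the sample record are illustrative assumptions, not part of EUDAT; no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Dublin Core element namespace used by the standard oai_dc metadata format.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (base_url is a hypothetical endpoint)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def extract_titles(xml_text):
    """Return every dc:title found in an OAI-PMH response document."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(f"{DC_NS}title")]


# Hand-written sample response, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example data set</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://repository.example.org/oai"))
    print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would also follow `resumptionToken` elements to page through large result sets; that step is left out here.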

EUDAT Architecture
[Figure: architecture diagram. Components shown: EUDAT community centers, EUDAT data centers with LTA facilities, an EUDAT HPC center and a PRACE HPC center (each with an HPC workspace), the EUDAT PID Service, the EUDAT Metadata Service (harvesting metadata), the EUDAT center registry and the EUDAT Simple Store. "D" marks data objects held at the different sites.]

Collaborations
- With the ESFRI (cluster) projects
- With service providers: EPIC, DataCite, ...
- EUDAT <-> EGI collaboration (& competition)
- US DataNet: DataONE, Data Conservancy, ...
- DAITF (Data Access & Interoperability Task Force): this task will contribute to the efforts to establish an international task force. This work will be carried out in collaboration with OpenAIRE and other relevant initiatives/projects focusing on data.

Thank you for your attention

Interlinking data and publications
[Figure: actors (data curator, data depositor, reviewer, author, editor) carry identifiers for actors (ORCID); datasets & metadata and publications carry identifiers for data & publications (HS, DOI, URN); APIs connect the two sides.]

Organizations guiding data management infrastructure building
ICSU, CODATA, WDC, COAR, EUDAT, iCORDI, OpenAIRE, DAITF

Move to DAITF & iCORDI, inspired by OpenAIRE and EUDAT
[Figure: two coupled processes.
Top-down process about strategies and needs, driven by science: the HLSF process (workshops, working groups) in the domain of top scientists, senior technologists and policy makers.
Bottom-up process towards solutions, driven by science: the DAITF process (conferences, working groups, hands-on training) in the domain of data infrastructure practitioners (data scientists, young scientists, technologists).
The two processes inform, influence and interact with each other.
Other labels: NSF, EC, DAITF steering board; CNRS, KNAW, MPG, CNR, DFG, NWO, STFC; horizontal data infrastructures; discipline/domain data infrastructures; other stakeholders (RCs, ROs, funders, etc.).
iCORDI programmes: analysis programme, knowledge exchange programme, workshop programme, prototype programme.]
1st workshop March 2012, Copenhagen; next workshop in October, Washington.
How to organize and support this process? IETF? DWF?

What has been done so far?
[Figure: timeline with year marks 2008, 2009, 2010, 2011, 2012. Events shown: 2006/8 UIPIU, Data2012 Workshop, DataNet1, DAITF Prepar. Workshop, ASIST Workshop, DataNet2, DWF Concept. Phases shown: tackling first data topics; brainstorming on data issues, need for global action & first focussed actions; global interaction in place.]

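EUDAT Services for Communities!?
[Table: matrix of communities (including DARIAH and DiXa) against EUDAT services; the positions of the check marks are not recoverable.]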

Safe Replication Use Case
Objective: Allow communities to replicate data to selected data centers for storage in a robust, reliable and highly available manner.
- Respecting existing conventions on stewardship and security.
- Using user-defined policies, e.g. make 4 copies, don't copy to the UK, ...
Application: to move data to locations where (1) curation and/or LTP services are present, (2) processing requiring HPC can take place, or (3) user data accessibility is improved.
Replicated digital objects are identified through a single PID, with multiple locations associated with the PID record, one location per copy.
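The policy example on this slide ("make 4 copies, don't copy to the UK") and the single-PID rule can be illustrated with a small Python sketch. It is not the iRODS-based implementation named elsewhere in the slides: the class names, the `example.org` URLs, the PID string and the "first eligible centers" selection rule are all assumptions made for illustration, and no data is actually moved.

```python
from dataclasses import dataclass, field


@dataclass
class DataCenter:
    name: str      # hypothetical center name
    country: str   # country code, e.g. "FI" or "GB"


@dataclass
class ReplicationPolicy:
    copies: int                                        # number of replicas wanted
    excluded_countries: set = field(default_factory=set)


@dataclass
class PidRecord:
    pid: str                                           # one identifier per digital object
    locations: list = field(default_factory=list)      # one URL per copy


def select_targets(policy, centers):
    """Pick the first `copies` centers that the policy allows."""
    allowed = [c for c in centers if c.country not in policy.excluded_countries]
    if len(allowed) < policy.copies:
        raise ValueError("not enough eligible data centers for this policy")
    return allowed[:policy.copies]


def replicate(pid, object_path, policy, centers):
    """Return a PID record listing one location per planned replica."""
    record = PidRecord(pid=pid)
    for center in select_targets(policy, centers):
        record.locations.append(f"https://{center.name}.example.org/{object_path}")
    return record


if __name__ == "__main__":
    centers = [DataCenter("alpha", "FI"), DataCenter("beta", "GB"),
               DataCenter("gamma", "DE"), DataCenter("delta", "NL"),
               DataCenter("epsilon", "IT")]
    policy = ReplicationPolicy(copies=4, excluded_countries={"GB"})
    record = replicate("prefix/hypothetical-suffix", "collection/object1", policy, centers)
    for url in record.locations:
        print(record.pid, "->", url)
```

The point of the sketch is the data shape: the policy filters the candidate centers, and the PID record ends up with exactly one location per copy.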

Dynamic Replication Service Case
- Move an entire data set (i.e. data collection) back and forth between an EUDAT node and a non-EUDAT node: PRACE or EGI facilities
- Keep the data replicas at the non-EUDAT nodes in sync with the EUDAT nodes
- Ingest/register relevant simulation results back at the EUDAT nodes
Candidate technologies: iRODS, Globus Online, FTS, UNICORE FTP, gtransfer
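Keeping replicas "in sync" is commonly done by comparing checksums and transferring only what differs. The Python sketch below shows that idea under simplifying assumptions: both nodes are modelled as in-memory dictionaries, SHA-256 is an arbitrary choice of checksum, and `plan_sync` is a hypothetical helper, not a function of any of the candidate technologies listed on the slide.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Content fingerprint used to detect changed objects."""
    return hashlib.sha256(data).hexdigest()


def plan_sync(source, replica):
    """Compare two nodes (dicts of object name -> bytes).

    Returns (to_copy, to_delete): names that are missing or differ at the
    replica, and names that exist only at the replica.
    """
    to_copy = sorted(
        name for name, data in source.items()
        if name not in replica or checksum(replica[name]) != checksum(data)
    )
    to_delete = sorted(name for name in replica if name not in source)
    return to_copy, to_delete


if __name__ == "__main__":
    eudat_node = {"run1.dat": b"alpha", "run2.dat": b"beta"}
    hpc_node = {"run1.dat": b"alpha", "run2.dat": b"stale", "tmp.dat": b"x"}
    print(plan_sync(eudat_node, hpc_node))
```

In a real deployment the checksums would be stored alongside the objects instead of being recomputed from the full content on every comparison.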

[Figure: layered view of infrastructure services, from community specific to generic.
Community specific: CLARIN LT web service infrastructure.
SSH communities wide (DASISH): common SSH metadata catalog, replication & preservation.
ENVRI.
Data replication & preservation, publication: EUDAT.
HPC, GRID services: PRACE, EGI.
Network services: GEANT.
Communities shown: CLARIN, DARIAH, CESSDA, LifeWatch.
Cross-cutting: Federated Identity Management.]