I data set della ricerca ed il progetto EUDAT

Similar documents
Data Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013

EUDAT. Towards a pan-european Collaborative Data Infrastructure

EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture. Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members

EUDAT. Towards a pan-european Collaborative Data Infrastructure

EUDAT Data Services & Tools for Researchers and Communities. Dr. Per Öster Director, Research Infrastructures CSC IT Center for Science Ltd

Data management and discovery

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Damien Lecarpentier CSC-IT Center for Science, Finland EUDAT User Forum, Barcelona

EUDAT - Open Data Services for Research

EUDAT. Towards a pan-european Collaborative Data Infrastructure

EUDAT. Towards a pan-european Collaborative Data Infrastructure - A Nordic Perspective? -

EUDAT & SeaDataCloud

Inge Van Nieuwerburgh OpenAIRE NOAD Belgium. Tools&Services. OpenAIRE EUDAT. can be reused under the CC BY license

EUDAT Common data infrastructure

Using EUDAT services to replicate, store, share, and find cultural heritage data

EUDAT. Towards a Collaborative Data Infrastructure. Ari Lukkarinen CSC-IT Center for Science, Finland NORDUnet 2012 Oslo, 18 August 2012

EUDAT. Towards a pan-european Collaborative Data Infrastructure. KNMI Workshop, Utrecht, Netherlands

EUDAT- Towards a Global Collaborative Data Infrastructure

EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT

EUDAT and Cloud Services

Data Discovery - Introduction

The EUDAT Collaborative Data Infrastructure

European Collaborative Data Infrastructure EUDAT - Training on EUDAT Principles -

Data Staging and Data Movement with EUDAT. Course Introduction Helsinki 10 th -12 th September, Course Timetable TODAY

irods workflows for the data management in the EUDAT pan-european infrastructure

USE CASES IN SEISMOLOGY. Alberto Michelini INGV

EUDAT & AAI. Daan Broeder MPI for Psycholinguistics

B2SAFE metadata management

ODC and future EIDA/ EPOS-S plans within EUDAT2020. Luca Trani and the EIDA Team Acknowledgements to SURFsara and the B2SAFE team

EUDAT Towards a Collaborative Data Infrastructure

Scientific Data Curation and the Grid

Towards FAIRness: some reflections from an Earth Science perspective

NorStore. a national infrastructure for scientific data. Andreas O Jaunsen UNINETT Sigma as

Safe Replication and Data Staging

EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal

The Materials Data Facility

TEXT MINING: THE NEXT DATA FRONTIER

Data Staging: Moving large amounts of data around, and moving it close to compute resources

e-infrastructure: objectives and strategy in FP7

Coupled Computing and Data Analytics to support Science EGI Viewpoint Yannick Legré, EGI.eu Director

Data Staging: Moving large amounts of data around, and moving it close to compute resources

Developing a Research Data Policy

Science Europe Consultation on Research Data Management

Session Two: OAIS Model & Digital Curation Lifecycle Model

GEOSS Data Management Principles: Importance and Implementation

DIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM

Data publication and discovery with Globus

e-infrastructures in FP7 INFO DAY - Paris

Scientific data management

OpenAIRE. Fostering the social and technical links that enable Open Science in Europe and beyond

A VO-friendly, Community-based Authorization Framework

Europe and its Open Science Cloud: the Italian perspective. Luciano Gaido Plan-E meeting, Poznan, April

Deliverable 6.4. Initial Data Management Plan. RINGO (GA no ) PUBLIC; R. Readiness of ICOS for Necessities of integrated Global Observations

Requirements for data catalogues within facilities

About Knowledge Convergence. e-infrastructures Austria an interdisciplinary case study concerning research resources and their management

Building on Existing Communities: the Virtual Astronomical Observatory (and NIST)

Open Language Resources & Meta-Resources: a Treasure and a Challenge for Linked Data

Big Data infrastructure and tools in libraries

How to share research data

Metadata Models for Experimental Science Data Management

Enabling Open Science: Data Discoverability, Access and Use. Jo McEntyre Head of Literature Services

Striving for efficiency

Town Hall Meeting on the National and International Data Infrastructure. Jim Warren, NIST Lisa Friedersdorf, NNCO

Richard Marciano Alexandra Chassanoff David Pcolar Bing Zhu Chien-Yi Hu. March 24, 2010

High Performance Computing Data Management. Philippe Trautmann BDM High Performance Computing Global Research

Outline. Infrastructure and operations architecture. Operations. Services Monitoring and management tools

Managing very large Multimedia Archives and their Integration into Federations

CLARIN s central infrastructure. Dieter Van Uytvanck CLARIN-PLUS Tools & Services Workshop 2 June 2016 Vienna

irods at TACC: Secure Infrastructure for Open Science Chris Jordan

Digital Curation and Preservation: Defining the Research Agenda for the Next Decade

An overview of the OAIS and Representation Information

Welcome to the Pure International Conference. Jill Lindmeier HR, Brand and Event Manager Oct 31, 2018

Educating a New Breed of Data Scientists for Scientific Data Management

Indiana University Research Technology and the Research Data Alliance

DATA MANAGEMENT PLANS Requirements and Recommendations for H2020 Projects. Matthias Razum April 20, 2018

How to make your data open

Europeana Core Service Platform

BHL-EUROPE: Biodiversity Heritage Library for Europe. Jana Hoffmann, Henning Scholz

European Cloud Initiative: implementation status. Augusto BURGUEÑO ARJONA European Commission DG CNECT Unit C1: e-infrastructure and Science Cloud

Giovanni Lamanna LAPP - Laboratoire d'annecy-le-vieux de Physique des Particules, Université de Savoie, CNRS/IN2P3, Annecy-le-Vieux, France

EGI federated e-infrastructure, a building block for the Open Science Commons

Building the Digital Media Workstream: From project pitch to program purchase

Fundamentals of Data Infrastructures

Scientific data processing at global scale The LHC Computing Grid. fabio hernandez

Trust and Certification: the case for Trustworthy Digital Repositories. RDA Europe webinar, 14 February 2017 Ingrid Dillo, DANS, The Netherlands

Key Elements of Global Data Infrastructures

Copyright 2008, Paul Conway.

PASIG Directions & Issues

DCH-RP Trust-Building Report

Research Data Management & Preservation: A Library Perspective

AAI in EGI Current status

Data Management Plans. Sarah Jones Digital Curation Centre, Glasgow

Towards FAIRness in research data. Per Öster, 3 October 2018

Writing a Data Management Plan A guide for the perplexed

European Open Science Cloud Implementation roadmap: translating the vision into practice. September 2018

Progress towards the EOSC

The Photon and Neutron Data Initiative PaN-data

«Prompting an EOSC in Practice»

The role of e-infrastructure for the preservation of cultural heritage. Federico Ruggieri Head of Distributed Computing and Storage Deparment of GARR

Leveraging Software-Defined Storage to Meet Today and Tomorrow s Infrastructure Demands

EUDAT-B2FIND A FAIR and Interdisciplinary Discovery Portal for Research Data

Transcription:

I data set della ricerca ed il progetto EUDAT Casalecchio di Reno (BO) Via Magnanelli 6/3, 40033 Casalecchio di Reno 051 6171411 www.cineca.it 1

Digital as a Global Priority 2

Focus on research data Square Kilometre Array Large Hadron Collider Human Brain Project ESRF & ILL, Grenoble Daresbury Laboratory 3

G8+O5 working group and GSO Report Recommendations 9 & 10 Recommendation 9 - e-infrastructure Global research infrastructure initiatives should recognize the utility of the integrated use of advanced e-infrastructures, services for accessing and processing, and curating data, as well as remote participation (interaction) and access to scientific experiments. Recommendation 10 - exchange Global scientific data infrastructure providers and users should recognise the utility of data exchange and interoperability of data across disciplines and national boundaries as a means to broadening the scientific reach of individual data sets. 4

G8 + O5 White Paper Draft v1.0 5 Principles for an Open Infrastructure 1. Discoverability Implementation of appropriate persistent identifier frameworks, adoption of descriptive metadata standards, and the use of appropriate data formats and taxonomies. 2. Accessibility Publicly funded scientific research data should be made openly available with as few restrictions as possible. Ethical, legal or commercial constraints may be imposed on the use of research data to ensure that the research process is not damaged by inappropriate release of data. 3. Understandability Scientific data sets must be understandable in order to be effectively used. A set of numbers, texts, pictures or even videos alone cannot be understandable without additional context, semantics, data analysis tools, and algorithms. 4. Manageability management policies and plans must make it clear who is responsible for maintaining the availability of data and how the associated costs are to be met including issues associated with curation, storage and services. 5. People A global approach to research data infrastructure requires a highly skilled and adaptable workforce and culture that is able to capture the available data and make it available to those that are able to use it appropriately. 5

Sharing Vision Single Infrastructure à Single User Experience Raw Analysed Publica@on Different Infrastructures Analysis à Different User Experiences Catalogue Catalogue Catalogue Publica@ons Catalogue Raw Analysis Analysed Publica@on Publica@ons Experiment Raw Analysis Analysed Publica@on Publica@ons Observa@on Raw Analysis Analysed Publica@on Publica@ons Simula@on Capacity SoAware Publica@ons Storage Repositories Repositories Repositories 6

EUDAT project EUDAT is a three-year project that will deliver a Collaborative Infrastructure (CDI) Co-funded within the 7th Framework Programme Launched on 1 October 2011 7

EUDAT Collaborative Infrastructure Analysed Catalogue Publica@on Catalogue Generators Analysed Users Publica@on User functionalities, data capture & transfer, virtual research environments Curation Trust Analysed Community Support Services Publica@on discovery & navigation, workflow generation, annotation, interpretability Analysed Common Services Repositories Publica@on Persistent storage, identification, authenticity, workflow execution, mining 8

EUDAT Principles 1. deposited with the EUDAT CDI will be preserved in perpetuity 2. are best curated in their own communities 3. Access to data in the EUDAT CDI is free at the point of use 4. EUDAT will operate as a federation of community-facing repositories and back office hosting providers 5. EUDAT services and infrastructure must be a suitable target for TDR outsourcing (cf. datasealofapproval.org) 6. EUDAT will not assert ownership of any data it holds 9

Persistent Identifiers (PID) EUDAT relies on the EPIC service to associate persistent identifier to digital objects EPIC is an identifier system using the Handle infrastructure. Its focus is the registration of data in an early state of the scientific process, where lots of data is generated and has to become referable to collaborate with other scientific groups or communities, but it is still unclear, which small part of the data should be available for a long time period. Handle resolution 10

First Services in Preparation Metadata Catalogue AAI PID Aggregated EUDAT metadata domain. inventory Safe Replication preservation and access optimization Staging Dynamic replication to HPC workspace for processing Simple Store Researcher data store (simple upload, share and access) Network of trust among authentication and authorization actors Anchor for identifica -tion and integrity PID Metadata 11

Community Centres 12

EUDAT-EPOS success story The whole archive of the INGV s data center in Rome is replicated to CINECA, the EUDAT s data center in Bologna ( years:1990-2013 25 TB ) Granularity is at single file level (the collection of sesimic waveforms related to 1 sensor for 1 day) Metadata are replicated too, as files (dataless). The replicas are stored, synchronized and accessed by a single user, the INGV community manager. 13

Safe Replication Goals Relevant data needs to be replicated from community centers to a number of data centers in a safe way with several purposes in mind: data bit-stream preservation more optimal data curation better accessibility of data identification of data through Persistent Identifiers (PIDs) Common functionality: Create M replicas (identified by a PID record) at different data centers for N years, exclude centers X, maintaining the given access permissions 14

EUDAT domain data enrichment processing reduction domain of globally referenceable data analysis temporary data global registration registration referenceable data citable publication data acquisition generation description EPIC PID handle preservation 15

SmartDATA project CINECA s project to deal with Big and HPC: a Big Analytics engine for OpenDATA catalogues Initial pre-conditions: are archived, catalogued, searchable and accessible: - Robust and reliable infrastructure - Interoperability and adoption of standards Smart satisfies these requirements and besides: make available an HPC environment with analysis tools which foster the re-use of such data. ease the information exchange among experts belonging to different disciplines, from both the point of view of the methods and of the data. à Hence the data sharing creates new knowledge 16

SmartDATA project SmartDATA Portal and apps SmartDATA users orchestrator HPC Front-End Job Scheduler workflows ingestion Registry (irods) for OpenDATA sources repository Scratch staging Supercomputer analytic engine 17

EUHIT project EuHIT is a consortium that aims at integrating cutting-edge European facilities for turbulence research across national boundaries (2013-2017). It includes 25 research institutes and 2 industrial partners from 10 European countries. A total of 14 cutting-edge turbulence research infrastructures CINECA will provide the technical and hardware infrastructure for the turbulence data digital library service (DLTD), offering: A service to manage data coming from numerical simulations and experiments in the field of fluid dynamics. A storage space of 150 TB online. A helpdesk and specialized support. 18

EUHIT project TurBase, a freely accessible living knowledgebase for high quality turbulence data, will rely on the CINECA s DLTD service, which will be implemented using the same technology adopted by EUDAT to foster interoperability and accessibility 19

Human Brain project The Human Brain Project (HBP) is a large scale European research project which aims to simulate the world's first and most exact human brain in a supercomputer. Working together with 86 different European institutions the project is planned to last ten years (2013-2023). To image an entire brain many parallel stacks of images are acquired. They are afterwards merged with a custom-made algorithm suited to work with very large data sets (~ 1 TB) 20

Human Brain project WP7.5 High Performance Computing Platform: integration and operations CINECA is responsible for Task 7.5.4 The HBP supercomputer for massive data analytics: Implement and operate a data-centric HPC facility providing efficient storage, processing and management of large volumes of data generated by the HBP. The HBP Massive Analytics Supercomputer in CINECA will provide 2 Petaflop/ s of peak performance and 200 Terabytes of main memory, integrated with a mass storage facility of more than 5 Petabytes of working space with an aggregated file system bandwidth of about 70 GB/sec in a full production environment. The system will be integrated with a Facility of around 5 Petabytes for on-line disk storage repository and more 10 Petabytes for long term data archiving, providing efficient data life-cycle management of structured and un-structured data generated by the HBP. 21

CINECA Repository service The service is implemented through irods (integrated Rule-Oriented System - https://www.irods.org). It provides the following functionalities: Up/Download: the system supports high performance transfer protocols like GridFTP, or irods multi-threads transfer mechanism, and large interoperable ones like HTTP. Metadata management: each object can be associated to specific metadata represented as triplets (name,value,unit), or simply tagged and commented. Long Term archiving: the long-term accessibility is granted by means of a seamless archiving process, which is able to move the collections of data from the on-line storage space to a tape based off-line space and back, according to general or per-project policies. Stage-in/stage-out: the service is enabled to move data sets, requested as input for computations, towards the HPC machines' local storage space, commonly named scratch, and backwards as soon as the results are available Sharing: the capability is implemented via a unix-like ownership model. Moreover a ticket based approach is used to provide temporary access tokens with limited rights. Searching: the data are indexed and the searches can be based on the object locations or on the associated metadata. 22

Thanks for your attention! Questions? 23