Using EUDAT services to replicate, store, share, and find cultural heritage data

Similar documents
Data Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013

EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture. Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members

EUDAT. Towards a Collaborative Data Infrastructure. Ari Lukkarinen CSC-IT Center for Science, Finland NORDUnet 2012 Oslo, 18 August 2012

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Damien Lecarpentier CSC-IT Center for Science, Finland EUDAT User Forum, Barcelona

EUDAT. Towards a pan-european Collaborative Data Infrastructure

EUDAT. Towards a pan-european Collaborative Data Infrastructure

Data Staging and Data Movement with EUDAT. Course Introduction Helsinki 10 th -12 th September, Course Timetable TODAY

EUDAT. Towards a pan-european Collaborative Data Infrastructure

EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT

Data management and discovery

EUDAT. Towards a pan-european Collaborative Data Infrastructure - A Nordic Perspective? -

EUDAT Towards a Collaborative Data Infrastructure

EUDAT Data Services & Tools for Researchers and Communities. Dr. Per Öster Director, Research Infrastructures CSC IT Center for Science Ltd

The EUDAT Collaborative Data Infrastructure

I data set della ricerca ed il progetto EUDAT

EUDAT Common data infrastructure

EUDAT - Open Data Services for Research

European Collaborative Data Infrastructure EUDAT - Training on EUDAT Principles -

Data Discovery - Introduction

Data Staging: Moving large amounts of data around, and moving it close to compute resources

EUDAT- Towards a Global Collaborative Data Infrastructure

EUDAT & AAI. Daan Broeder MPI for Psycholinguistics

USE CASES IN SEISMOLOGY. Alberto Michelini INGV

Data Staging: Moving large amounts of data around, and moving it close to compute resources

Poland - e-infrastructure ecosystem and relation to EOSC

EUDAT & SeaDataCloud

EUDAT. Towards a pan-european Collaborative Data Infrastructure. KNMI Workshop, Utrecht, Netherlands

dcache: challenges and opportunities when growing into new communities Paul Millar on behalf of the dcache team

Inge Van Nieuwerburgh OpenAIRE NOAD Belgium. Tools&Services. OpenAIRE EUDAT. can be reused under the CC BY license

The role of PIONIER network in long term preservation services for cultural heritage institutions in Poland

Fundamentals of Data Infrastructures

A national approach for storage scale-out scenarios based on irods

ODC and future EIDA/ EPOS-S plans within EUDAT2020. Luca Trani and the EIDA Team Acknowledgements to SURFsara and the B2SAFE team

Federated access to e-infrastructures worldwide

PID System for eresearch

European Cloud Initiative: implementation status. Augusto BURGUEÑO ARJONA European Commission DG CNECT Unit C1: e-infrastructure and Science Cloud

EUDAT and Cloud Services

EGI federated e-infrastructure, a building block for the Open Science Commons

DCH-RP Trust-Building Report

NorStore. a national infrastructure for scientific data. Andreas O Jaunsen UNINETT Sigma as

Federated Identity Management for Research Collaborations. Bob Jones IT dept CERN 29 October 2013

EUDAT-B2FIND A FAIR and Interdisciplinary Discovery Portal for Research Data

Challenges of Big Data Movement in support of the ESA Copernicus program and global research collaborations

irods workflows for the data management in the EUDAT pan-european infrastructure

DCH-RP and PREFORMA Two case studies on the digital preservation of cultural heritage

EOSC Services & Architecture: the EOSC-hub approach Tiziana Ferrari, Project Coordinator, EGI Founda?on

2. HDF AAI Meeting -- Demo Slides

Storage Management in INDIGO

Striving for efficiency

Developing a social science data platform. Ron Dekker Director CESSDA

Promoting Open Standards for Digital Repository. case study examples and challenges

Different Aspects of Digital Preservation

European Open Science Cloud

Outline. Infrastructure and operations architecture. Operations. Services Monitoring and management tools

European Grid Infrastructure

GSCB-Workshop on Ground Segment Evolution Strategy

Persistent Identifiers for Audiovisual Archives and Cultural Heritage

Open Smart City Open Social Innovation Infrastructure. Cezary Mazurek, Maciej Stroinski Poznan Supercomputing and Networking Center

Progress towards the EOSC

Coupled Computing and Data Analytics to support Science EGI Viewpoint Yannick Legré, EGI.eu Director

Open Science Commons: A Participatory Model for the Open Science Cloud

European digital repository certification: the way forward

Europe and its Open Science Cloud: the Italian perspective. Luciano Gaido Plan-E meeting, Poznan, April

CLARIN s central infrastructure. Dieter Van Uytvanck CLARIN-PLUS Tools & Services Workshop 2 June 2016 Vienna

Towards a joint service catalogue for e-infrastructure services

e-research Infrastructures for e-science Axel Berg SARA national HPC & e-science support center RAMIRI, June 15, 2011

Horizon 2020 and the Open Research Data pilot. Sarah Jones Digital Curation Centre, Glasgow

Beyond Status and plans

Giovanni Lamanna LAPP - Laboratoire d'annecy-le-vieux de Physique des Particules, Université de Savoie, CNRS/IN2P3, Annecy-le-Vieux, France

EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal

The VERCE Architecture for Data-Intensive Seismology

Europeana Core Service Platform

Deliverable DJRA1.1. Use-Cases for Interoperable Cross- Infrastructure AAI

Safe Replication and Data Staging

SCA19 APRP. Update Andrew Howard - Co-Chair APAN APRP Working Group. nci.org.au

First European Globus Community Forum Meeting

ESFRI WORKSHOP ON RIs AND EOSC

DSpace Fedora. Eprints Greenstone. Handle System

B2SAFE metadata management

EGI: Linking digital resources across Eastern Europe for European science and innovation

Richard Marciano Alexandra Chassanoff David Pcolar Bing Zhu Chien-Yi Hu. March 24, 2010

Designing an institutional research data management infrastructure for the life sciences

Indiana University Research Technology and the Research Data Alliance

Big Data, exploiter de grands volumes de données

Towards FAIRness: some reflections from an Earth Science perspective

Travelling securely on the Grid to the origin of the Universe

Key Elements of Global Data Infrastructures

InfraStructure for the European Network for Earth System modelling. From «IS-ENES» to IS-ENES2

HPC IN EUROPE. Organisation of public HPC resources

CyberSKA: Project Update. Cameron Kiddle, CyberSKA Technical Coordinator

Deliverable D5.3. World-wide E-infrastructure for structural biology. Grant agreement no.: Prototype of the new VRE portal functionality

ehealth Ministerial Conference 2013 Dublin May 2013 Irish Presidency Declaration

INDIGO-DataCloud Architectural Overview

The e-depot in practice. Barbara Sierman Digital Preservation Officer Madrid,

High Performance Computing from an EU perspective

INITIATIVE FOR GLOBUS IN EUROPE. Dr. Helmut Heller Leibniz Supercomputing Centre (LRZ) Munich, Germany IGE Project Coordinator

A Simulation Model for Large Scale Distributed Systems

Science-as-a-Service

B2DROP The EUDAT Personal Cloud Storage

Regional e-infrastructures

Transcription:

Using EUDAT services to replicate, store, share, and find cultural heritage data in PSNC and beyond Maciej Brzeźniak, Norbert Meyer, PSNC Damien Lecarpentier, CSC 3 October 203 DIGITAL PRESERVATION OF CULTURAL DATA WORKSHOP, 2nd EUDAT conference

Plan and purpose of presentation Summarize the work done and planned in EUDAT an DCH-RP project How DCH sector can use e-infrastructures? Undersdand how selected EUDAT services can be used in DCH domain? How EUDAT CDI architecture can be integrated with domain-specific services? Overview of current services (Simple Store, Safe Replication) Presentation of integration work in DCH-RP Discussion on possible future extensions / services 3 October 203 Event 2

Some aspects of EUDAT EUDAT European Data Infrastructure Vision: to support a Collaborative Data Infrastructure Aims: Provide a sustainable platform of technologies, tools, services driven by user needs Engage users in defining/shaping a platform for shared services Support data-intensive, multi-disciplinary research: Humanities and Social Science: CLARIN But also: earth science (ENES, Earth system modelling; EPOS: European Plane Observing System), ecology (LifeWatch), Virtual Physiological Human (VPH) Deliver common low-level services that are required to provide the level of interoperation and trust of data Ensure that the data infrastructure is robust/scalable (able to address data tsumami ) Build community/domain-specific services on top of the common services with participation of users 3 October 203 Event 3

CDI layers Commons vs communityspecific services 3 October 203 Event 4

CDI layers vs tools and services in DCH Integration challenges High-level services: EUDAT simple store eculture Science Gateway dlibra darceo dlab Invenio Another community solution Low-level services: Local LTS: long-term storage EUDAT storage Safe Replication Cloud storage S3 CDMI Grid storage FTS GridFTP

CDI layers vs tools and services in DCH Integration challenges High-level services: EUDAT simple store eculture Science Gateway dlibra darceo dlab Invenio Another community solution DCH-RP <-> EUDAT proof of concept: Do they go together? Low-level services: Local LTS: long-term storage EUDAT storage Safe Replication Cloud storage S3 CDMI Grid storage FTS GridFTP

EUDAT: services for DCH domain Services covered in the presentationis Simple Store service: enable researchers and scientists to upload, store and share date designated for the long tail of data Safe Replication: Allow communities to replicate data to selected EUDAT data centers Automated replication (irods), PID registration (EPIC) Data Staging: Staging data from user community premises/systems (irods) to computing systems, e.g. PRACE s HPC centres (GridFTP, FTS) Metadata service: Joint metadata domain for all EUDAT data centres Searchable catalogue covering all data stored within EUDAT AAI: Provide a solution for a working AAI system in a federated 3 October 203 Event 7

EUDAT for DCH Simple Storage Service Simple Storage Service: Address the issues of small user groups and indivitual users Provides solution for long tail data : often stored on laptops and departmental servers Functionality: alowing registered users to upload typical data objects into the EUDAT store enabling users to share such objects and collections with other researchers, lets utilising other EUDAT services: Safe Replication PIDs etc. May be integrated with AAI More: http://www.eudat.eu/simple-store 3 October 203 Event 8

EUDAT for DCH Simple Storage Service Simple Storage Service internals: Referred also as Researcher Data Store Based on Invenio: Developed by CERN http://invenio-software.org/ Storage backend: Disk irods (EUDAT safe replication) Front-end: Developed a new submission portal to invenio 3 October 203 Event 9

EUDAT for DCH Simple Storage Service Simple Storage Service: 3 October 203 Event 0

EUDAT for DCH Simple Storage Service Simple Storage Service: 3 October 203 Event

EUDAT for DCH Simple Storage Service Simple Storage Service: 3 October 203 Event 2

EUDAT for DCH Simple Storage Service 3 October 203 Event 3

EUDAT for DCH Simple Storage Service 3 October 203 Event 4

EUDAT for DCH Simple Storage Service 3 October 203 Event 5

EUDAT for DCH Simple Storage Service 3 October 203 Event 6

3 October 203 Event 7

EUDAT for DCH Simple Storage Service 3 October 203 Event 8

3 October 203 Event 9

EUDAT for DCH Safe Replication Service Safe Replication Service: Allows communities to replicate data to EUDAT data centers Can be integrated with portals and community tools Functionality: Ingested file/data are: automatically replicated to many data centres get persistent identifiers registered (PIDs based on EPIC) Various interfaces supported irods: icommand, API, WebDAV, GridFTP Can replicate data on top of various different data stores: Disks, tapes, HSMs Clouds (e.g. S3) More: http://www.eudat.eu/safe-replication

EUDAT: Simple Storage Service Safe Replication Service:

Integration option : EUDAT Simple Store + EUDAT Safe Replication Support for sharing Simple Store Service Easy deposit & access to the data Safe Replication Transparent replication of data, persistent storage Data Store Data Store2

Integration option2: Community portal/solution + EUDAT Safe Replication Support for sharing Community service Storage & access typical for community Safe Replication Transparent replication of data Data Store Data Store2

Option in practice: EUDAT Simple store + EUDAT Safe Replication DCH-RP <-> EUDAT PoC Support for sharing Memory institutions: Simple Store Service Easy storage & access to the data EUDAT Simple Store Provided by PSNC Data Store @PSNC Safe Replication Data Store2 @EPCC Transparent replication of data EUDAT Safe replication Provided by PSNC (Poznan) & EPCC (Edinburgh)

Option 2 in practice: Community CMS solution + EUDAT Safe Replication DCH-RP <-> EUDAT PoC 2 Support for sharing Memory institutions: Community service Feature-full domain-specific tool with large user base EUDAT Simple Store Provided by PSNC or EUDAT partners Data Store @PSNC Safe Replication Data Store2 @EPCC Transparent replication of data EUDAT Safe replication Provided by PSNC (Poznan) & EPCC (Edinburgh)

Domain-specific solutions dlibra/darceo/dlab DCH-RP <-> EUDAT PoC 2 http://dlab.psnc.pl/ 2 2 3 3 2 Basic statistics: ± 00 digital libraries several hundreds memory institutions over, M of digital objects 98% content delivered via services based on dlibra software (http://dlibra.psnc.pl/) which uses Solr for content and metadata indexing and searching 4 2 0 5

EUDAT <-> DCH-RP PoC summary We investigate two possible ways to offering data preservation services for DCH Top-down solution for citizen scientists / citizen DCH people Well-established solution backed by EUDAT Safe Replication We exploit the layered EUDAT CDI architecture In theory: It enables integration with existing solutions We try to understand how it works in practice

Planned extensions of simple store Discussion trigger Development roadmap Premium service: Customisation for layout, metadata for community needs increased storage capcity increased support increased bandwidth Premium vs regular service: Providing premium service requires enrolling with EUDAT Regular services to be offered to citizen scientists/users no close relationship needed AAI integration On the roadmap

Possible extensions of simple store Discussion trigger 2 From yesterdays discussion about Simple Store: Thousand of files? Upload reliability / robustness? => Batch upload API to be developed» Enables integration with existing systems» Tools can be offered by EUDAT to support batch upload Collections upload? => Support for meta-data extraction Implemented client-side? E.g. based on pre-prepared collections (e.g. DIPs)

Possible extensions of simple store Implementation: user-side tool? Data Meta-Data Client-side application Data upload control Meta-data extraction and upload Meta-data exchange API Data upload API Simple Store API

Possible extensions of simple store Implementation: user-side tool? Data Meta-Data Higlights: Automation: ease of use, reliability, performance Functionality: data upload, meta-data extraction Client-side application Data upload control Meta-data extraction and upload Meta-data exchange API Data upload API Simple Store API

Possible extensions of simple store Implementation: user-side tool? Data Meta-Data Higlights: Automation: ease of use, reliability, performance Functionality: data upload, meta-data extraction Client-side application Challenges: Portability Universality: standards need to be identified Sustainability Data upload control Meta-data extraction and upload Meta-data exchange API Data upload API Simple Store API

Possible extensions of simple store Implementation: user-side tool? Data Meta-Data Higlights: Automation: ease of use, reliability, performance Functionality: data upload, meta-data extraction Client-side application Challenges: Portability Universality: standards need to be identified Sustainability Discussion needed! Data upload control Meta-data extraction and upload Meta-data exchange API Data upload API Simple Store API

Possible extensions of simple store Implementation: user-side tool?

Extensions of EUDAT services discussion Summary Message: EUDAT infrastructure and services are layered, modular This enables integration Further extensions possible Users are welcome to influence them We want to make sure that we recognised and support necessary standards Technical details / organisation / etc. to be discussed

Using EUDAT services to replicate, store, share, and find cultural heritage data in PSNC and beyond Maciej Brzeźniak, Norbert Meyer, PSNC Damien Lecarpentier, CSC THANK YOU! 3 October 203 DIGITAL PRESERVATION OF CULTURAL DATA WORKSHOP, 2nd EUDAT conference 36