EUDAT. Towards a Collaborative Data Infrastructure. Ari Lukkarinen CSC-IT Center for Science, Finland NORDUnet 2012 Oslo, 18 August 2012

Similar documents
EUDAT. Towards a pan-european Collaborative Data Infrastructure

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Damien Lecarpentier CSC-IT Center for Science, Finland EUDAT User Forum, Barcelona

Data Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013

EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT

EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture. Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members

Using EUDAT services to replicate, store, share, and find cultural heritage data

EUDAT. Towards a pan-european Collaborative Data Infrastructure

Data Staging and Data Movement with EUDAT. Course Introduction Helsinki 10 th -12 th September, Course Timetable TODAY

EUDAT. Towards a pan-european Collaborative Data Infrastructure

EUDAT Towards a Collaborative Data Infrastructure

EUDAT Data Services & Tools for Researchers and Communities. Dr. Per Öster Director, Research Infrastructures CSC IT Center for Science Ltd

EUDAT. Towards a pan-european Collaborative Data Infrastructure. KNMI Workshop, Utrecht, Netherlands

Data management and discovery

EUDAT. Towards a pan-european Collaborative Data Infrastructure - A Nordic Perspective? -

USE CASES IN SEISMOLOGY. Alberto Michelini INGV

EUDAT - Open Data Services for Research

EUDAT & AAI. Daan Broeder MPI for Psycholinguistics

EUDAT Common data infrastructure

The EUDAT Collaborative Data Infrastructure

EUDAT- Towards a Global Collaborative Data Infrastructure

Data Discovery - Introduction

I data set della ricerca ed il progetto EUDAT

Data Staging: Moving large amounts of data around, and moving it close to compute resources

Data Staging: Moving large amounts of data around, and moving it close to compute resources

European Collaborative Data Infrastructure EUDAT - Training on EUDAT Principles -

EUDAT & SeaDataCloud

dcache: challenges and opportunities when growing into new communities Paul Millar on behalf of the dcache team

Inge Van Nieuwerburgh OpenAIRE NOAD Belgium. Tools&Services. OpenAIRE EUDAT. can be reused under the CC BY license

EUDAT-B2FIND A FAIR and Interdisciplinary Discovery Portal for Research Data

PID System for eresearch

EUDAT and Cloud Services

irods workflows for the data management in the EUDAT pan-european infrastructure

B2FIND: EUDAT Metadata Service. Daan Broeder, et al. EUDAT Metadata Task Force

European Cloud Initiative: implementation status. Augusto BURGUEÑO ARJONA European Commission DG CNECT Unit C1: e-infrastructure and Science Cloud

ODC and future EIDA/ EPOS-S plans within EUDAT2020. Luca Trani and the EIDA Team Acknowledgements to SURFsara and the B2SAFE team

Safe Replication and Data Staging

The AAF - Supporting Greener Collaboration

2. HDF AAI Meeting -- Demo Slides

EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal

Towards a joint service catalogue for e-infrastructure services

Persistent Identifiers for Audiovisual Archives and Cultural Heritage

B2SAFE metadata management

Federated Services and Data Management in PRACE

1. General requirements

INDIGO AAI An overview and status update!

European digital repository certification: the way forward

GN4-1 SA8 Real Time Applications and Multimedia Management. Networks Services People 1

Research Data Edinburgh: MANTRA & Edinburgh DataShare. Stuart Macdonald EDINA & Data Library University of Edinburgh

e-infrastructure: objectives and strategy in FP7

FLAT: A CLARIN-compatible repository solution based on Fedora Commons

OpenAIRE. Fostering the social and technical links that enable Open Science in Europe and beyond

Developing a social science data platform. Ron Dekker Director CESSDA

Europe and its Open Science Cloud: the Italian perspective. Luciano Gaido Plan-E meeting, Poznan, April

Easy Access to Grid Infrastructures

Management der Virtuellen Organisation DARIAH im Rahmen von Shibboleth- basierten Föderationen. 58. DFN- Betriebstagung, Berlin, 12.3.

FeduShare Update. AuthNZ the SAML way for VOs

NorStore. a national infrastructure for scientific data. Andreas O Jaunsen UNINETT Sigma as

Indiana University Research Technology and the Research Data Alliance

CLARIN for Linguists Portal & Searching for Resources. Jan Odijk LOT Summerschool Nijmegen,

Key Elements of Global Data Infrastructures

European Open Science Cloud Implementation roadmap: translating the vision into practice. September 2018

ASTRONOMY & PARTICLE PHYSICS CLUSTER

Institutional Repository using DSpace. Yatrik Patel Scientist D (CS)

FREYA Connected Open Identifiers for Discovery, Access and Use of Research Resources

GÉANT Community Programme

CEF e-invoicing. Presentation to the European Multi- Stakeholder Forum on e-invoicing. DIGIT Directorate-General for Informatics.

EUDAT Registry Overview for SAF (26/04/2012) John kennedy, Tatyana Khan

Sustainability in Federated Identity Services - Global and Local

The EGI AAI CheckIn Service

PIDs for CLARIN. Daan Broeder CLARIN / Max-Planck Institute for Psycholinguistics

Dataset Documentation Reference Guide for Pure Users

DCH-RP Trust-Building Report

Session Two: OAIS Model & Digital Curation Lifecycle Model

CLARIN s central infrastructure. Dieter Van Uytvanck CLARIN-PLUS Tools & Services Workshop 2 June 2016 Vienna

Persistent Identifier the data publishing perspective. Sünje Dallmeier-Tiessen, CERN 1

HPC IN EUROPE. Organisation of public HPC resources

Promoting Open Standards for Digital Repository. case study examples and challenges

SAML-Based SSO Solution

Best Practices: Authentication & Authorization Infrastructure. Massimo Benini HPCAC - April,

Federated Identity Management for Research Collaborations. Bob Jones IT dept CERN 29 October 2013

e-research Infrastructures for e-science Axel Berg SARA national HPC & e-science support center RAMIRI, June 15, 2011

Set-up of the Testbed for Authentication, Authorization, Accounting

Striving for efficiency

RADAR. Establishing a generic Research Data Repository: RESEARCH DATA REPOSITORY. Dr. Angelina Kraft

Nuno Freire National Library of Portugal Lisbon, Portugal

BPMN Processes for machine-actionable DMPs

Open Science Commons: A Participatory Model for the Open Science Cloud

Trust and Certification: the case for Trustworthy Digital Repositories. RDA Europe webinar, 14 February 2017 Ingrid Dillo, DANS, The Netherlands

Checklist and guidance for a Data Management Plan, v1.0

DRIVER Step One towards a Pan-European Digital Repository Infrastructure

Part 2: Current State of OAR Interoperability. Towards Repository Interoperability Berlin 10 Workshop 6 November 2012

Federated Identities and Services: the CHAIN-REDS vision

Scientific data management

Progress towards the EOSC

Deliverable DJRA1.1. Use-Cases for Interoperable Cross- Infrastructure AAI

Towards repository interoperability

Horizon 2020 and the Open Research Data pilot. Sarah Jones Digital Curation Centre, Glasgow

e-infrastructures in FP7 INFO DAY - Paris

AARC Overview. Licia Florio, David Groep. 21 Jan presented by David Groep, Nikhef.

ESFRI WORKSHOP ON RIs AND EOSC

Transcription:

EUDAT Towards a Collaborative Data Infrastructure Ari Lukkarinen CSC-IT Center for Science, Finland NORDUnet 2012 Oslo, 18 August 2012

Big (Chaotic) Data DATA GENERATORS 1) Measurement technology. 2) Cheap CPU and storage. 3) Analog to digital conversions Lot of data Unorganized library, where books are written with disappearing ink. PROBLEM FEM Gap_Size BETA_pm025 GAP_025 SH1_pmm SH1_ooo SC_pp.. Lot of directories Data exist, but. 2

3

EUDAT Consortium 4

Data centers and Communities 5

6

7

8

9

10

Communities and Data Centers What are the basic requirements? Which common services are needed? 11

The Concept 12

How Do We Achieve This? Capturing Community Requirements How data is organised What are the first wishes (Community Interviews) Technology Appraisal and Service building What services can be built to match the requirements (Service Taskforces) Service Deployment and Operations How to deploy services on the distributed infrastructure (Operations team) 13

1/6 SAFE_REPLICATION@EUDAT Objective: Enable communities to easily replicate data to selected data centers for storage in a robust and reliable manner. Key benefits: data bit stream preservation, better accessibility Description: Data replication management based on users requirements and constraints; data replication solutions and services embedded into critical security policies, including firewall setups and user accounting procedures. Technology: irods to be used as an initial replication middleware, implemented across the community centers and data centers; as more user communities join the task force, other storage technologies may be added, depending on user needs. Production setup expected by 2013, such that users will be able to safely replicate data across different user community centres and data centres. More info: eudat-safereplication@postit.csc.fi 15

2/6 DATA_STAGING@EUDAT Objective: Enable communities to perform (HPC) computations on the replicated data Key benefits: Access to large computing facilities CINECA EPOS Description: This service will allow the EUDAT communities to dynamically replicate subsets of their data stored in EUDAT to HPC machine workspaces for processing. Differences with the safe replication scenario: replicated data are discarded when the analysis application ends; Persistent Identifier (PID) references are not applied to replicated data into HPC workspaces; Users initiate the process of replicating data while in the safe replication scenario data are replicated automatically on a policy basis. PRACE HPC Facility HPC Facility 2 2 3 PID EUDAT Storage 3 EUDAT Storage 1 SARA 4 Community Storage HPC Facility Technologies: GridFTP, Griffin, gtransfer, FTS (under appraisal) More info: eudat-datastaging@postit.csc.fi 16

3/6 METADATA@EUDAT Objective: Create a joint metadata domain for all data stored by EUDAT data centers and a catalogue which exposes the data stored within EUDAT, allowing data searches. Key benefits: Advertising platform for data sets, metadata service for less mature communities Description: EUDAT will handle metadata for more resources than just those deposited within the EUDAT CDI. In the initial phase we will target mainly resources contributed by the participating communities augmented with those of interested well-organized communities that are ready to contribute. Then, later, other interested communities can be approached depending on the respective community capabilities. Technology: OAI-PMH and embeds domain specific metadata, as XML, within the OAI-PMH record More info: eudat-metadata@postit.csc.fi 17

4/6 SIMPLE_STORE@EUDAT Objective: create an easy-to-use service that will enable researchers and scientists to upload, store and share data that are not part of the officially-managed data sets of the research communities. Key benefits: Store, share, and retrieve smaller sets of data not officially handled. Description: This service will address the long tail of "small" data, and the researchers/citizen scientists creating and manipulating it. Typically this type of data comes in a wide range of formats including text, spreadsheets, number series, audio and video files, photographs and other images. The Research Data Store is complementary to the other EUDAT services that manage the large volumes of official community data. Technologies: Invenio, figshare, beehub and MyExperiment. More info: eudat-simplestore@postit.csc.fi 18

5/6 PIDS@EUDAT Objective: Deploy a robust, highly available and effective PID service that can be used within the communities and by EUDAT. Description: Keeping track of the names of data sets or other digital artefacts deposited with the CDI requires more robust mechanisms than noting down the filename. The PID service will be required by many other CDI services, from Data Movement to Search and Query. Technologies: Currently considering use of both EPIC for data objects, and DataCite to register DOIs (Digital Object Identifiers for published collections. More info: eudat-persistentidentifiers@postit.csc.fi 19

6/6 AAI@EUDAT Objective: Provide a solution for a working AAI system in a federated scenario. Description: Design the AA infrastructure to be used during the EUDAT project and beyond. Key tasks: Leveraging existing identification systems within communities and/or data centers Establishing a network of trust among the AA actors: Identifty Providers (IdPs), Service Providers (SPs), Attribute Authorities and Federations Attribute harmonization Technologies: Oauth2, OpenID, RADIUS, SAML2, X.509, XACML, etc. More info: eudat-aai@postit.csc.fi 20

How Requirements are Shared? Service SR DR MD SS PID AAI Community CLARIN X + X X + X ENES X X X + X EPOS X X X X VPH X X X X LifeWatch X + X + + X NB: X = this service is relevant to this community, + = this community has interest in this service but at a later stage or has a similar service already running in production. 21

Dynamic replication to HPC workspace for processing 22

INFRASTRUCTURE 23

OPERATION TEAM 24

How Do We Sustain This? Organisational Model How do we move for a project collaboration to a federated infrastructure? Which are the actors of this infrastructure and what is/are their role(s)? How do we integrate new members? How will the infrastructure interact with other infrastructures and projects? Costs and Funding Models Who will pay for the infrastructure and the shared services? What are the costs of the services? How to define a business model that best support the interest of research communities, data centers and funders? 26

1) Data is raw material. Why EUDAT 2) Integration and open data requires standardized services. 3) European data infrastructure is fragmented. 4) Storage service costs and service levels are difficult to compare. 5) Storage services costs scales less than linearly with storage capacity. Centralized storage services more cost efficient than small data repositories. 27

Welcome to the 1st EUDAT Conference! 22-24 October 2012, Barcelona International event with keynotes from Europe and US A forum to discuss the future of data infrastructures Project presentations and poster sessions Parallel session on Sustainability and Funding Models Training tutorials 28

eudat-info@postit.csc.fi Ari Lukkarinen ari.lukkarinen@csc.fi 29