EUDAT Common data infrastructure


EUDAT Common data infrastructure
Giuseppe Fiameni, SuperComputing Applications and Innovation, CINECA, Italy
Peter Wittenburg, Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands

some major characteristics
regular big data
- easy to manage (but real-time streams)
- lots of automatic processing
- high reduction as goal
irregular big data
- automatically derived data
- crowd sourcing changes the rules
long tail data
- difficult to manage
- lots of relations
all the same for industry, government, public services, citizens, etc.

big scientific data > the data fabric
great changes
- referable data
- citable data
- citable publications

complexity is relevant
- filenames/directories are no longer sufficient to memorize (even our experimentalists, brain images etc., start believing it)
- lots of relationships (organization, content, provenance, etc.) to be stored
- many work on special aggregations (need to be named & stored)
- currently too much time lost with management
integrated datastore
- directories
- files
clearly see a split of functions
- dedicated organization: metadata/provenance, PIDs, relations
- dedicated big datastore: simple API, fast access

EUDAT's mission: common services in the CDI (Collaborative Data Infrastructure)
- need experts close to the communities, knowing the methods, traditions and cultures
- 6 core infrastructures (research infrastructures): CLARIN, LifeWatch, ENES, EPOS, VPH, INCF etc.
- about 20 infrastructures and/or cross-disciplinary initiatives
- 12 EUDAT data centers
diagram taken from the EC's HLEG report "Riding the Wave"

common services EUDAT is working on
- Metadata Catalogue: aggregated EUDAT metadata domain, data inventory
- Safe Replication: data curation and access optimization, various flavors
- Data Staging: dynamic replication to HPC workspace for processing
- Simple Store: researcher data store (simple upload, share and access)
- AAI: network of trust among authentication and authorization actors
- PID: identity, integrity, authenticity, locations
services to come
- EUDAT Box: dropbox-like service, easy sharing, local synching
- Dynamic Data: immediate handling
- Semantic Anno: checking & referencing
what next?

Safe Replication Service
Robust, safe and highly available data replication service for small- and medium-sized repositories
- To guard against data loss in long-term archiving and preservation
- To optimize access for users from different regions
- To bring data closer to powerful computers for compute-intensive analysis
PIDs, policy rules
EUDAT CDI: domain of registered data
http://eudat.eu/safe-replication
eudat-safereplication@postit.csc.fi
D-Day, ICTP, September 5th 2013

replicate my collection X to three data centres
[diagram labels: EUDAT center, Community center; EPCC, CINECA, BSC; VPH, ENES, CLARIN, LifeWatch, EPOS]
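The request "replicate my collection X to three data centres" can be read as a small policy rule. The following is a minimal sketch of that idea, not the EUDAT implementation: `DataCentre`, `replicate`, the object keys and the in-memory storage are all hypothetical, and SHA-256 is an assumed choice of checksum.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DataCentre:
    """Hypothetical in-memory stand-in for a data centre's storage."""
    name: str
    objects: dict = field(default_factory=dict)

    def store(self, key: str, payload: bytes) -> str:
        """Store the payload and return the checksum of what was stored."""
        self.objects[key] = payload
        return hashlib.sha256(self.objects[key]).hexdigest()


def replicate(collection: dict, centres: list, copies: int = 3) -> dict:
    """Copy every object to `copies` centres and verify each copy by checksum.

    Returns a mapping from object key to the names of the centres holding it.
    """
    if len(centres) < copies:
        raise ValueError("not enough data centres to satisfy the policy")
    locations = {}
    for key, payload in collection.items():
        expected = hashlib.sha256(payload).hexdigest()
        holders = []
        for centre in centres[:copies]:
            if centre.store(key, payload) != expected:
                raise RuntimeError(f"checksum mismatch at {centre.name} for {key}")
            holders.append(centre.name)
        locations[key] = holders
    return locations


# Centre names are taken from the slide; the collection content is made up.
centres = [DataCentre("EPCC"), DataCentre("CINECA"), DataCentre("BSC")]
collection_x = {"x/file1.dat": b"alpha", "x/file2.dat": b"beta"}
replica_map = replicate(collection_x, centres)
```

The returned `replica_map` is the kind of "locations" information that a PID record could hold for each replicated object.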

Data Staging Service
- Support researchers in transferring large data collections from EUDAT storage to HPC facilities
- Reliable, efficient, and easy-to-use tools to manage data transfers
- Provide the means to re-ingest computational results back into the EUDAT infrastructure
not a simple service! politics involved (access to HPC)
[diagram labels: EUDAT CDI (domain of registered data), PRACE HPC, HPC]
http://eudat.eu/datastaging
eudat-datastaging@postit.csc.fi

Simple Store Service
- Allow registered users to upload long tail data into the EUDAT store
- Enable sharing objects and collections with other researchers
- Utilise other EUDAT services to provide reliability
simple upload, simple metadata, PID registration
much competition: see it as complementary; finally it is about trust
EUDAT CDI: domain of registered data
http://eudat.eu/simplestore
eudat-simplestore@postit.csc.fi

EUDAT Box Service
- some similarity to SimpleStore, of course
- just similar to Dropbox, incl. load balancing and replication
- there is no metadata, just data
- how to integrate into registered domain of data?
PID registration, synchronization
much competition: see it as complementary; finally it is about trust
EUDAT CDI: domain of registered data

Metadata Service
- Easily find collections of scientific data generated either by various communities or via EUDAT services
- Access those data collections through the given references in the metadata to the relevant data stores
"Europeana of scientific data"
- how to offer metadata in a cross-disciplinary space?
- scalability issue?
EUDAT CDI: domain of registered data
http://eudat.eu/metadata
eudat-metadata@postit.csc.fi

Semantic Annotation Service
- acts as a plugin component to be executed before uploading a resource with tags (crowd sourcing etc.)
- check tags against Knowledge Source & correct/refer/etc.
- could be used as trigger in Simple Store
- plugin available to everyone, not center dependent
check & annotate, PID registration
EUDAT CDI: domain of registered data
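The "check tags against Knowledge Source & correct/refer" step can be sketched as a pre-upload filter. This is only an illustration under stated assumptions: the knowledge source, the synonym table, the concept references and the `annotate` function are hypothetical and do not describe the actual EUDAT plugin.

```python
# Hypothetical knowledge source: preferred concept label -> concept reference.
KNOWLEDGE_SOURCE = {
    "seismology": "concept:0001",
    "climate model": "concept:0002",
    "speech corpus": "concept:0003",
}

# Hypothetical table of known variants mapped to the preferred label.
SYNONYMS = {
    "seismics": "seismology",
    "speech corpora": "speech corpus",
}


def annotate(tags):
    """Check user-supplied tags before upload.

    Each tag is normalized (case, whitespace), corrected via the synonym
    table, and given a reference into the knowledge source when one exists.
    Unknown tags are kept but flagged so they can be reviewed.
    """
    checked = []
    for raw in tags:
        label = " ".join(raw.lower().split())
        label = SYNONYMS.get(label, label)
        ref = KNOWLEDGE_SOURCE.get(label)
        checked.append(
            {"tag": raw, "label": label, "ref": ref, "known": ref is not None}
        )
    return checked


result = annotate(["Seismics", "  Climate   Model", "misc"])
```

Keeping unknown tags rather than rejecting them fits the crowd-sourcing setting: the check corrects and refers where it can, and leaves the rest visible.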

service targeting
- Replication: targeted at data managers/archivists/projects/departments without facilities
- Data Staging: same plus easy access to HPC
- SimpleStore: place for individuals/projects/groups to store & exchange data
- EUBox: share data via synchronization
- Metadata: EUDAT data & everyone interested
- SemAnn: individuals/projects working with massive amounts of human-created data
data stored in the domain of registered data is not EUDAT's data!
how to make this visible? in SiSt: community branding etc.

EPOS service implementation
- Persistent Identifiers registration
- Daily back-up
- Near real time sync (ongoing)
EPOS Workshop, Erice, Italy, August 2013

is there a global challenge?
- EUDAT interfaces with many different data providers, as do comparable initiatives such as DataONE, etc.
- currently little is compatible at various layers
  - infrastructure layer: no agreed components, no agreed APIs
  - content layer: formats, semantics (concept registration & bridging)
  - logical layer: PID + attributes, metadata principles + attributes, concept/schema registration, policies
something to be done to be accelerated?

who is working on it?
different initiatives working on a variety of aspects (just a few)
- ESFRI initiatives: working on discipline interoperability and improving/harmonizing data landscapes (need harmonization)
- EUDAT: working on common data services (need harmonization)
- OpenAIRE: working on specific data service (need harmonization)
- Europeana: working on aggregating metadata (need harmonization)
- etc.
a variety of standardization and policy organizations
- standards: ISO, IETC, IETF, W3C, OAI, OASIS, DONA, etc.
- high-level policies: CODATA, WDS, etc.
some thought: we need a fast acting, bottom-up initiative focusing on removing barriers for sharing data: Research Data Alliance

share canonical access procedure
(taken from Larry Lannom)
- need agreed ways to store and manipulate ext/int properties
- need agreed ways to do reference resolution (URIs vs. PIDs)
- need agreed ways to build common components or to rely on principles

learning from Internet
[diagram labels: Analysis, Apps, Persistent Reference, Custom Clients, Citation, Plug-Ins; Persistent Identifiers: Resolution System, Typing; Value Added Services; Digital Objects, Data Sets; Data Sources: RDBMS, Files, Local Storage, Cloud, Computed]
[object model labels: PID; PID record attributes (describes properties); points to instances; bit sequence (instance); metadata attributes (describes properties & context); point to each other]
- let's come to a common object model with PIDs as anchors, like IP numbers in networks
- PID and MD records store properties of objects and collections; policy rules manipulate properties
- EUDAT is a domain of registered data objects
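A minimal sketch of such an object model, with the PID as anchor, a PID record storing properties, and a policy rule manipulating them. All class names, the `prefix/obj-1` identifiers and the location URLs are hypothetical; this illustrates the idea on the slide and is not a description of any real PID system's API.

```python
from dataclasses import dataclass, field


@dataclass
class PIDRecord:
    """Properties stored with a persistent identifier (hypothetical layout)."""
    pid: str
    locations: list = field(default_factory=list)  # where bit-sequence instances live
    checksum: str = ""                              # integrity property
    metadata_pid: str = ""                          # pointer to the metadata record


class Registry:
    """In-memory stand-in for a domain of registered data objects."""

    def __init__(self):
        self._records = {}

    def register(self, record: PIDRecord) -> None:
        if record.pid in self._records:
            raise KeyError(f"PID already registered: {record.pid}")
        self._records[record.pid] = record

    def resolve(self, pid: str) -> PIDRecord:
        return self._records[pid]


def add_replica(registry: Registry, pid: str, location: str) -> PIDRecord:
    """Example policy rule: record a new replica location exactly once."""
    record = registry.resolve(pid)
    if location not in record.locations:
        record.locations.append(location)
    return record


registry = Registry()
registry.register(
    PIDRecord(
        pid="prefix/obj-1",
        locations=["https://centre-a.example/obj-1"],
        metadata_pid="prefix/md-1",
    )
)
add_replica(registry, "prefix/obj-1", "https://centre-b.example/obj-1")
add_replica(registry, "prefix/obj-1", "https://centre-b.example/obj-1")  # no duplicate
```

The analogy with IP numbers holds in the sense that clients only need the identifier; where the instances live is a property of the record and can change without breaking references.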

work in RDA
- Data Foundation and Terminology
- PID Information Type Harmonization
- Data Type Registry
- UPC for Data
- Practical Policy
- Metadata Normalization
- Contextual Metadata
- Pub/Data Citation/Linking
- Scientists Engagement
- Community Capability Model
- Preservation Infrastructure
- Legal Interoperability
- Repository Audit and Certification
- Marine Data Harmonization
- Defining Urban Data Exchange for Science
2. RDA Plenary, 16-18 September 2013, Washington, US
3. RDA Plenary, 26-28 March 2014, Dublin, AU/Europe
4. RDA Plenary, ? October 2014, ?, Europe (bid is open)
5. RDA Plenary, ? March 2015, ?, US (bid is open)

example: PID Information Types
[diagram labels: worldwide PID system; DONA Service Providers (Handles, DOIs, etc.); other Service Providers (ARKs, URNs, etc.); repositories and services]
- not scalable registration & resolution
- all APIs different
- how to get a checksum (cksm)?
chairs: Tobias Weigel (DKRZ), Timothy DiLauro (JHU)

example: PID Information Types
[diagram labels: worldwide PID system; DONA Service Providers (Handles, DOIs, etc.); other Service Providers (ARKs, URNs, etc.); repositories and services]
- scalable registration & resolution
- one API
- checksum (cksm) is typed, fragment pointer is chopped, etc.
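One way to picture "one API" with typed attributes: every value in a PID record carries a type identifier, and a single lookup function serves all clients. The sketch below rests on several assumptions: the type identifiers, the record layout and `get_property` are hypothetical, and reading "fragment pointer is chopped" as "strip anything after `#` before resolution" is an interpretation of the slide rather than a documented rule.

```python
import hashlib

# Hypothetical type registry: type identifier -> parser for the stored value.
TYPE_REGISTRY = {
    "type/checksum": str,
    "type/size-bytes": int,
}

PAYLOAD = b"example bit sequence"

# Hypothetical PID records: each attribute is a (type, value) entry.
PID_RECORDS = {
    "prefix/obj-1": [
        {"type": "type/checksum", "value": hashlib.sha256(PAYLOAD).hexdigest()},
        {"type": "type/size-bytes", "value": str(len(PAYLOAD))},
    ],
}


def get_property(pid: str, type_id: str):
    """Return the value of the attribute with the given registered type.

    The fragment part of the identifier (after '#') is dropped before the
    record is looked up; unknown types or missing attributes raise KeyError.
    """
    base_pid = pid.split("#", 1)[0]
    parser = TYPE_REGISTRY[type_id]
    for entry in PID_RECORDS[base_pid]:
        if entry["type"] == type_id:
            return parser(entry["value"])
    raise KeyError(f"{base_pid} has no attribute of type {type_id}")


size = get_property("prefix/obj-1", "type/size-bytes")
checksum = get_property("prefix/obj-1#part2", "type/checksum")
```

With a shared type registry, the question "how to get a checksum?" has the same answer for every repository: ask for the attribute of the checksum type.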

EUDAT/RDA lessons learned?
some RDA lessons: too early really
- but domain of registered data and data fabric will be essential
- some enthusiastic people, but little time left for RDA work
- much top-down activity (EC, NSF, AU ministry, etc.)
- many new group initiatives: will they survive?
- give one more year and we will see
to me it is THE chance to make progress; depends on all of us
http://rd-alliance.org

Thanks for the attention. Questions?