Key Elements of Global Data Infrastructures

Peter Wittenburg, CLARIN Research Infrastructure, EUDAT Data Infrastructure, The Language Archive, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands

Big Questions in the Complex Data Domain
- How can we guarantee easy and persistent access to globally available data objects and collections?
- How can we remove interoperability barriers so that large distributed data sets can be analysed easily?
There are many aspects and answers; here the focus is on three key elements of Data Infrastructures (DI).

Remember the Long Tail of Data
[Figure: long-tail curve of data; axes: Volume vs. Number of Data Sets]
Focus on Big Data
- generally raw data
- generally regular structure
Data Intensive Science: the domain of numbers
- find patterns across globally spread collections
- fit parameters using big data
Focus also on Small Data
- often captures domain knowledge
- much more heterogeneous
- generally special structures and difficult semantics
Smart Information Science: the domain of symbols (just one example)
- semantically join kindred collections
- exploit them using semantic knowledge

Data Infrastructures Need Registries
Modern societies have (cadastral) land registries
- recording dimensions, owners, claims, etc.
- also roads, electricity lines, etc.
- maintained by a hierarchy of authorities
- ALL know how to find and read them
A functioning DI requires agreed registries of many types
- centers/repositories, object IDs, person IDs, etc.
- agreements on formats, content and APIs to support automatic access
- global agreements: we need a big approach, since supporting small islands makes no sense
- open questions: how can the registry approach scale, and who takes care of persistence?
We need to bundle forces => DAITF
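
To make "agreements on formats, content and APIs" concrete, here is a minimal sketch of a registry of centers/repositories that enforces an agreed record schema (the content agreement) and offers the same lookup and discovery calls to every client (the API agreement). The field names in `REQUIRED_FIELDS`, the class `CenterRegistry` and the center IDs are hypothetical illustrations, not the schema or interface of any real registry.

```python
from dataclasses import dataclass, field

# Hypothetical agreed content schema: every registered center must supply these.
REQUIRED_FIELDS = frozenset({"name", "country", "endpoint", "metadata_format"})


@dataclass
class CenterRegistry:
    """In-memory stand-in for an agreed registry of centers/repositories."""

    records: dict = field(default_factory=dict)

    def register(self, center_id: str, record: dict) -> None:
        # Enforce the agreed content so clients can rely on every field existing.
        missing = REQUIRED_FIELDS.difference(record)
        if missing:
            raise ValueError(f"record for {center_id} lacks {sorted(missing)}")
        # Identifiers must stay unique once issued.
        if center_id in self.records:
            raise ValueError(f"{center_id} is already registered")
        self.records[center_id] = dict(record)

    def lookup(self, center_id: str) -> dict:
        # One agreed access call, identical for every client; returns a copy.
        return dict(self.records[center_id])

    def find(self, **criteria) -> list:
        # Automatic discovery: IDs of all centers whose records match the criteria.
        return sorted(
            cid
            for cid, rec in self.records.items()
            if all(rec.get(key) == value for key, value in criteria.items())
        )
```

A client would call, for example, `registry.find(country="NL")` to get the IDs of all registered Dutch centers without knowing anything center-specific. The sketch deliberately leaves out the two open questions on the slide, scaling and persistence, which an in-memory dictionary cannot answer.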

Interoperability Is Essential, but...
Interoperability is relevant at many levels
- integration of metadata, so that tools can find information automatically
- executing one operation on data sets created by different researchers
- it's all about interpreting syntax and semantics, and bridging between them
An interoperable DI requires adhering to basic IT principles
- make your syntax/formats explicit and register them in known registries
- make your semantics (elements, vocabularies) explicit and register them
- use persistent identifiers for ALL references
- supporting small islands makes no sense, but we need to accept different models
We need to bundle forces => DAITF
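
The principle "make your semantics explicit and register them" is what allows different models to coexist: each community keeps its own element names but declares which registered concept each one expresses, so one operation can run over data sets from different researchers. The following is a minimal sketch under that assumption; the concept IDs, the registry contents and the two schemas are hypothetical examples.

```python
# Hypothetical concept registry: persistent concept IDs -> explicit definitions.
CONCEPT_REGISTRY = {
    "concept:0001": "Name of the language the resource is in",
    "concept:0002": "Date the resource was created",
}

# Each community keeps its own element names (different models are accepted)
# but declares which registered concept each element expresses.
SCHEMA_A = {"lang": "concept:0001", "created": "concept:0002"}
SCHEMA_B = {"languageName": "concept:0001", "recordingDate": "concept:0002"}


def to_concepts(record: dict, schema: dict) -> dict:
    """Re-key a record by concept ID; refuse elements without registered semantics."""
    bridged = {}
    for element, value in record.items():
        concept = schema.get(element)
        if concept is None or concept not in CONCEPT_REGISTRY:
            raise KeyError(f"element {element!r} has no registered semantics")
        bridged[concept] = value
    return bridged


def select(records: list, concept: str, value) -> list:
    """Apply one operation uniformly to records created by different researchers."""
    return [record for record in records if record.get(concept) == value]
```

Rejecting unregistered elements, rather than guessing their meaning, is the design choice here: bridging is only as reliable as the registered semantics it rests on.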

Smooth Distributed Authentication Is a Must
We are currently in the dark Middle Ages
- we have many identities
- IdPs (identity providers) are small kingdoms
- there is no trust in/for academia
- inefficient practices take hold
An efficient DI requires smooth distributed authentication
- simple mechanisms supporting a single identity and single sign-on
- trust in academia, so that attributes can be exchanged (the Code of Conduct is promising)
- it must be worldwide, since data is worldwide
- it can't be right that everyone sets up their own user database
We need to bundle forces => DAITF
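
Attribute exchange is where trust matters: after single sign-on, the user's home IdP decides which attributes a service provider (SP) receives. The sketch below illustrates that decision, assuming a set of trusted SPs taken from federation metadata and a boolean flag for Code of Conduct adherence. The release policy (full requested set with the Code of Conduct, opaque identifier only without it), the class and all names and values are hypothetical, not the rules of any actual federation or of the Code of Conduct itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceProvider:
    entity_id: str
    requested: frozenset            # attributes the SP declares it needs
    accepts_code_of_conduct: bool   # hypothetical flag for Code of Conduct adherence


# Hypothetical federation metadata: SPs this IdP has agreed to trust.
TRUSTED_SPS = {"https://repository.example/sp"}

# Hypothetical home-IdP store: one identity per user, reused for every service.
USER_ATTRIBUTES = {
    "alice": {
        "persistent_id": "7f3a9c@idp.example",
        "email": "alice@uni.example",
        "affiliation": "member@uni.example",
        "phone": "+00 000 000",
    },
}


def release_attributes(user: str, sp: ServiceProvider) -> dict:
    """Return the attributes the home IdP would send to this SP after single sign-on."""
    if sp.entity_id not in TRUSTED_SPS:
        raise PermissionError(f"{sp.entity_id} is not a trusted service provider")
    # Assumed policy: without Code of Conduct adherence only the opaque ID is released.
    allowed = set(sp.requested) if sp.accepts_code_of_conduct else {"persistent_id"}
    attributes = USER_ATTRIBUTES[user]
    return {name: value for name, value in attributes.items() if name in allowed}
```

Because the SP receives attributes from the home IdP instead of collecting them itself, it never needs its own user database, which is the situation the slide argues against.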

Bundle Forces => DAITF
Data Access and Interoperability Task Force
- sounds like IETF, which is no accident
- most agree it needs to be a grass-roots process
- open question: how to organize the global interaction and harmonization process?
- it is embedded in many existing activities: do we need something separate, and will we join?
- the EC is ready to fund this activity; EUDAT and OpenAIRE have actually already reserved some funds
- iCORDI is committed to pushing this ahead in the coming years
- the US seems ready to support this activity
- various other countries sent people to a pre-ICRI workshop
Improving integration and interoperability, i.e. establishing a DI, will be a stepwise process
- we are already working on that path (policy-rule based replication across the Atlantic)

Policy-Rule Based Replication across the Atlantic
[Figure: replication between MPI and RENCI, with EPIC PIDs]
What's worth mentioning?
- all data/metadata organizations and attributes are maintained explicitly
- PIDs are used to check validity
- users could immediately start working with the copies
- ready for auditing
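
The slide does not show how PIDs are used to check validity. One common pattern, assumed here, is that the PID record carries a checksum of the object plus the locations of its copies: a replica is registered only if its checksum matches, and an audit re-verifies every registered copy. The record layout, SHA-256 as checksum, the functions `register_replica` and `audit`, and the `mpi://`/`renci://` location strings are all hypothetical; this is not the EPIC or iRODS interface.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PidRecord:
    """Hypothetical PID record: checksum of the object plus known copy locations."""

    pid: str
    checksum: str                                   # SHA-256 of the original bytes
    locations: list = field(default_factory=list)   # registered, verified copies


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def register_replica(record: PidRecord, location: str, replica: bytes) -> bool:
    """Add a copy's location to the PID record only if the copy is bit-identical."""
    if sha256(replica) != record.checksum:
        return False  # corrupted or incomplete transfer: keep it out of the record
    if location not in record.locations:
        record.locations.append(location)
    return True


def audit(record: PidRecord, storage: dict) -> dict:
    """Re-verify every registered copy; storage maps location -> stored bytes."""
    return {
        loc: loc in storage and sha256(storage[loc]) == record.checksum
        for loc in record.locations
    }
```

Since a copy enters the PID record only after verification, users can start working with any listed location immediately, and `audit` gives an auditor a per-location yes/no answer.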