Data management and discovery using EUDAT services
Hans van Piggelen
SURFsara, The Netherlands
EUDAT mission
- Offer common data services in the CDI (Collaborative Data Infrastructure) to all European researchers
- Services will address the needs of big data volumes as well as the long tail of data
- Respect the communities' choices of data organization
- Achieve harmonization and efficiency in the long term
Life Science Workshop, November 13th 2014
Data discovery
How can you make sure your published data set can be discovered by anyone, at any time?
- How to store it online?
- How to make it findable?
- How to make it uniquely identifiable?
- How to make the most of your data?
- How to keep it available, even after 20 years?
- How to find other data sets?
Data repositories and search services!
Data management and policies
Data management is the process of controlling the information generated during a (research) project: availability, authenticity, discoverability and curation.
Preferably:
- Manage data sets online with ease
- Set policies, e.g. data access per user or group, or automatic data replication at file level
- Share your data sets (publicly)
- Integrate with other services
Data repositories
- Easily share data with collaborators and other researchers
- Uniquely identify data sets using persistent identifiers
- Add metadata to improve quality and discoverability
- Calculate checksums for data objects
- Handle everything from small data up to large data sets
- Curate data for long-term storage
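Checksums let a repository prove later that a stored data object has not changed since deposit. A minimal sketch, assuming SHA-256 as the default algorithm (the slide does not say which one EUDAT repositories use); the function names are hypothetical. Files are read in chunks so large data sets never need to fit in memory.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected, algorithm="sha256"):
    """True if the file still matches the checksum recorded at deposit time."""
    return file_checksum(path, algorithm) == expected.lower()
```

The checksum computed at upload is stored alongside the object (for example in its metadata); re-running `verify_checksum` later detects silent corruption.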
Metadata templates
- Standardised metadata schemes for improved discoverability
- Defined by a research community or organisation
- Used in addition to the obligatory minimal metadata
- Generally searchable in any connected service
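The layering of a community template on top of obligatory minimal metadata can be illustrated with a small validator. This is a sketch only: the field names in `MINIMAL_FIELDS` and the example `LIFE_SCIENCE_TEMPLATE` are hypothetical and not taken from any EUDAT schema.

```python
# Hypothetical obligatory minimal metadata: every record must carry these fields.
MINIMAL_FIELDS = {"title", "creator", "publication_date", "description"}

# Hypothetical community template layered on top: field name -> expected Python type.
LIFE_SCIENCE_TEMPLATE = {
    "organism": str,
    "sample_count": int,
}


def validate_metadata(record, template):
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = [
        f"missing required field: {name}"
        for name in sorted(MINIMAL_FIELDS - set(record))
    ]
    for name, expected_type in template.items():
        if name not in record:
            problems.append(f"missing template field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"{name} should be {expected_type.__name__}")
    return problems
```

Keeping the minimal set separate from the community template means every record stays searchable on the common fields, while each community can still enforce its own richer scheme.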
Persistent Identifiers (PIDs)
- A unique identifier for every uploaded data set
- Ensures long-term authenticity, traceability, discoverability and integrity
- Persistent: the identifier will never change
- But also: the referenced data will never change
- EUDAT: EPIC PID service
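EPIC PIDs are Handles, so they can be resolved through the public Handle.net proxy REST API, which returns the handle record as JSON. The sketch below builds the lookup URL and turns a record into a type-to-value mapping. Assumptions: the example prefix `12345` is hypothetical, and whether a given record carries a `CHECKSUM` value depends on the service that registered it.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Public Handle.net proxy REST endpoint; EPIC PIDs are Handles, so they resolve here.
HANDLE_PROXY_API = "https://hdl.handle.net/api/handles/"


def handle_api_url(pid):
    """Build the proxy REST URL for a PID of the form '<prefix>/<suffix>'."""
    prefix, _, suffix = pid.partition("/")
    if not prefix or not suffix:
        raise ValueError(f"not a handle: {pid!r}")
    return HANDLE_PROXY_API + prefix + "/" + quote(suffix)


def values_by_type(record_json):
    """Map each value type in a handle record (URL, CHECKSUM, ...) to its data value."""
    record = json.loads(record_json)
    if record.get("responseCode") != 1:  # 1 means the handle was found
        raise LookupError(f"handle not resolved, responseCode={record.get('responseCode')}")
    return {value["type"]: value["data"]["value"] for value in record["values"]}


def resolve(pid):
    """Fetch the current record for a PID over the network."""
    with urlopen(handle_api_url(pid)) as response:
        return values_by_type(response.read().decode("utf-8"))
```

Because the identifier itself never changes, only the `URL` value inside the record needs updating when data moves; a stored checksum value (if present) lets a client confirm it received the same bytes.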
B2FIND: Metadata Search Service
- Easily find collections of scientific data generated either by various communities or via EUDAT services
- Access those data collections through the references to the relevant data stores given in the metadata
- The "Europeana" of scientific data
(Diagram: EUDAT CDI domain of registered data)
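A programmatic search could look like the sketch below. It rests on an assumption the slide does not state: that B2FIND exposes the standard CKAN action API (`package_search`) at the base URL shown. The response handling follows CKAN's documented `success` / `result.results` envelope.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: B2FIND serves the standard CKAN action API at this address.
B2FIND_SEARCH = "https://b2find.eudat.eu/api/3/action/package_search"


def search_url(query, rows=10):
    """Build a CKAN package_search URL for a free-text query."""
    return B2FIND_SEARCH + "?" + urlencode({"q": query, "rows": rows})


def dataset_titles(response_json):
    """Extract the titles of the matching data sets from a CKAN search response."""
    body = json.loads(response_json)
    if not body.get("success"):
        raise RuntimeError("search request was not successful")
    return [item.get("title", "") for item in body["result"]["results"]]


def search(query, rows=10):
    """Run the query against the live service and return the matching titles."""
    with urlopen(search_url(query, rows)) as response:
        return dataset_titles(response.read().decode("utf-8"))
```

Each returned item also carries the metadata references back to the originating data store, which is how a user moves from a search hit to the data itself.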
B2SHARE: Data Sharing Service
- Online data repository with a web interface
- Easy deposit and sharing of data sets
- Public metadata and metadata schemes
- Multiple sharing levels
- Embargoes
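How sharing levels and embargoes might combine into a single access decision is sketched below. The three levels and the rule that the depositor always keeps access are hypothetical choices for illustration, not B2SHARE's actual policy; consistent with the slide, the sketch only governs file access, since metadata stays public.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class SharingLevel(Enum):
    """Hypothetical sharing levels for a deposited data set."""
    PUBLIC = "public"
    GROUP = "group"
    PRIVATE = "private"


@dataclass
class Record:
    owner: str
    level: SharingLevel
    group: Optional[str] = None
    embargo_until: Optional[date] = None


def can_download(record, user, user_groups, today):
    """Decide whether `user` may fetch the files; metadata is public regardless."""
    if user == record.owner:
        return True  # the depositor always keeps access to their own files
    if record.embargo_until is not None and today < record.embargo_until:
        return False  # under embargo: nobody else gets the files yet
    if record.level is SharingLevel.PUBLIC:
        return True
    if record.level is SharingLevel.GROUP:
        return record.group in user_groups
    return False  # PRIVATE
```

Checking the embargo before the sharing level means a public record stays closed until its embargo date, then opens without any further action.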
(Diagram: B2SHARE portal (simple upload, add metadata, PID registration) and B2FIND portal, connected by metadata within the EUDAT CDI domain of registered data)
B2SAFE: Safe Replication Service
- Robust, safe and highly available data replication service for small- and medium-sized repositories
- Guards against data loss in long-term archiving and preservation
- Optimizes access for users from different regions
- Brings data closer to powerful computers for compute-intensive analysis
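The two goals on this slide, guarding against loss and optimizing regional access, can be sketched with local directories standing in for remote sites. This is an illustration under that assumption only; the real service replicates between data centres, and `replicate` / `pick_replica` are hypothetical names.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Digest of a whole file (fine for small test files)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def replicate(source, sites):
    """Copy `source` into every site directory; return the sites whose copy verifies."""
    expected = sha256_of(source)
    verified = []
    for site in sites:
        target = Path(site) / Path(source).name
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        # Only count a replica once its checksum matches the original.
        if sha256_of(target) == expected:
            verified.append(str(site))
    return verified


def pick_replica(replicas, user_region):
    """Prefer a replica in the user's region; otherwise fall back to the first one."""
    for region, location in replicas:
        if region == user_region:
            return location
    return replicas[0][1]
```

Verifying every copy against the source checksum is what turns plain copying into protection against data loss: a replica that fails verification is not counted as safe.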
B2STAGE: Data Staging Service
- Supports researchers in transferring large data collections from EUDAT storage to HPC facilities
- Reliable, efficient and easy-to-use tools to manage data transfers
- Provides the means to re-ingest computational results back into the EUDAT infrastructure
(Diagram: EUDAT CDI domain of registered data, PRACE HPC and other HPC systems)
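The stage-in, compute, re-ingest cycle can be sketched with local directories in place of EUDAT storage and an HPC workspace. The function `stage_run_ingest` and the `-results` naming convention are hypothetical; the real service moves data between sites over the network.

```python
import shutil
import tempfile
from pathlib import Path


def stage_run_ingest(collection, archive, compute):
    """Stage a collection into a scratch workspace, run `compute`, then re-ingest the results."""
    with tempfile.TemporaryDirectory() as scratch:
        workspace = Path(scratch) / "input"
        shutil.copytree(collection, workspace)  # stage in: archive -> HPC workspace
        results = Path(scratch) / "results"
        results.mkdir()
        compute(workspace, results)  # the compute-intensive step runs next to the data
        target = Path(archive) / (Path(collection).name + "-results")
        shutil.copytree(results, target)  # re-ingest: results back into the archive
    # The scratch workspace is deleted here; only the re-ingested results remain.
    return target
```

A compute step is any callable taking the input and output directories, for example one that counts lines across all input files and writes a summary.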
B2DROP: Sync & Exchange Service
- Allows registered users to upload long-tail data
- Enables sharing of objects and collections with other researchers
- Uses other EUDAT services to provide reliability and data retention
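A sync client has to work out which files changed since the last exchange. One common approach, sketched here as an assumption rather than B2DROP's actual protocol, compares content-hash manifests of the local folder and the server copy; `manifest` and `plan_sync` are hypothetical names and the plan covers a one-way upload only.

```python
import hashlib
from pathlib import Path


def manifest(folder):
    """Map each file's path (relative to `folder`) to its SHA-256 digest."""
    root = Path(folder)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def plan_sync(local, remote):
    """Compare two manifests: files to upload (new or changed) and files to delete remotely."""
    upload = sorted(name for name, digest in local.items() if remote.get(name) != digest)
    delete = sorted(name for name in remote if name not in local)
    return upload, delete
```

Comparing digests instead of timestamps means a file is only transferred when its content actually differs.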
EUDAT services overview
Research timeline
(Diagram: services arranged across the phases before, during and after research: B2FIND, Data Ingest Service, B2DROP / SURFdrive, BEEHUB, B2STAGE, GRID SE, Central Archive, B2SAFE, B2SHARE, EPIC PID, Research Data Storage, Trusted Digital Repository)
EUDAT services overview
- B2FIND: aggregated EUDAT metadata domain; data inventory
- B2SAFE: data curation and access optimization
- B2STAGE: dynamic replication to HPC workspace for processing
- B2SHARE: researcher data store (simple upload, share and access)
- AAI: network of trust among authentication and authorization actors
- PID: identity, integrity, authenticity, locations
- B2DROP: easy sharing, local syncing