Designing an institutional research data management infrastructure for the life sciences

Similar documents
NEUROIMAGING RESEARCH DATA LIFE- CYCLE MANAGEMENT

ISA 767, Secure Electronic Commerce Xinwen Zhang, George Mason University

Data Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013

Building a Dutch National Research Infrastructure IRODS UGM 2017

EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal

Richard Marciano Alexandra Chassanoff David Pcolar Bing Zhu Chien-Yi Hu. March 24, 2010

The NIH Collaboratory Distributed Research Network: A Privacy Protecting Method for Sharing Research Data Sets

Institutional Repository using DSpace. Yatrik Patel Scientist D (CS)

Ing. José A. Mejía Villar M.Sc. Computing Center of the Alfred Wegener Institute for Polar and Marine Research

The Smithsonian/NASA Astrophysics Data System

DSpace Fedora. Eprints Greenstone. Handle System

Data Staging and Data Movement with EUDAT. Course Introduction Helsinki 10 th -12 th September, Course Timetable TODAY

Federated XDMoD Requirements

Welcome to the Pure International Conference. Jill Lindmeier HR, Brand and Event Manager Oct 31, 2018

EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT

Developing a Research Data Policy

Academic Workflow for Research Repositories Using irods and Object Storage

Integration of the platform. Technical specifications

HDP Security Overview

HDP Security Overview

The Modern Web Access Management Platform from on-premises to the Cloud

Storage Made Easy. Providing an Enterprise File Fabric for INVESTOR NEWSLETTER ISSUE N 3

Hospital System Lowers IT Costs After Epic Migration Flatirons Digital Innovations, Inc. All rights reserved.

FLAT: A CLARIN-compatible repository solution based on Fedora Commons

Architecture Assessment Case Study. Single Sign on Approach Document PROBLEM: Technology for a Changing World

A national approach for storage scale-out scenarios based on irods

Integrating Identity Management Aspirations and Issues

D9.2.2 AD FS via SAML2

Canadian Access Federation: Trust Assertion Document (TAD)

Datameer Big Data Governance. Bringing open-architected and forward-compatible governance controls to Hadoop analytics

Canadian Access Federation: Trust Assertion Document (TAD)

BEYOND AUTHENTICATION IDENTITY AND ACCESS MANAGEMENT FOR THE MODERN ENTERPRISE

Storage Made Easy. SoftLayer

Key Elements of Global Data Infrastructures

Striving for efficiency

Udemy for Business SSO. Single Sign-On (SSO) capability for the UFB portal

The Potential for Blockchain to Transform Electronic Health Records ARTICLE TECHNOLOGY. by John D. Halamka, MD, Andrew Lippman and Ariel Ekblaw

irods for Data Management and Archiving UGM 2018 Masilamani Subramanyam

IBM Secure Proxy. Advanced edge security for your multienterprise. Secure your network at the edge. Highlights

Efficiently Sharing All of a Patient s Data With All their Providers Completing the Model

warwick.ac.uk/lib-publications

Canadian Access Federation: Trust Assertion Document (TAD)

EUDAT - Open Data Services for Research

EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture. Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members

RSA SecurID Access SAML Configuration for StatusPage

EUDAT-B2FIND A FAIR and Interdisciplinary Discovery Portal for Research Data

Frequently Asked Questions. My life. My healthcare. MyChart.

Authentication in Galaxy : let's use what is out there - (National) Identity Providers

Using EUDAT services to replicate, store, share, and find cultural heritage data

Storage Made Easy. Mirantis

DARIAH Update. 9th FIM4R Workshop. Vienna, Novemer 30, Peter Gietz, DAASI International GmbH.

irods - An Overview Jason Executive Director, irods Consortium CS Department of Computer Science, AGH Kraków, Poland

Canadian Access Federation: Trust Assertion Document (TAD)

VAM. ADFS 2FA Value-Added Module (VAM) Deployment Guide

Creating a Corporate Taxonomy. Internet Librarian November 2001 Betsy Farr Cogliano

irods at TACC: Secure Infrastructure for Open Science Chris Jordan

ForgeRock Access Management Core Concepts AM-400 Course Description. Revision B

Composable Software, Collaborative Development, and the CareWeb Framework. Doug Martin, MD

The EUDAT Collaborative Data Infrastructure

Canadian Access Federation: Trust Assertion Document (TAD)

Connect Authenticate

Security and Privacy Governance Program Guidelines

Assessing the FAIRness of Datasets in Trustworthy Digital Repositories: a 5 star scale

WebEx Connector. Version 2.0. User Guide

All about SAML End-to-end Tableau and OKTA integration

Article begins on next page

Fair data and open data: differences and consequences

Attributes for Apps How mobile Apps can use SAML Authentication and Attributes

Office 365 and Azure Active Directory Identities In-depth

PRODUCT DESCRIPTIONS AND METRICS

Implementing a Genomic Data Management System using irods at Bayer HealthCare

RADAR. Establishing a generic Research Data Repository: RESEARCH DATA REPOSITORY. Dr. Angelina Kraft

<Insert Picture Here> Oracle s Medical Imaging Technology Oracle Multimedia DICOM: Next Generation Platform

Canadian Access Federation: Trust Assertion Document (TAD)

User Directories. Overview, Pros and Cons

I data set della ricerca ed il progetto EUDAT

Mozy. Administrator Guide

Data Management: the What, When and How

open source community experience distilled

From Open Data to Data- Intensive Science through CERIF

EUDAT. Towards a Collaborative Data Infrastructure. Ari Lukkarinen CSC-IT Center for Science, Finland NORDUnet 2012 Oslo, 18 August 2012

Simplifying Collaboration in the Cloud

ODC and future EIDA/ EPOS-S plans within EUDAT2020. Luca Trani and the EIDA Team Acknowledgements to SURFsara and the B2SAFE team

TECHNICAL GUIDE SSO SAML Azure AD

Canadian Access Federation: Trust Assertion Document (TAD)

About This Document 3. Overview 3. System Requirements 3. Installation & Setup 4

An Institutional Approach to Developing Research Data Management Infrastructure

Canadian Access Federation: Trust Assertion Document (TAD)

Canadian Access Federation: Trust Assertion Document (TAD)

Data management and discovery

Grid Platform for Medical Federated Queries Supporting Semantic and Visual Annotations

EUDAT. Towards a pan-european Collaborative Data Infrastructure

Fedora, Islandora, & Samvera: Requirements and Gaps. David Wilcox,

NorStore. a national infrastructure for scientific data. Andreas O Jaunsen UNINETT Sigma as

Implementing Trusted Digital Repositories

Identity and capability management and federation

The AAF - Supporting Greener Collaboration

The Future of Interoperability: Emerging NoSQLs Save Time, Increase Efficiency, Optimize Business Processes, and Maximize Database Value

Canadian Access Federation: Trust Assertion Document (TAD)

SciX Open, self organising repository for scientific information exchange. D15: Value Added Publications IST

Transcription:

Designing an institutional research data management infrastructure for the life sciences Paul van Schayck PhD student, data steward Maastricht University Medical Center + p.vanschayck@maastrichtuniversity.nl https://datahub.mumc.maastrichtuniversity.nl Peter Debyelaan 15, 6229 HX Maastricht P.O. Box 616, 6200 MD Maastricht The Netherlands 1

providing Research Data Management services for Life Sciences Faculty Academic Hospital Independent research groups Heterogeneous (meta)data Right incentives Patient privacy Electronic Health Records Bridging organisations 2

Life science background Life science depends more and more on the collection and analysis of comprehensive datasets. Small Science. Life science is performed in small temporary project groups. Open Science. There is an urgent call for more open, transparent and reproducible science. 3 3

DataHub characteristics FAIR-inspired from start. Open-source where possible. (Meta)data structuring + ontology enrichment. Project data structuring; Hierarchical organisation in projects and datasets. Faceted search, Lucene & ontology-powered, authorization controlled High volume; The infrastructure has been designed and tested with petabyte scale and high throughput in mind. 4

Healthcare Research data governance Scientist EHR Ontology Service CrossRef epic Persistent Identifier s Master Person Index ELK HL7CDA export DataHub core services Frontend HL7v3 XML Extract Transform Load Data Warehouse API Facetted search UI Life science Web portal Transform Data drop zone metadata XML Hitachi NAS storage Rule based object store Replication storage REST API Browser External Repository WebDAV DataHub 2.0.0 5

Authentication (federated) Web Portal IdP login SAML based SSO irods-php via proxy user Generate irods temporary password Browser WebDAV user@domain.tld#nlmumc temporary password Providing federated authentication in two methods: proxy-user and temporary password Outstanding issue: Automated handling of user provisioning/expiration 6

Ingesting high volume data SMB/CIFS Network Share msiphypathreg irods Mounted Collection irods Collection data cifs-utils msicollrsync Web Portal irods-php SMB/CIFS network share connected as irods mounted collection is ingested into irods using msicollrsync Advantage: No extra (client) software for users SMB/CIFS performs very well Disadvantage: Not compatible with federated authentication msicollrsync not performing (yet) 7

Project collection structure /nlmumc/projects/ P000000001 C000000001 Dataset: any number of files and directories P000000002 C000000002 P000000003 C000000003 Providing a generic project collection hierarchy with no assumptions Unidentifiable collection names Virtual collections? Title AVUs on Project and Collections 8

Project authorization Project: P000000001 contributor Can create Project collection: C000000001 own manager Can assign inherited Open phase write contributor read only Closed phase read reader Keeping data authorization in irods using the rule engine to enforce policies Disadvantages: Only on project level Too simplistic? Note: irods groups are organizational units (departments) 9

Metadata modeling: being FAIR Ontology Service CrossRef epic Persistent Identifiers Islandora XML forms metadata.xml ETL C000000001 Validation Helping users early with annotating data FAIR Project -> Investigation -> Sample -> Assay (PISA) Inspired by ISA tools, compatible with HCLS Implemented Project and Investigation level Descriptive metadata stored in file (!), AVUs for system metadata 10

Metadata indexing metadata. xml Ontology Service CrossRef DWH frontend API REST API Transform Data Warehouse Providing a user friendly facetted search interface for data findability Indexed in SOLR: All metadata Semantics (OLS) References (CrossRef) Authorization on data (irods) Rebuild on demand 11

Metadata: making use of semantics Autocomplete for ontology terms Ontology derived facetted search 12

DTAP: deployment for development External Web portal WebDAV REST API DWH frontend CrossRef Browser Ontology Service Challenge Interactions with external services (AD, NAS storage) Highlights 16 interacting containers for full environment Runnable from laptop 13

DTAP: deployment for acceptation/production External WebDAV REST API DWH frontend CrossRef Browser Ontology Service Web portal Challenge Differences in deployments and some environments 14

Todays challenge in the data life cycle Active data Preserved data Phases Create Process Analyse Highly specific RDM solutions BRIDGE THE GAP! Phases: Archive Access Re-use Generic repositories Domain specific repositories 15

Lessons learned 1. Dual position of staff. Decentralize data stewards 2. Micro Service approach 3. Remote Procedure Calls for rules 4. Funding for long term storage is hard 5. Open Source re-useable parts https://github.com/maastrichtuniversity 16

Questions? Pascal Suppers Managing Director DataHub Maastricht P. Debyelaan 15 L. van Kleeftoren 6229 HX Maastricht 2nd floor (route 11) The Netherlands T +31 6 27 07 16 54 E p.suppers@maastrichtuniversity.nl Paul van Schayck PhD student, data steward Maastricht University Medical Center + p.vanschayck@maastrichtuniversity.nl https://datahub.mumc.maastrichtuniversity.nl Peter Debyelaan 15, 6229 HX Maastricht P.O. Box 616, 6200 MD Maastricht The Netherlands 17