Deliverable 8.2. Project ID Project Title. Project Acronym. Start Date of the Project. Duration of the Project. Work Package Number 8


Deliverable 8.2

Project ID:
Project Title: A comprehensive and standardised e-infrastructure for analysing medical metabolic phenotype data
Project Acronym: PhenoMeNal
Start Date of the Project: 1st September
Duration of the Project: Months
Work Package Number: 8
Work Package Title: Data provenance, compliance, and integrity
Deliverable Title: D8.2 Modularised ISA model and format: biospecimen centric schema, corresponding xml schemas, reference implementation guidelines and validation rules
Delivery Date: M24
Work Package leader: UOXF
Contributing Partners: UOXF, EMBL-EBI, ICL
Authors: Philippe Rocca-Serra, Susanna-Assunta Sansone, Reza Salek, Kenneth Haug, Namrata Kale, Jake Pearce, Noureddin Saddawi, David Johnson, Alejandra Gonzalez-Beltran

Abstract: The ISA representation is the data structure used by the EMBL-EBI MetaboLights repository for the metadata of metabolomic studies. The format has also been adopted by data-focussed publishers to handle datasets, such as Oxford University Press GigaScience and Springer Nature's Scientific Data. The initial format specifies only informally the underlying model and how the syntactic elements relate to each other. The work presented here summarizes how a set of JSON schemata, supporting JSON-LD context files for full semantic representation, and a clinical data set-orientated ISA configuration have been developed to produce a machine-readable serialization of the ISA model. Furthermore, this deliverable presents the latest developments of the ISA-API, which implements the set of coding recommendations adopted by PhenoMeNal.

EXECUTIVE SUMMARY
DETAILED REPORT OF THE DELIVERABLE
1. Creation of a machine-readable ISA model
   Background
   Implementation
   Normative documentations in readthedoc format
   Reference implementation: the ISA-API
2. Declaration of study design related information
   Background
   Coding Patterns and Recommendations
   Implementation
3. Declaration of Ethics and Legal Information
   Background
   Coding Patterns and Recommendations
   Implementation
4. Declaration of Quality Control Elements
   Background
   Coding Patterns and Recommendation and Implementation
5. Declaration of instrument vendor format and preprocessed data
   Background
   Coding Patterns, Recommendations and Implementation
WORK PLAN
DELIVERY AND SCHEDULE
CONCLUSION

1 EXECUTIVE SUMMARY

The H2020 PhenoMeNal e-infrastructure project aims to deliver a scalable, robust and standards-compliant infrastructure for clinical phenotyping by means of metabolomics techniques. The main goal of this deliverable is to provide a formal, machine-readable specification of the ISA model that is more prescriptive about experimental study metadata than the initially released ISA-Tab normative documents.

DETAILED REPORT OF THE DELIVERABLE

Creation of a machine-readable ISA model

Background:
The ISA specifications as initially released did not provide machine-readable, formal representations. This made it difficult for developers and implementers to build compliant tools, because interpreting a textual description of a syntax specification carries the risk of divergent readings. The goal of the deliverable was to eliminate those shortcomings by producing a machine-readable serialization of the ISA model, establishing the foundation for robust metadata management, tracking and quality assessment in the PhenoMeNal project.

Implementation: JSON Schema representation of ISA model
WP8 has delivered an exhaustive and normative representation of the ISA syntax relying on JSON Schema technology. A set of 21 JSON schemata has been produced, representing each of the core objects underlying the ISA tabular syntax (see Figure 1); the work is available from the ISA GitHub repository.
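As an illustration of what a schema-based specification makes possible, the following is a minimal sketch of checking a study object against a schema. The schema shown is a hypothetical, heavily reduced stand-in and is not one of the 21 ISA schemata; the names `STUDY_SCHEMA`, `validate` and the property names are assumptions made for this example, and only a tiny subset of JSON Schema keywords (`type`, `required`, `properties`, `items`) is handled.

```python
import json

# Hypothetical, reduced schema written in the style of JSON Schema.
# It is NOT one of the actual ISA schemata.
STUDY_SCHEMA = {
    "type": "object",
    "required": ["identifier", "title", "factors"],
    "properties": {
        "identifier": {"type": "string"},
        "title": {"type": "string"},
        "factors": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["factorName"],
                "properties": {"factorName": {"type": "string"}},
            },
        },
    },
}

# Mapping from the schema type names used above to Python types.
_TYPES = {"object": dict, "array": list, "string": str}


def validate(instance, schema, path="$"):
    """Return a list of error messages; an empty list means the instance conforms."""
    if not isinstance(instance, _TYPES[schema["type"]]):
        return [f"{path}: expected {schema['type']}"]
    errors = []
    if schema["type"] == "object":
        # Report every required property that is absent.
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property '{key}'")
        # Recurse into the properties that are present.
        for key, sub_schema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], sub_schema, f"{path}.{key}"))
    elif schema["type"] == "array" and "items" in schema:
        for index, item in enumerate(instance):
            errors.extend(validate(item, schema["items"], f"{path}[{index}]"))
    return errors


good = json.loads('{"identifier": "S1", "title": "Demo", "factors": [{"factorName": "dose"}]}')
bad = json.loads('{"identifier": 12, "factors": [{}]}')

for message in validate(bad, STUDY_SCHEMA):
    print(message)
```

In practice a full JSON Schema validator would be used against the published schemata; the point of the sketch is only that conformance becomes a mechanical check rather than a reading of prose.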

Figure 1. An overview of the modular, JSON schema-based formal representation of the ISA model.

Normative documentations in readthedoc format
To further support the JSON schema representation, a full set of documentation has been released in the form of a readthedoc microsite (see Figure 2).

Figure 2. A screenshot showing the dedicated microsite based on the readthedoc approach. The site provides an up-to-date online resource that guides users through the ISA specifications and describes the different types of serialization (tabular or JSON).

Reference implementation: the ISA-API
Built on top of the JSON schema definition, the ISA-API provides a set of tools to manipulate ISA objects, parse ISA documents in tabular or JSON formats, and build an object representation and a graph data structure, which allows fast traversal of information as well as validation against the syntax. Additional components have been added to further validate information supplied in ISA syntax. These additional validation steps are required in particular by the modules that convert to third-party formats, namely the formats used by public repositories for non-metabolomics omics data types such as transcriptomics and genomics. The ISA-API converters allow data processing for repositories such as EMBL-EBI ArrayExpress or the EMBL-EBI Short Read Archive, each of which has specific annotation requirements that must be met. The ISA-API is available from its GitHub repository.

Figure 3. A screenshot of the ISA-API GitHub repository, which provides documentation and assistance to developers who wish to use or contribute to the work.

Declaration of study design related information

Background
A systematic analysis of EMBL-EBI MetaboLights has been performed as a means to test the validity and efficiency of PhenoMeNal workflows. The results indicate that nearly 50% of the ISA archives served by the European repository contain a syntactic or structural error. A finer review shows that 25% fail a basic syntactic validation invoking the relevant function from the ISA-API. Another 24% reveal errors when tested for semantic content using additional functions from the ISA-API. The errors belong to two distinct classes, both of which affect automatic handling of the documents by analysis workflows.

The first type of error corresponds to the declaration of spurious ISA Experimental Factors. Factors are meant to represent independent variables as declared by experimentalists and are therefore meant to encompass a range of discrete values. A simple inspection of the ranges exposes the errors. In a number of cases, submitters and curators confuse independent variables and covariates; typically, experiments declaring more than 6 factors should attract suspicion (e.g. MTBLS124 or MTBLS93 with combinations).

The second type of error is structural and corresponds to a failure to properly represent the underlying relationships between subject-derived material and data acquisition events. This type of error is harder to detect because the ISA-Tab documents are syntactically valid while the information they represent is erroneous. As a result, sample size cannot be determined properly: either the computation cannot be automated or, if the document passes the checks, it produces erroneous results.

Coding Patterns and Recommendations
As pointed out already, batch processing of public datasets with PhenoMeNal workflows highlighted problems in annotation consistency. The ISA-API function that pulls datasets from the NIH Metabolomics Workbench, the US counterpart of EMBL-EBI MetaboLights, also revealed metadata elements absent from MetaboLights metadata.
To address these issues and converge towards a common set of descriptors, the PhenoMeNal ISA configuration now provides several new fields in the Study Design Descriptor section of the ISA document to report key summary information that can help data discovery. These are summarized below.

Augmented annotation for study designs
Study Design Ontology Terms: {full factorial design, fractional factorial design}
Comment[number of factor level]
Comment[study subject count]
Comment[number of treatment groups]

Implementation

The ISA-API now implements several curation functions to detect and, to some extent, correct errors of the type mentioned above. At the very least, the newly developed functions provide curators with additional information to direct their actions. The functions deliver simple yet effective means to significantly increase the quality of ISA archive documents generated by submission tools and pipelines.

The ISA-API is now augmented with a creation mode, which can be used to bootstrap the creation of ISA documents from study design information supplied by users. The main feature of the code behind this functionality is its reliance on patterns specific to a number of experimental designs (e.g. factorial design, balanced design, repeated measure designs), which can be applied irrespective of the type of data acquisition used. While the initial function was developed primarily to support the reporting of MS and NMR based studies, we demonstrated the portability of the approach by implementing support for DNA microarray and next generation sequencing based signal acquisition. The code is available from the ISA-tools GitHub repository.

To further demonstrate the benefits of the component, a series of Jupyter notebooks has been built and can be used as a basis for tutorials and training. A YouTube video is being prepared and will be released as an update to this deliverable. The notebooks are available from the ISA-tools GitHub repository (see also Figure 4).

Figure 4: A screenshot of the ISA-tools GitHub repository showing six IPython notebooks devised to showcase the capability of the ISA-API to support MS and NMR based metabolomics studies as well as applications of molecular biology techniques for transcriptomics and genomics studies.
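The two classes of error described in the Background can be illustrated with a short, self-contained sketch. It does not use or reproduce the ISA-API curation functions; the function names, the input shapes and the heuristics (other than the "more than 6 factors" threshold quoted in the text) are assumptions chosen for illustration.

```python
from collections import defaultdict

MAX_PLAUSIBLE_FACTORS = 6  # threshold quoted in the Background section


def flag_suspicious_factors(samples):
    """samples: list of dicts mapping factor name -> factor value, one dict per sample.

    Returns human-readable warnings for factors that do not look like
    independent variables (hypothetical heuristics).
    """
    levels = defaultdict(set)
    for sample in samples:
        for factor, value in sample.items():
            levels[factor].add(value)

    warnings = []
    if len(levels) > MAX_PLAUSIBLE_FACTORS:
        warnings.append(f"{len(levels)} factors declared; check for covariates declared as factors")
    for factor, values in sorted(levels.items()):
        if len(values) < 2:
            warnings.append(f"'{factor}' has a single level")
        elif len(samples) > 2 and len(values) == len(samples):
            warnings.append(f"'{factor}' differs for every sample and looks like a covariate")
    return warnings


def trace_sources(edges, data_files):
    """edges: (parent, child) pairs linking materials and data files.

    Walks the graph upstream from each data file and returns the set of
    root nodes (nodes without parents) it derives from.
    """
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)

    def roots_of(node):
        seen, stack, roots = set(), [node], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            upstream = parents.get(current, set())
            if not upstream:
                roots.add(current)
            stack.extend(upstream)
        return roots

    return {data_file: roots_of(data_file) for data_file in data_files}


samples = [
    {"dose": "low", "batch": "b1", "subject id": "p1"},
    {"dose": "low", "batch": "b1", "subject id": "p2"},
    {"dose": "high", "batch": "b1", "subject id": "p3"},
    {"dose": "high", "batch": "b1", "subject id": "p4"},
]
factor_warnings = flag_suspicious_factors(samples)

edges = [
    ("subject1", "sample1"), ("subject2", "sample2"),
    ("sample1", "run1.mzML"), ("sample2", "run2.mzML"),
]
trace = trace_sources(edges, ["run1.mzML", "run2.mzML", "run3.mzML"])
# A data file whose only root is itself is not linked to any subject.
orphans = [name for name, roots in trace.items() if roots == {name}]
```

Here `batch` and `subject id` are flagged while `dose` is not, and `run3.mzML` is reported as an orphan because no path connects it to a subject, which is the situation that prevents sample size from being computed.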

As indicated above, the ISA-API can support multi-omics datasets owing to its native support for an array of molecular biology techniques, a benefit of the modular approach allowed by the new ISA JSON schemata and the reliance on ISA configurations (see Figure 5).

Figure 5: A detailed view of the IPython notebook showing the ISA-API create mode being used to rapidly generate an ISA archive document from study design information that the toolkit prompts users for. This example shows how this is applied to the specific case of NMR based data acquisition.
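The idea of deriving a document skeleton from a study design can be sketched as follows for a full factorial design. This is not the ISA-API create mode and does not use its API; the function `full_factorial_plan`, the naming of sources and the value format chosen for `Comment[number of factor level]` are assumptions. Only the three Comment field names are taken from the recommendations above.

```python
from itertools import product


def full_factorial_plan(factors, subjects_per_group):
    """Expand a full factorial design into per-subject rows plus summary comments.

    factors: dict mapping factor name -> list of levels.
    subjects_per_group: number of subjects in each treatment group (balanced design).
    """
    names = sorted(factors)
    # Every combination of factor levels is one treatment group.
    groups = [dict(zip(names, combo)) for combo in product(*(factors[n] for n in names))]

    rows = []
    for group_index, group in enumerate(groups, start=1):
        for subject_index in range(1, subjects_per_group + 1):
            row = {"Source Name": f"group{group_index}_subject{subject_index}"}
            row.update({f"Factor Value[{name}]": level for name, level in group.items()})
            rows.append(row)

    summary = {
        # Value format is illustrative; the source only names the field.
        "Comment[number of factor level]": ";".join(f"{n}={len(factors[n])}" for n in names),
        "Comment[study subject count]": len(rows),
        "Comment[number of treatment groups]": len(groups),
    }
    return rows, summary


rows, summary = full_factorial_plan(
    {"dose": ["low", "high"], "time": ["0h", "6h", "24h"]},
    subjects_per_group=4,
)
print(summary)
print(rows[0])
```

Because the rows are generated from the design rather than typed by hand, the summary comments are consistent with the sample table by construction, which is the property the prospective approach relies on.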

Declaration of Ethics and Legal Information

Background
In order to comply with EU regulations on data protection, privacy and ethics, metadata descriptors covering terms of use, consent availability and additional ancillary information have to be provided as part of study archives. Such information, when present in the ISA structured metabolomics study metadata, could be used in downstream workflows to check whether requesters have the relevant privileges against the dataset's actual terms of use. Having such information embedded in ISA documents would greatly enhance the possibilities of devising a set of safety checks in addition to those already in place.

Coding Patterns and Recommendations
ISA documents can be annotated with data use information by implementing a series of ISA Comment fields holding values selected from the Data Use Ontology (DUO) [7, 8]. The practice is in agreement with efforts carried out in the context of the Global Alliance for Genomics and Health (GA4GH) [9]. An ISA configuration file for clinical context has been amended and posted to GitHub. It is now the pattern to follow to report:
- terms of use
- ethical committee name
- ethical committee project identification
- url to data access committee
- information about patient consent availability

Implementation
ISA configurations relevant to human patient based studies have been updated accordingly. The code is available from the GitHub repository.

7. "The Data Use Ontology - EMBL-EBI." 20 Feb. 2017. Accessed 31 Aug.
8. "The Data Use Ontology - The OBO Foundry." Accessed 31 Aug.
9. "Global Alliance for Genomics and Health." Accessed 31 Aug.
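A downstream check of a request against the terms of use recorded in the ISA Comment fields might look like the sketch below. The Comment field names, the `purpose` categories and the permission rules are hypothetical simplifications introduced for this example; only the data use labels themselves are DUO term labels, and the real semantics of DUO terms are richer than what is modelled here.

```python
# Data use labels under which any research purpose is accepted in this sketch.
OPEN_TERMS = {"no restriction", "general research use"}


def is_request_permitted(study_comments, purpose, disease=None):
    """Decide whether a request is compatible with a study's declared terms of use.

    study_comments: dict of ISA Comment fields (hypothetical field names).
    purpose: one of "general", "biomedical" or "disease" (illustrative categories).
    disease: disease name, used only for disease specific requests.
    Anything not explicitly allowed is denied.
    """
    term = study_comments.get("Comment[terms of use]", "").strip().lower()
    if term in OPEN_TERMS:
        return True
    if term == "health or medical or biomedical research":
        return purpose in {"biomedical", "disease"}
    if term == "disease specific research":
        allowed = study_comments.get("Comment[disease]", "").strip().lower()
        return (
            purpose == "disease"
            and bool(allowed)
            and disease is not None
            and disease.strip().lower() == allowed
        )
    return False  # missing or unrecognised terms of use


study_comments = {
    "Comment[terms of use]": "disease specific research",
    "Comment[disease]": "type 2 diabetes mellitus",
    "Comment[patient consent availability]": "yes",
}

print(is_request_permitted(study_comments, "disease", "Type 2 diabetes mellitus"))
print(is_request_permitted(study_comments, "biomedical"))
```

Denying by default when the annotation is missing or unrecognised keeps the check conservative, which seems the appropriate choice for patient-derived data.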

Figure 6. Panel a: An overview of the PhenoMeNal specific ISA configuration, documenting the implementation guidelines and patterns.

Figure 6. Panel b: The investigation.xml.

Figure 6. Panel c: studysample.xml holds the ELSI related annotation elements in the form of ISA Comment fields configured to hold Data Use Ontology (DUO) terms (shown highlighted in pale yellow).

Declaration of Quality Control Elements

Background

It has been pointed out that metabolomics signal can only be properly analyzed if all the contextual information surrounding data acquisition events is provided. In particular, the reporting of all controls injected alongside test samples in mass spectrometry applications constitutes an essential quality assurance element. Until now, public repositories did not require users to deposit such data, and there was therefore no guideline for submission. The work reported in this deliverable closes this gap.

Coding Patterns and Recommendation and Implementation
This work introduces the terminology for reporting QC as established by the working group 8 and reported earlier. The terminology defined then is now available for use in the form of a new version of the ISA configuration for handling patient based datasets. The configuration is available from the ISA-tools GitHub repository. More specifically, the element to consider is the following:

header="material Type", required=true
<list-values>specimen,long-term reference,external long-term reference,study reference,dilution series reference,standard reference material,normal blank,reagent blank,sample preparation blank,batch terminus,negative control reference (blank),positive control reference (standard)</list-values>

In addition, the ISA-API create mode is being augmented with a function allowing the reporting of all control elements following the definition and declaration of the experimental plan. This feature has been developed based on a similar function present in Mastr-MS and following discussions with Dr Saravanan Dayalan (Australia, Queensland university). NOTE: This work is not complete at the time of the deliverable but the implementation is ongoing.

Declaration of instrument vendor format and preprocessed data

Background
Instrument generated data most often come in vendor specific formats, some of which are organized in a directory structure. Such is the case for Bruker NMR data.
While open standard formats exist to produce a vendor neutral equivalent (e.g. mzML in the mass spectrometry context, nmrML in the nuclear magnetic resonance context), there is a demand for keeping native instrument output. This request by users and metabolomics practitioners highlighted the need for coding guidelines for representing both types of raw information. In the ISA format, a specific syntactic element (Raw Data File) exists but it is not repeatable. In other words, only one occurrence of the field may appear in any one ISA assay table. Users therefore need guidance to represent vendor native data as well as the corresponding vendor neutral information.

Coding Patterns, Recommendations and Implementation

Reporting Vendor Specific Formatted Data Only (no conversion to open standard)
Use the ISA Raw Data File field to provide the vendor file as a tar.gz file, and add an ISA Comment[vendor] field.

Reporting both Vendor Specific and Open Standard converted data
NMR: Bruker files
- ISA Raw Data File: provide the URI to the zipped, md5 checksummed vendor file
- ISA Derived Data File: provide the URI to the zipped and checksummed nmrML file
MS: Waters files
- ISA Raw Data File: provide the URI to the zipped, md5 checksummed vendor file
- ISA Derived Data File: provide the URI to the zipped and checksummed mzML file

WORK PLAN

Consistent with the work so far and building on it, WP8's attention for deliverable D8.4 has been focused on collecting the needs of the community and practitioners, meeting with them regularly and reaching out for input in order to shape specifications attuned to use cases. The actual delivery of the standardization format and supporting documentation is meant to take place in the remaining tasks and deliverables (T8.4 and D8.4.1-D8.4.2, on month 24 and month 30 respectively).

Objectives
O8.1 Define metadata and data exchange standards, along with technical and user documentations.
O8.2 Implement and maintain PhenoMeNal reference implementations.

Tasks
T8.1: Use cases and state of the art of communication standards
T8.2: Standards for exchanging experimental and clinical metadata
T8.3: Data standards exchange formats
T8.4: Harmonization of data matrices and analytical results

T8.5: Maintain documentation and disseminate information

Deliverables
D8.1: Report on community standards for reporting, access and integrity supported in the PhenoMeNal grid; to be disseminated in a dedicated BioSharing page and via the project website. (M12)
D8.2: Modularized ISA model and format: biospecimen centric schema, corresponding xml schemas, reference implementation guidelines and validation rules. (M24)
D8.3: nmrML, mzML data exchange formats and associated terminologies for instrument raw, with reference implementation guidelines and validation rules. (M18)
D8.4: Signal processing and analysis data exchange format
D8.4.1: Specifications for derived data matrices, specifications and terminology for description of analysis and statistical results (M24)
D8.4.2: Reference implementation guidelines and validation rules (M30)

DELIVERY AND SCHEDULE

The delivery is delayed: No

CONCLUSION

The delivery of the machine-readable specifications of the ISA syntax, along with the supporting ISA-API for read/write calls, syntactic validation and content checking, as well as conversion capabilities to other formats, gives metabolomics users powerful tools for structuring experimental metadata to high levels of quality. The recent addition of a create mode to bootstrap the creation of ISA documents adds a unique capability, that of moving data management from a retrospective activity to a prospective one. Indeed, the ISA-API create function enables the creation of prepopulated metadata capture templates by placing the notion of study design at the core of the reporting. The coming months will see deeper validation and testing of the approach by deploying the API with several Phenome Centres.

A comprehensive and standardised e-infrastructure for analysing medical metabolic phenotype data. 1st September 2015

A comprehensive and standardised e-infrastructure for analysing medical metabolic phenotype data. 1st September 2015 Deliverable 8.4.1 Project ID 654241 Project Title Project Acronym Start Date of the Project Duration of the Project A comprehensive and standardised e-infrastructure for analysing medical metabolic phenotype

More information

Deliverable A comprehensive and standardised e-infrastructure for analysing medical metabolic phenotype data. 1st September 2015

Deliverable A comprehensive and standardised e-infrastructure for analysing medical metabolic phenotype data. 1st September 2015 Deliverable 9.5.1 Project ID 654241 Project Title A comprehensive and standardised e-infrastructure for analysing medical metabolic phenotype data. Project Acronym Start Date of the Project PhenoMeNal

More information

The Final Updates. Philippe Rocca-Serra Alejandra Gonzalez-Beltran, Susanna-Assunta Sansone, Oxford e-research Centre, University of Oxford, UK

The Final Updates. Philippe Rocca-Serra Alejandra Gonzalez-Beltran, Susanna-Assunta Sansone, Oxford e-research Centre, University of Oxford, UK The Final Updates Supported by the NIH grant 1U24 AI117966-01 to UCSD PI, Co-Investigators at: Philippe Rocca-Serra Alejandra Gonzalez-Beltran, Susanna-Assunta Sansone, Oxford e-research Centre, University

More information

Deliverable 6.3. A comprehensive and standardised e-infrastructure for analysing medical metabolic phenotype data. 1st September 2015

Deliverable 6.3. A comprehensive and standardised e-infrastructure for analysing medical metabolic phenotype data. 1st September 2015 Deliverable 6.3 Project ID 654241 Project Title Project Acronym Start Date of the Project Duration of the Project Work Package Number Work Package Title Deliverable Title Delivery Date Work Package leader

More information

Real converters, parsers & validators for NMR-ML. Standards Development. WP leader: Steffen Neumann IPB

Real converters, parsers & validators for NMR-ML. Standards Development. WP leader: Steffen Neumann IPB Deliverable D2.5 Project Title: Developing an efficient e-infrastructure, standards and dataflow for metabolomics and its interface to biomedical and life science e-infrastructures in Europe and world-wide

More information

Scientific Research Data Management Policy

Scientific Research Data Management Policy Scientific Research Data Management Policy DOCUMENT SUMMARY Document No. SRDMP-0001 Ref. Document Title Author(s) Policy Sponsor Scientific Research Data Management Policy Karen Ambrose Alison Davis DOCUMENT

More information

Developing a Research Data Policy

Developing a Research Data Policy Developing a Research Data Policy Core Elements of the Content of a Research Data Management Policy This document may be useful for defining research data, explaining what RDM is, illustrating workflows,

More information

Package Risa. November 28, 2017

Package Risa. November 28, 2017 Version 1.20.0 Date 2013-08-15 Package R November 28, 2017 Title Converting experimental metadata from ISA-tab into Bioconductor data structures Author Alejandra Gonzalez-Beltran, Audrey Kauffmann, Steffen

More information

Supplementary Note-- Williams et al The Image Data Resource: A Bioimage Data Integration and Publication Platform

Supplementary Note-- Williams et al The Image Data Resource: A Bioimage Data Integration and Publication Platform Supplementary Note-- Williams et al The Image Data Resource: A Bioimage Data Integration and Publication Platform 1. Exploring the IDR This current IDR web user interface (WUI) is based on the open source

More information

ELIXIR Human Data Use Case

ELIXIR Human Data Use Case ELIXIR Human Data Use Case Mikael Borg, ELIXIR Sweden ELIXIR-EXCELERATE is funded by the European Commission within the Research Infrastructures programme of Horizon 2020, grant agreement number 676559.

More information

Deliverable 8.3. A comprehensive and standardised e-infrastructure for analysing medical metabolic phenotype data. 1st September 2015

Deliverable 8.3. A comprehensive and standardised e-infrastructure for analysing medical metabolic phenotype data. 1st September 2015 Deliverable 8.3 Project ID 654241 Project Title Project Acronym Start Date of the Project Duration of the Project Work Package Number Work Package Title Deliverable Title Delivery Date Work Package leader

More information

ISAcreator V User Guide. User Guide: V1.3.2 February2011 Contact: Download:

ISAcreator V User Guide. User Guide: V1.3.2 February2011 Contact: Download: ISAcreator V 1.3.2 User Guide User Guide: V1.3.2 February2011 Contact: isatools@googlegroups.com Download: http://isa-tools.org 1 USER GUIDE LIST OF IMPROVEMENTS Improved user interface - addition of pure

More information

D WSMO Data Grounding Component

D WSMO Data Grounding Component Project Number: 215219 Project Acronym: SOA4All Project Title: Instrument: Thematic Priority: Service Oriented Architectures for All Integrated Project Information and Communication Technologies Activity

More information

OpenBudgets.eu: Fighting Corruption with Fiscal Transparency. Project Number: Start Date of Project: Duration: 30 months

OpenBudgets.eu: Fighting Corruption with Fiscal Transparency. Project Number: Start Date of Project: Duration: 30 months OpenBudgets.eu: Fighting Corruption with Fiscal Transparency Project Number: 645833 Start Date of Project: 01.05.2015 Duration: 30 months Deliverable 4.1 Specification of services' Interfaces Dissemination

More information

Data Curation Handbook Steps

Data Curation Handbook Steps Data Curation Handbook Steps By Lisa R. Johnston Preliminary Step 0: Establish Your Data Curation Service: Repository data curation services should be sustained through appropriate staffing and business

More information

EUROPEAN MEDICINES AGENCY (EMA) CONSULTATION

EUROPEAN MEDICINES AGENCY (EMA) CONSULTATION EUROPEAN MEDICINES AGENCY (EMA) CONSULTATION Guideline on GCP compliance in relation to trial master file (paper and/or electronic) for content, management, archiving, audit and inspection of clinical

More information

How to store and visualize RNA-seq data

How to store and visualize RNA-seq data How to store and visualize RNA-seq data Gabriella Rustici Functional Genomics Group gabry@ebi.ac.uk EBI is an Outstation of the European Molecular Biology Laboratory. Talk summary How do we archive RNA-seq

More information

Executive Summary for deliverable D6.1: Definition of the PFS services (requirements, initial design)

Executive Summary for deliverable D6.1: Definition of the PFS services (requirements, initial design) Electronic Health Records for Clinical Research Executive Summary for deliverable D6.1: Definition of the PFS services (requirements, initial design) Project acronym: EHR4CR Project full title: Electronic

More information

Analytics Toolkit - Final Deployment

Analytics Toolkit - Final Deployment The NOMAD (Novel Materials Discovery) Laboratory a European Centre of Excellence Analytics Toolkit - Final Deployment Deliverable No: 4.3 Expected Delivery Date: 31/10/2017, M24 Actual Delivery Date: 06/12/2017,

More information

The Clinical Data Repository Provides CPR's Foundation

The Clinical Data Repository Provides CPR's Foundation Tutorials, T.Handler,M.D.,W.Rishel Research Note 6 November 2003 The Clinical Data Repository Provides CPR's Foundation The core of any computer-based patient record system is a permanent data store. The

More information

Deliverable A comprehensive and standardised e-infrastructure for analysing medical metabolic phenotype data.

Deliverable A comprehensive and standardised e-infrastructure for analysing medical metabolic phenotype data. Project ID 654241 Deliverable 9.2.3 Project Title Project Acronym A comprehensive and standardised e-infrastructure for analysing medical metabolic phenotype data. PhenoMeNal Start Date of the Project

More information

Core Technology Development Team Meeting

Core Technology Development Team Meeting Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Developing A Semantic Web-based Framework for Executing the Clinical Quality Language Using FHIR

Developing A Semantic Web-based Framework for Executing the Clinical Quality Language Using FHIR Developing A Semantic Web-based Framework for Executing the Clinical Quality Language Using FHIR Guoqian Jiang 1, Eric Prud Hommeax 2, and Harold R. Solbrig 1 1 Mayo Clinic, Rochester, MN, 55905, USA 2

More information

DATA Act Information Model Schema (DAIMS) Architecture. U.S. Department of the Treasury

DATA Act Information Model Schema (DAIMS) Architecture. U.S. Department of the Treasury DATA Act Information Model Schema (DAIMS) Architecture U.S. Department of the Treasury September 22, 2017 Table of Contents 1. Introduction... 1 2. Conceptual Information Model... 2 3. Metadata... 4 4.

More information

Continuous auditing certification

Continuous auditing certification State of the Art in cloud service certification Cloud computing has emerged as the de-facto-standard when it comes to IT delivery. It comes with many benefits, such as flexibility, cost-efficiency and

More information

Medici for Digital Cultural Heritage Libraries. George Tsouloupas, PhD The LinkSCEEM Project

Medici for Digital Cultural Heritage Libraries. George Tsouloupas, PhD The LinkSCEEM Project Medici for Digital Cultural Heritage Libraries George Tsouloupas, PhD The LinkSCEEM Project Overview of Digital Libraries A Digital Library: "An informal definition of a digital library is a managed collection

More information

Implementation of a reporting workflow to maintain data lineage for major water resource modelling projects

Implementation of a reporting workflow to maintain data lineage for major water resource modelling projects 18 th World IMACS / MODSIM Congress, Cairns, Australia 13-17 July 2009 http://mssanz.org.au/modsim09 Implementation of a reporting workflow to maintain data lineage for major water Merrin, L.E. 1 and S.M.

More information

Embracing Semantic Technology for Better Metadata Authoring in Biomedicine

Embracing Semantic Technology for Better Metadata Authoring in Biomedicine Embracing Semantic Technology for Better Metadata Authoring in Biomedicine Attila L. Egyedi, Martin J. O Connor, Marcos Martínez-Romero, Debra Willrett, Josef Hardi, John Graybeal, and Mark A. Musen Stanford

More information

Metadata Ingestion and Processinng

Metadata Ingestion and Processinng biomedical and healthcare Data Discovery Index Ecosystem Ingestion and Processinng Jeffrey S. Grethe, Ph.D. 2017 BioCADDIE All Hands Meeting prototype Ingestion Indexing Repositories Ingestion ElasticSearch

More information

Document Title Ingest Guide for University Electronic Records

Document Title Ingest Guide for University Electronic Records Digital Collections and Archives, Manuscripts & Archives, Document Title Ingest Guide for University Electronic Records Document Number 3.1 Version Draft for Comment 3 rd version Date 09/30/05 NHPRC Grant
