Implementing the RDA data citation recommendations in the distributed Infrastructure of the Virtual Atomic and Molecular Data Centre

Similar documents
Approaches to Making Dynamic Data Citeable Recommendations of the RDA Working Group Andreas Rauber

Data Citation Working Group P7 March 2 nd 2016, Tokyo

Online database services for atomic and molecular data

VAMDC: The Virtual Atomic and Molecular Data Centre

Enabling Precise Identification and Citabilityof Dynamic Data. Recommendations of the RDA Working Group on Data Citation

VAMDC: The Virtual Atomic and Molecular Data Centre: Recent Advances and Future Prospects

Implementing the RDA Data Citation Recommendations for Long Tail Research Data. Stefan Pröll

The Final Updates. Philippe Rocca-Serra Alejandra Gonzalez-Beltran, Susanna-Assunta Sansone, Oxford e-research Centre, University of Oxford, UK

A Guide to CMS Functions

arxiv: v1 [cs.dl] 9 May 2016

Services to Make Sense of Data. Patricia Cruse, Executive Director, DataCite Council of Science Editors San Diego May 2017

Europlanet IDIS: Adapting existing VO building blocks to Planetary Sciences

The Virtual Observatory and the IVOA

Efficiently Sharing All of a Patient s Data With All their Providers Completing the Model

DOIs for Research Data

Cancer Waiting Times. Getting Started with Beta Testing. Beta Testing period: 01 February May Copyright 2018 NHS Digital

FAIR-aligned Scientific Repositories: Essential Infrastructure for Open and FAIR Data

Introduction to INSPIRE. Network Services

Horizon 2020 and the Open Research Data pilot. Sarah Jones Digital Curation Centre, Glasgow

Information Integration

Oracle Financial Services Governance, Risk, and Compliance Workflow Manager User Guide. Release February 2016 E

Bioinformatics Data Distribution and Integration via Web Services and XML

Callicott, Burton B, Scherer, David, Wesolek, Andrew. Published by Purdue University Press. For additional information about this book

International Jmynal of Intellectual Advancements and Research in Engineering Computations

Linking data and publications the past, present, and future. Dr. Hylke Koers, Head of Content Innovation, Elsevier

Metadata Models for Experimental Science Data Management

Google indexed 3,3 billion of pages. Google s index contains 8,1 billion of websites

Bengkel Kelestarian Jurnal Pusat Sitasi Malaysia. Digital Object Identifier Way Forward. 12 Januari 2017

Web Services for Visualization

Linking data and publications the past, present, and future. Dr. Hylke Koers, Head of Content Innovation, Elsevier

Alma Mater Studiorum University of Bologna CdS Laurea Magistrale (MSc) in Computer Science Engineering

Part III. Issues in Search Computing

Institutional Repository using DSpace. Yatrik Patel Scientist D (CS)

EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT

Gender Info 2007 Getting Started

Introduction to Federation Server

Distributed Data Management with Storage Resource Broker in the UK

The VERCE Architecture for Data-Intensive Seismology

SciX Open, self organising repository for scientific information exchange. D15: Value Added Publications IST

Implementing the Army Net Centric Data Strategy in a Service Oriented Environment

Grant permissions sql server Grant permissions sql server 2008.zip

Instructions for Exporting Cart Data and Uploading Inpatient Clinical Data to Qualitynet Updated November 6, 2013

Reusability and Adaptability of Interactive Resources in Web-Based Educational Systems. 01/06/2003

THE NATIONAL DATA SERVICE(S) & NDS CONSORTIUM A Call to Action for Accelerating Discovery Through Data Services we can Build Ed Seidel

Fine-Grained Access Control

FREYA Connected Open Identifiers for Discovery, Access and Use of Research Resources

ISMTE Best Practices Around Data for Journals, and How to Follow Them" Brooks Hanson Director, Publications, AGU

warwick.ac.uk/lib-publications

SHARING YOUR RESEARCH DATA VIA

MetaMatrix Enterprise Data Services Platform

v1.4, 7/1/2017 EPN-TAP services: Virtis / Venus Express demo S. Erard, B. Cecconi, P. Le Sidaner, F. Henry, R. Savalle, C. Chauvin

Enabling Seamless Sharing of Data among Organizations Using the DaaS Model in a Cloud

INSTITUTIONAL REPOSITORY SERVICES

Jeffery S. Horsburgh. Utah Water Research Laboratory Utah State University

Indiana University Research Technology and the Research Data Alliance

Europeana DSI 2 Access to Digital Resources of European Heritage

Research on the Interoperability Architecture of the Digital Library Grid

<Insert Picture Here> Enterprise Data Management using Grid Technology

Building a federated science web one link at a time

Smart Federated Search for Egyptian Knowledge Bank

panmetaworks User Manual Version 1 July 2009 Robert Huber MARUM, Bremen, Germany

Memorandum of Understanding between the Central LHIN and the Toronto Central LHIN to establish a Joint ehealth Program

MyUni - Discussion Boards, Blogs, Wikis & Journals

Extended Search Administration

SAS Clinical Data Integration 2.6

LIBER Webinar: A Data Citation Roadmap for Scholarly Data Repositories

Introduction of SeaDataNet and EMODNET Working towards a harmonised data infrastructure for marine data. Peter Thijsse MARIS

Chapter 3. Database Architecture and the Web

Reproducibility and FAIR Data in the Earth and Space Sciences

Astrophysics and the Grid: Experience with EGEE

SciVerse ScienceDirect. User Guide. October SciVerse ScienceDirect. Open to accelerate science

Prices in Japan (Yen) Oracle Technology Global Price List December 8, 2017

USE CASES IN SEISMOLOGY. Alberto Michelini INGV

Agent-Enabling Transformation of E-Commerce Portals with Web Services

Theseos Data Traceability Query Engine. Intelligent Information Systems IBM Almaden Research Center

Grid services. Enabling Grids for E-sciencE. Dusan Vudragovic Scientific Computing Laboratory Institute of Physics Belgrade, Serbia

The European Commission s science and knowledge service

OpenAIRE. Fostering the social and technical links that enable Open Science in Europe and beyond

Access Governance in a Cloudy Environment. Nabeel Nizar VP Worldwide Solutions

[GSoC Proposal] Securing Airavata API

epic and the Handle System

Vincent BOUDON, Christian WENGER, Romain SURLEAU (SMPCA Group)

Reproducibility and Reuse of Scientific Code Evolving the Role and Capabilities of Publishers

Common Language Resources and Technology Infrastructure REVISED WEBSITE

Vendor: Microsoft. Exam Code: Exam Name: Developing Microsoft Azure Solutions. Version: Demo

Opening Your Content to Metasearch Services: The Bepress and Ex Libris Experience. Karen Groves MetaLib Product Manager

(Some) Standards in the Humanities. Sebastian Drude CLARIN ERIC RDA 4 th Plenary, Amsterdam September 2014

Multiprocessor scheduling

WLCG Transfers Dashboard: a Unified Monitoring Tool for Heterogeneous Data Transfers.

ArcGIS Enterprise Security: An Introduction. Gregory Ponto & Jeff Smith

Introduction to Distributed HTC and overlay systems

Full file at

VAMDC-TAP service validation tool

Lessons learnt from building the International Virtual Observatory Alliance. Françoise Genova

NISO STS (Standards Tag Suite) Differences Between ISO STS 1.1 and NISO STS 1.0. Version 1 October 2017

Data Citation Technical Issues - Identification

The Logical Data Store

Fundamentals: Managing and Extending Microsoft Office & SharePoint with EMC Documentum

Basic Principles of MedWIS - WISE interoperability

The KASCADE Cosmic ray Data Center - providing open access to astroparticle physics research data

Transcription:

Implementing the RDA data citation recommendations in the distributed Infrastructure of the Virtual Atomic and Molecular Data Centre C.M. Zwölf, N. Moreau and VAMDC consortium

The Virtual Atomic and Molecular Data Centre Health and clinical sciences Astrophysics Plasma sciences VAMDC Single and unique access to heterogeneous A+M Databases Lighting technologies Atmospheric Physics Federates ~30 heterogeneous databases http://portal.vamdc.org/ The V of VAMDC stands for Virtual in the sense that the e infrastructure does not contain data. The infrastructure is a wrapping for exposing in a unified way a set of heterogeneous databases. The consortium is politically organized around a Memorandum of understanding (15 international members have signed the MoU, 1 November 2014) High quality scientific data come from different Physical/Chemical Communities Fusion technologies Environmental sciences Provides data producers with a large dissemination platform Remove bottleneck between dataproducers and wide body of users

The VAMDC infrastructure technical architecture Existing Independent A+M database

The VAMDC infrastructure technical architecture Existing Independent A+M database VAMDC wrapping layer

The VAMDC infrastructure technical architecture Accept queries submitted in standard grammar (subset of SQL) Existing Independent A+M database VAMDC wrapping layer Provides output formatted into standard XML file (XSAMS)

The VAMDC infrastructure technical architecture Accept queries submitted in standard grammar (subset of SQL) Existing Independent A+M database VAMDC wrapping layer Provides output formatted into standard XML file (XSAMS) For further details, cf. http://standards.vamdc.eu

The VAMDC infrastructure technical architecture Existing Independent A+M database VAMDC wrapping layer

The VAMDC infrastructure technical architecture 1 Existing Independent A+M database VAMDC wrapping layer

The VAMDC infrastructure technical architecture 1

The VAMDC infrastructure technical architecture 1 N 1 N

The VAMDC infrastructure technical architecture 1 Registries http://registry.vamdc.eu N 1 Available nodes (with their attributes) are registered into the main registry. N

The VAMDC infrastructure technical architecture 1 Registries http://registry.vamdc.eu N 1 VAMDC Clients (Portal, Cassis, SpectCol, SpecView, ) N

The VAMDC infrastructure technical architecture 1 Registries http://registry.vamdc.eu N 1 N VAMDC Clients (Portal, Cassis, SpectCol, SpecView, ) 1 User submits a unique query

The VAMDC infrastructure technical architecture 1 Registries http://registry.vamdc.eu 2 The client ask the registry for the available nodes N 1 N VAMDC Clients (Portal, Cassis, SpectCol, SpecView, ) 1 User submits a unique query

The VAMDC infrastructure technical architecture 1 Registries http://registry.vamdc.eu N 1 N 3 The query is dispatched to the available Nodes 2 The client ask the registry for the available nodes VAMDC Clients (Portal, Cassis, SpectCol, SpecView, ) 1 User submits a unique query

The VAMDC infrastructure technical architecture 1 Registries http://registry.vamdc.eu N 1 VAMDC Clients (Portal, Cassis, SpectCol, SpecView, ) N 4 Nodes standardized outputs are collected

The VAMDC infrastructure technical architecture 1 Registries http://registry.vamdc.eu N 1 N 4 Nodes standardized outputs are collected VAMDC Clients (Portal, Cassis, SpectCol, SpecView, ) 5 Results are served to the User.

The VAMDC infrastructure technical architecture 1 Registries http://registry.vamdc.eu N 1 N Users may also submit queries directly to the nodes they want to hit VAMDC Clients (Portal, Cassis, SpectCol, SpecView, )

The Research Data Alliance and the Data Citation WG

The Research Data Alliance and the Data Citation WG The RDA recommendations comes from standalone databases or warehouse. VAMDC is a distributed infrastructure, with no central management system.

Highlighting the main issues How to build a Query Store in our distributed infrastructure? The solution belongs to a space with lot of constraints Any choice will impact each of the ~30 databases federated by VAMDC. Any technological change of the infrastructure must be validated by the majority of the members The solution must cause least effects on the existing infrastructure and have minimal implementing cost for the database owners. This constraint suggest to fit the solution into the standard wrapping layer transforming an autonomous Database into a VAMDC node.

Highlighting the main issues But the problem not only technical, it is anthropological too Tagging and versioning data What does it really mean data citation?

Highlighting the main issues But the problem not only technical, it is anthropological too We see technically how to do that Tagging and versioning data Ok, but What is the data granularity for tagging? Naturally it is the dataset (A+M data have no meaning outside this given context) But each data provider defines differently what a dataset is. What does it really mean data citation?

Highlighting the main issues But the problem not only technical, it is anthropological too We see technically how to do that Tagging and versioning data Ok, but What is the data granularity for tagging? Naturally it is the dataset (A+M data have no meaning outside this given context) But each data provider differently define what a dataset is. Everyone knows what it is! What does it really mean data citation? Yes, but everyone has its own definition RDA cite databases record or output files. (an extracted data file may have an H factor) VAMDC cite all the papers used for compiling the content of a given output file.

Sketching the solution strategy Implementation is an overlay to the standard / output layer, thus independent from any specific data node Tagging versions of data Query Store Two layers mechanisms 1 Fine grained granularity: Evolution of XSAMS output standard for tracking data modifications* 2 Coarse grained granularity: At each data modification to a given data node, the version of the Data Node changes Is built over the versioning of Data (the coarse grained mechanism) Is plugged over the existing VAMDC data extraction mechanisms. Due to the distributed VAMDC architecture, the Query Store may be seen as a smart log service. With the second mechanism we know that something changed : in other words, we know that the result of an identical query may be different from one version to the other. The detail of which data changed is accessible using the first mechanisms. * http://dx.doi.org/10.1016/j.jms.2016.04.009 arxiv version at https://arxiv.org/abs/1606.00405

Let us focus on the query store: The difficulties we have to cope with: Handle a query store in a distributed environment (RDA did not design it for these configurations). Integrate the query store with the existing VAMDC infrastructure.

Let us focus on the query store: The difficulties we have to cope with: Handle a query store in a distributed environment (RDA did not design it for these configurations). Integrate the query store with the existing VAMDC infrastructure. The implementation of the query store is the goal of a joint collaboration between VAMDC and RDA Europe 3. Development started during spring 2016. Final product released during 2017.

Let us focus on the query store: The difficulties we have to cope with: Handle a query store in a distributed environment (RDA did not design it for these configurations). Integrate the query store with the existing VAMDC infrastructure. The implementation of the query store is the goal of a joint collaboration between VAMDC and RDA Europe 3. Development started during spring 2016. Final product released during 2017. Collaboration with Elsevier for embedding the VAMDC query store into the pages displaying the digital version of papers. Designing technical solution for Paper / data linking at the paper submission (for authors) Paper / data linking at the paper display (for readers)

A functional overview of the Query Store Sketching the functioning From the final user point of view: Submit Query to Data extraction procedure Any VAMDC client and/or Direct access to node Query VAMDC infrastructure The original query Computed response Result Display Access to the output data file Digital Unique Identifier associated to the current extraction Resolves Date & time where query was processed Version of the infrastructure when the query was processed List of publications needed for answering the query Query Metadata Landing Page Query Store When supported (by the VAMDC federated DB): retrieve the output data file as it was computed (query re execution)

Strengths of the Query Store: The QS usage is transparent for users (complexity is hidden). Live monitoring of all the queries and users of the VAMDC e infrastructure Data providers may measure their impact and have detailed statistics of usage. It will be easy for authors to cite data coming from VAMDC. Credit to producers will be automatic.

Strengths of the Query Store: The QS usage is transparent for users (complexity is hidden). Live monitoring of all the queries and users of the VAMDC e infrastructure Data providers may measure their impact and have detailed statistics of usage. It will be easy for authors to cite data coming from VAMDC. Credit to producers will be automatic. Minimal impact for federated database owners for dealing with the Query Store Database owners just need to install the latest version of the VAMDC wrapping software Data providers has to fill a version field (~ a simple string), which is just the version label. When the database is modified and/or data node software changes, the version label should evolve.

Remarks before the live demo: We provided the VAMDC infrastructure with a working Query Store The concept adopted and the implemented code are quite generic and both can be adapted to other use cases: If it worked in our complex distributed case, it may work in many contexts The cost for adapting an existing service/database to a VAMDC type Query Store is minimal All the complexity is handled and masked in our generic software. Live demo link: https://youtu.be/kddwfpi22cu