Deliverable 8.5.1 Initial Data Management Plan

EU H2020 Research and Innovation Project
HOBBIT: Holistic Benchmarking of Big Linked Data
Project Number: 688227
Start Date of Project: 01/12/2015
Duration: 36 months

Deliverable 8.5.1
Initial Data Management Plan

Dissemination Level: Public
Due Date of Deliverable: Month 6, 31/05/2016
Actual Submission Date: Month 6, 31/05/2016
Work Package: WP 8
Task: T 8.1
Type: Report
Approval Status: Approved
Version: 1.0
Number of Pages: 9
Filename: D8.5.1_Initial_Data_Management_Plan.pdf

Abstract: This report describes the initial data management plan for the project.

The information in this document reflects only the author's views and the European Community is not liable for any use that may be made of the information contained therein. The information in this document is provided "as is" without guarantee or warranty of any kind, express or implied, including but not limited to the fitness of the information for a particular purpose. The user thereof uses the information at his/her sole risk and liability.

Project funded by the European Commission within the H2020 Framework Programme

History

Version  Date        Reason                            Revised by
0.1      09/02/2016  First Draft                       Tom De Nies
0.2      09/05/2016  Second Draft                      Tom De Nies
0.3      13/05/2016  Review                            Marco Huber
0.4      29/05/2016  Version after 1st Review          Tom De Nies
0.5      29/05/2016  2nd Review                        Axel Ngonga
1.0      29/05/2016  Final Version after Review        Tom De Nies

Author List

Organisation   Name          Contact Information
iMinds         Tom De Nies   tom.denies@ugent.be
USU Software   Marco Huber   m.huber@usu-software.de
InfAI          Axel Ngonga   ngonga@informatik.uni-leipzig.de

Executive Summary

This report describes the initial data management plan for the project. This data management plan will be used as a guideline when handling the data submitted by members of the HOBBIT community to the benchmarks. In this initial plan, we discuss the envisioned data management lifecycle (how can data be added to the platform, how can it be accessed, and how long will it be kept?), as well as the details of the data management plan as agreed upon by the consortium at this time.

Table of Contents

1. DATA MANAGEMENT LIFECYCLE
2. DATA MANAGEMENT PLAN
2.1 DATASET REFERENCE AND NAME
2.2 DATASET DESCRIPTION
2.3 STANDARDS AND METADATA
2.4 DATA SHARING
2.5 ARCHIVING AND PRESERVATION (INCLUDING STORAGE AND BACKUP)

List of Figures

Figure 1. Data Management Lifecycle Overview
Figure 2. Screenshot of the current CKAN deployment
Figure 3. DCAT ontology overview (source: https://www.w3.org/TR/vocab-dcat/)

1. Data Management Lifecycle

HOBBIT will continuously collect various datasets (i.e., not limited to specific domains) as the basis for benchmarks. These data will initially be provided by the project's industrial partners, and later on by members of the HOBBIT community. To make the data discoverable and accessible, HOBBIT will provide the generated benchmarks as dump files [1] that can be loaded from the project repository, and will also provide a SPARQL endpoint that serves all the benchmark datasets. The HOBBIT SPARQL endpoint will enable platform users to run their own queries against one or more benchmarks to obtain tailored benchmarks that exactly fit each user's needs.

Figure 1. Data Management Lifecycle Overview (diagram labels: Dataset Reference + metadata, Data Dump, CKAN, SPARQL Endpoint)

To keep the dataset submission process manageable, we host an instance of the CKAN open source data portal software, extended with custom metadata fields for the HOBBIT project. For the time being, this instance is hosted at http://hobbit.iminds.be. When the benchmarking platform itself goes online, the CKAN instance will be moved there to provide more space for datasets. Users who want to add a dataset of their own first need to request to be added to an organization on the CKAN instance (contact: http://projecthobbit.eu/contacts/), after which they can add datasets to this organization.

Datasets will be kept available on the HOBBIT platform for at least the lifetime of the project, unless they are removed by their owners. After the project, the HOBBIT platform will be maintained by the HOBBIT Association, and so will the datasets. Owners may add or remove a dataset at any time.

[1] The file format has not been decided upon yet, and will depend on the results of the first requirements investigation and survey of the stakeholders.
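As an illustration of how a platform user might tailor a benchmark through the SPARQL endpoint, the following minimal sketch builds a query over a single dataset and encodes it as a SPARQL 1.1 Protocol GET request. The endpoint URL, the dataset graph URI, and the assumption that each dataset is stored in its own named graph are hypothetical; the plan does not specify any of them.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint URL: the plan does not state where the endpoint will live.
ENDPOINT = "http://example.org/hobbit/sparql"


def build_select_query(dataset_graph: str, limit: int) -> str:
    """Return a SPARQL SELECT query restricted to one named graph.

    Assumption: each benchmark dataset is stored in its own named graph.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT ?s ?p ?o\n"
        f"WHERE {{ GRAPH <{dataset_graph}> {{ ?s ?p ?o }} }}\n"
        f"LIMIT {limit}"
    )


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as a SPARQL 1.1 Protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical dataset graph URI, used only for illustration.
query = build_select_query("http://example.org/hobbit/dataset/demo", 100)
url = build_request_url(ENDPOINT, query)
```

The resulting URL could be fetched with any HTTP client; the sketch stops before the network call because no real endpoint address is given in this plan.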

Figure 2. Screenshot of the current CKAN deployment.

2. Data Management Plan

In conformity with the guidelines of the Commission, we will provide the following information for every dataset submitted to the project. This information will be obtained either by generating it automatically (e.g., for the identifier) or by asking the dataset provider upon submission.

2.1 Dataset Reference and Name

The submitted datasets will be identified and referenced by a URL. This URL can then be used to access the dataset (through either the dump file or the SPARQL endpoint), and also serves as the identifier to which metadata is attached.

2.2 Dataset Description

The submitter will be asked to provide a short textual, human-interpretable description of the dataset, at least in English and optionally in other languages as well. Additionally, a machine-interpretable description will be provided (see 2.3 Standards and Metadata).

2.3 Standards and Metadata

Since we are dealing with Linked Data sets, it makes sense to adhere to a Semantic Web context for the description of the datasets as well. Therefore, we will use W3C recommended vocabularies such as DCAT to provide metadata about each dataset. The metadata currently associated with the datasets includes:

- Title
- URL
- Description
- External Description
- Tags
- License
- Organization
- Visibility
- Source
- Version
- Contact
- Contact Email
- Applicable Benchmark

Currently, this metadata is stored in the CKAN instance's database. However, the plan is to convert this information to DCAT and make it available for querying once the benchmarking platform is running.

Figure 3. DCAT ontology overview (source: https://www.w3.org/TR/vocab-dcat/)

2.4 Data Sharing

Industrial companies are normally unwilling to make their internal data available for competitions because this could significantly reduce their competitiveness. However, HOBBIT aims to pursue a policy of making data as open as possible. Therefore, a number of mechanisms are put in place. As per the original proposal, HOBBIT will deploy a standard data management plan that includes (1) employing mimicking algorithms that will compute and reproduce variables that characterize the structure of company data, and (2) feeding these characteristics into generators that will be able to generate data similar to real company data without having to make the real company data available to the public.

The mimicking algorithms will be implemented in such a way that they can be used within companies and simply return parameters that can be used to feed the generators. This preserves Intellectual Property Rights (IPR) and circumvents the hurdle of making real industrial data public: deterministic synthetic data generators can be configured to compute data streams that display the same variables as industry data while being fully open and available for evaluation without restrictions.

Since we will provide a mimicked version of the original dataset in our benchmarks, open access will be the default behaviour. However, on a case-by-case basis, datasets might be protected (i.e., visible only to specific user groups) on request of the data owner and in agreement with the HOBBIT platform administrators.

2.5 Archiving and Preservation (Including Storage and Backup)

HOBBIT will also support accessing and querying past versions of an evolving dataset: all benchmark versions will be publicly available as dump files as well as from the project SPARQL endpoint. The data will be stored on the benchmarking platform server(s), at least for the duration of the project. After the project, this responsibility is transferred to the HOBBIT Association, which will be tasked with the long-term preservation of the datasets.
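The two-step mimicking approach of Section 2.4 can be illustrated with a minimal sketch: a function that runs inside the company and returns only summary parameters, and a deterministic generator that turns those parameters into synthetic data. The choice of characteristics (count, mean, standard deviation), the normal distribution, and all names below are assumptions made for illustration; the plan does not specify which variables the actual mimicking algorithms compute.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class MimicParameters:
    """Summary characteristics that may leave the company (hypothetical choice)."""
    count: int
    mean: float
    stdev: float


def extract_parameters(private_values):
    """Step 1, run inside the company: return parameters only, never raw values."""
    return MimicParameters(
        count=len(private_values),
        mean=statistics.fmean(private_values),
        stdev=statistics.pstdev(private_values),
    )


def generate_synthetic(params, seed):
    """Step 2, run publicly: deterministic generator seeded for reproducibility.

    Assumption: values are drawn from a normal distribution.
    """
    rng = random.Random(seed)
    return [rng.gauss(params.mean, params.stdev) for _ in range(params.count)]


# Made-up values standing in for confidential company data.
private_values = [12.0, 15.5, 11.2, 14.8, 13.1]
params = extract_parameters(private_values)
synthetic_a = generate_synthetic(params, seed=42)
synthetic_b = generate_synthetic(params, seed=42)
```

Because the generator is seeded, two runs with the same parameters and seed yield identical data, which is what allows a benchmark built on synthetic data to be reproduced without access to the original values.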