Quality Assured (QA) data

Similar documents
Implementation of the information system BExIS 2 at the UFZ: Quality control, retrieval and sharing of [biodiversity] data

TERENO Workshop. Management and publishing of TERENO data from distributed data bases

Dagmar Triebel, Peter Grobe, Anton Güntsch, Gregor Hagedorn, Joachim Holstein, Carola Söhngen, Claus Weiland, Tanja Weibulat.

re3data.org - Making research data repositories visible and discoverable

EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal

Developing a Research Data Policy

BExIS++ Forschungsdatenmanagement

Specific requirements on the da ra metadata schema

Roy Lowry, Gwen Moncoiffe and Adam Leadbetter (BODC) Cathy Norton and Lisa Raymond (MBLWHOI Library) Ed Urban (SCOR) Peter Pissierssens (IODE Project

For Attribution: Developing Data Attribution and Citation Practices and Standards

A Data Management Plan Template for Ecological Restoration and Monitoring

RADAR Project. Data Archival and Publication as a Service. Matthias Razum FIZ Karlsruhe RESEARCH DATA REPOSITORIUM. Zurich, December 15, 2014

GEOSS Data Management Principles: Importance and Implementation

Data Management Plans Are they working for the Australian Antarctic program? Dave Connell Australian Antarctic Data Centre

EUDAT-B2FIND A FAIR and Interdisciplinary Discovery Portal for Research Data

Reproducibility and FAIR Data in the Earth and Space Sciences

Illinois Data Bank Metadata Documentation

Towards a joint service catalogue for e-infrastructure services

ZB MED Information Center Life Sciences

CMIP6 Data Citation and Long- Term Archival

When using this architecture for accessing distributed services, however, query broker and/or caches are recommendable for performance reasons.

Scientific Data Management for the ATP 3. Edward J. Wolfrum, Eric Knoshaug, Lieve Laurens, Valerie Harmon, John A. McGowen

Tools for Data Management. Research Data Management : Session 3 9 th June 2015

LifeWatch/EnvEurope User Forum Use Case Ecology

RADAR A Repository for Long Tail Data

RADAR. Establishing a generic Research Data Repository: RESEARCH DATA REPOSITORY. Dr. Angelina Kraft

A Data Citation Roadmap for Scholarly Data Repositories

Open data and scientific reproducibility

MACHINE ACTIONABLE INTEGRATION OF DATACITE AND DDI METADATA

An introduction to data publications

EUDAT. Towards a pan-european Collaborative Data Infrastructure

Technical documentation. D2.4 KPI Specification

User Guide: MSP Portal Enhancements

Deliverable 6.4. Initial Data Management Plan. RINGO (GA no ) PUBLIC; R. Readiness of ICOS for Necessities of integrated Global Observations

Archivierung und Publikation von Forschungsdaten mit RADAR

How to get data into EFG and Metadata Quality

Big Data infrastructure and tools in libraries

Automated Visualization Support for Linked Research Data

WP4: Data Forum. Øystein Godøy, Boris Radosavljević, Boris Biskaborn, Anna Irrgang

Introduction to INEXDA s Metadata Schema

Towards Open Innovation with Open Data Service Platform

Data Quality and Cleaning

Workflow Exchange and Archival: The KSW File and the Kepler Object Manager. Shawn Bowers (For Chad Berkley & Matt Jones)

IM&T Data Management Strategy Overview March 2013 CSIRO INFORMATION MANAGEMENT & TECHNOLOGY

GUID Guide for Data Providers

Plutof Bridge from Data Management to the Reporting (=Publishing) Urmas Kõljalg - Allan Zirk - Kessy Abarenkov University of Tartu

Application of Data Management and Decision Support Tools to Support Coastal Wetland Management in the Laurentian Great Lakes

LIBER Webinar: A Data Citation Roadmap for Scholarly Data Repositories

Metadata Management System (MMS)

Callicott, Burton B, Scherer, David, Wesolek, Andrew. Published by Purdue University Press. For additional information about this book

A guide to using the Biocontrol Hub App

Long-term preservation for INSPIRE: a metadata framework and geo-portal implementation

HUMIT Interactive Data Integration in a Data Lake System for the Life Sciences

: Course : SharePoint 2016 Site Collection and Site Administration

DataSTORRE Deposit Guide

European Network and ICOS

The Role of Data Management in Quality Assurance of Ecological Restoration Data

WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS

Open Access & Open Data in H2020

SHARING YOUR RESEARCH DATA VIA

Oliver Engels & Tillmann Eitelberg. Big Data! Big Quality?

The DOI Identifier. Drexel University. From the SelectedWorks of James Gross. James Gross, Drexel University. June 4, 2012

Provenance Information in the Web of Data

Reducing Consumer Uncertainty

Digital repositories as research infrastructure: a UK perspective

B2FIND and Metadata Quality

Minimal Metadata Standards and MIIDI Reports

Risk Habitat Megacity

Data Curation Profile Movement of Proteins

Facilitate Open Science Training for European Research

JCOMM Observing Programme Support Centre

DOIs for Research Data

Data Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013

EUDAT Towards a Collaborative Data Infrastructure

The data explosion. and the need to manage diverse data sources in scientific research. Simon Coles

Demos: DMP Assistant and Dataverse

Introduction to Data Science

Basics in good research data management (RDM) for reviewing DMPs

Dataset Documentation Reference Guide for Pure Users

Linked Data for Data Integration based on SWIMing Guideline: Use Cases in DAREED Project

Animal Sounds - Toward Management and Retrieval Challenges

Inge Van Nieuwerburgh OpenAIRE NOAD Belgium. Tools&Services. OpenAIRE EUDAT. can be reused under the CC BY license

Elsevier publishing partner for the journals in Thailand

Data Curation Profile Water Flow and Quality

/// INTEROPERABILITY BETWEEN METADATA STANDARDS: A REFERENCE IMPLEMENTATION FOR METADATA CATALOGUES

Interoperability in Science Data: Stories from the Trenches

Dutch View on URN:NBN and Related PID Services

Feed the Future Innovation Lab for Peanut (Peanut Innovation Lab) Data Management Plan Version:

BHL-EUROPE: Biodiversity Heritage Library for Europe. Jana Hoffmann, Henning Scholz

Perspectives on Open Data in Science Open Data in Science: Challenges & Opportunities for Europe

Data Management Plans. Sarah Jones Digital Curation Centre, Glasgow

Welcome to the Pure International Conference. Jill Lindmeier HR, Brand and Event Manager Oct 31, 2018

The GeoPortal Cookbook Tutorial

INTEGRATION OPTIONS FOR PERSISTENT IDENTIFIERS IN OSGEO PROJECT REPOSITORIES

DataONE Cyberinfrastructure. Ma# Jones Dave Vieglais Bruce Wilson

DURAARK. Ex Libris conference April th, 2013 Berlin. Long-term Preservation of 3D Architectural Data

DIAS_Satellite_MODIS_SurfaceReflectance dataset

EUDAT & SeaDataCloud

Enhanced retrieval using semantic technologies:

EUROPEANA METADATA INGESTION , Helsinki, Finland

Transcription:

Quality Assured (QA) data Towards DOI quality of data generated at the UFZ Mark Frenzel (Ecologist) & Thomas Schnicke (IT) DataCite / Helmholtz Open Science Workshop Leipzig, 12.01.2016

QA + DOI: Best practice elements and goals AIM: Re-use of data (open data) Recognition of importance of data management (attitude) o Top down o Bottom up Documentation and selection of important data sets Meta data standard for data description Thesauri (controlled vocabulary) Defined workflows Persistent storage / hardware Magic tools: software solutions

QA + DOI: Best practice elements and goals Data publication: DOI for data sets Linking data to publications and people Feedback on attitude of data providers Smiling data creators Smiling users

The UFZ situation >20 years of data production (exponential growth rates!) Heterogeneity of data (person-generated, device-generated; measurement, observation, modelling, data bases) Documentation (meta data) of data sets by data creators not standardized or missing Long term availability for scientific data not ensured Quality control work flows and tools partly not well-established DOI for datasets initiative started 3 years ago (priority condition set by head of UFZ: quality control of data) Challenge for Science & IT: development of workflows, tools and attitudes!

UFZ approach: IT tools (I) Managing all kinds of data UFZ Data Management Portal (DMP)

UFZ approach: IT tools (II) Exploring all kinds of UFZ data (internal and external access) UFZ Data Research Portal (DRP) Geo-referenced data Keyword- and Mapbased search Data access with configured priviliges Link to DMP (internal) [4] Details page = DOI Landing page Public Meta data UFZ DatenRecherchePortal Details / DOI Landing page

Different data different requirements (I) The case of UFZ logger data Mostly sensors, e.g. weather stations ( Big Data) => TERENO Responsibility taken by IT Department o Database, interfaces o Workflow,... Level 0 = Raw data Level 1 = Transformation to target unit (e.g. V in C) Level 2 = Quality controlled (check + correction missing data, outliers,...)

UFZ quality control of logger data ( DOI) Crucial for use and publication of data! Evaluation of software products o Searching the magic tool Generic, allowing high degree of adaptation to user needs, user-defined algorythms o UFZ-decision for training on ORIGIN (data analysis and graphics software) other software solutions, i.e. Matlab, R and others on demand

Logger data to landing page at the UFZ Logger data DMP Logger component Quality control UFZ Data Research Portal DOI Workflow Data set detail page landing page

Different data different requirements (II) Example biodiversity data (mostly person-generated data) Issues Heterogeneity of data Logic of ecologists related to data (different from IT people) o Ecology: data based on spreadsheets Data base?! Plausibility tests o Expert knowledge o Software (e.g. occurrence of species A at location B plausible?) Technical consistency o Correct data types o Correct cell entries

UFZ quality control of biodiversity data: BExIS BExIS: Biodiversity Exploratories Information System DFG-Project: generic open source information system for biodiversity data (funded until 2017; http://fusion.cs.unijena.de/bexis) Tool for spreadsheet-based biodiversity data (Excel import) Features: Import // Table-to-database // Export // Consistency check // Retrieval // Metadata // Admission rights Test case for UFZ DOI-ready data sets

Workflow biodiversity data UFZ Spreadsheet data Quality Control BExIS BExIS Search (terms; semantic) Quality control UFZ Data Management Portal Archive component UFZ Data Research Portal ( landing page ) DOI

Conclusions (UFZ) Tools for quality control of data are at hand Technical (IT) environment is DOI-ready Promotion within UFZ (scientists) needs to start DOI technical issues to be solved DOI-relevant data Granularity Versioning,... Editorial team? DOI-Agent issue still open (Helmholtz solution?) Finally we should aim at Open data! SEITE 13