An introduction to data publications
Kirsten Elger, Deutsches GeoForschungsZentrum GFZ, Potsdam, kirsten.elger@gfz-potsdam.de
Research Data
Research data are essential for scientific research.
Many datasets, e.g. observational data, are irreplaceable.
With the advent of the internet, the way research datasets are collected, managed, and archived has changed significantly.
Observations on: meteorology, geomagnetism, auroral phenomena, ocean currents, tides, structure and motion of ice, and atmospheric electricity
So extensive and dangerous a work
Eleven nations established 14 principal research stations across the Polar Regions. 12 were in the Arctic, along with at least 13 auxiliary stations. Over 700 men incurred the dangers of Arctic service to establish and relieve these stations between 1881 and 1884.
Geological field work in 1995 GPS values
data publication in 1995
And after the end of the project?
The bad case: the PhD student or postdoc takes the data with him or her (on a floppy disk or CD) and, years later, throws everything away.
Slightly better: data submission (in digital or analogue form) to a departmental computer, with or without a data description (depending on the time and motivation of the respective scientist).
What happens when the professor or lab PI retires?
Who takes care of the hard drives with the old data?
Who takes care of paper copies of maps or other datasets?
How long may rock samples be archived after the scientist has left?
Research Data Today
Thanks to the internet, many datasets are available online
Very fast data access, even to large datasets
Online access to journal articles
Online-only journals are coming of age
Real-time data
Real-time data example: climate station in Alaska (air, surface, shallow ground temperatures)
Source: Permafrost Lab, UAF, Fairbanks http://permafrost.gi.alaska.edu/
GEOFON earthquake information service GEOFON Live Seismograms
NOAA (National Oceanic and Atmospheric Administration):
Synoptic meteorological records of the first IPY in digital form (surface air temperature, sea level pressure, 1-year time series)
Extensive documentary image collection
Overview of IPY reports
Posters
Available online for download: www.arctic.noaa.gov/aro/ipy-1
As a consequence
With the advent of the digital era and the internet, datasets are growing in size and complexity.
Data reuse and data mining are becoming more and more important.
Metadata portals (with automatically generated standardised metadata) are becoming more and more important for data discovery.
There is an increasing number of data repositories for all types of research data.
The scientific community, funding agencies and the public increasingly expect publicly funded research results and data to be freely and openly accessible without any constraints.
Politics
2003 Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities: Open access contributions include original scientific research results as well as raw data, metadata, ...
Priority Initiative "Digital Information" of the Alliance of German Science Organisations: The availability and reuse of digital information includes access to research data that is open and, as far as possible, free of charge.
Digital Agenda of the German Federal Government 2013-2017
Helmholtz Open Science
Open science, the unrestricted access to scientific publications and cultural heritage, is an ongoing and future trend in the scientific landscape worldwide. Research publications and other digital objects such as research data and scientific software will thus be publicly available on the internet.
The Helmholtz Association was one of the initial signatories of the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities in 2003. This commitment towards open access was then formally approved by its Assembly of Members (assembly of the directors of the Helmholtz Centres):
"Publications from the Helmholtz Association shall in future, without exception, be available free of charge, as far as no conflicting agreement with publishers or others exists." (Resolution of the Assembly of Members, 27 September 2004)
Obstacles to sharing
Too much work with no benefit
Data publications were deleted from reference lists by journal editors
They misinterpret or misuse my data
Someone will publish MY data before me
Do I have to share ALL my data?
www.aukeherrema.nl
Domains of research data
PRIVATE DOMAIN
SHARED DOMAIN
PERMANENT DOMAIN
PUBLIC DOMAIN
Think about data sharing from the very beginning!
Intelligent Openness (Royal Soc. London 2012)
The practice of science: Open inquiry is at the heart of the scientific enterprise. Publication of scientific theories - and of the experimental and observational data on which they are based - permits others to identify errors, to support, reject or refine theories and to reuse data for further understanding and knowledge. Science's powerful capacity for self-correction comes from this openness to scrutiny and challenge.
How to make intelligent openness standard?
Data must be accessible and readily located
Data must be intelligible for those who wish to scrutinize them
They must be assessable so that judgments can be made about their reliability and the competence of those who created them
They must be usable by others
For data to meet these requirements, they must be supported by explanatory metadata (data about data)
Science as an open enterprise (2012), The Royal Society Science Policy Centre report 02/12, ISBN: 978-0-85403-962-3
There is a need for:
Researchers' willingness to publish their data
Technical solutions to facilitate data availability, access and reuse
Recognition and credit for data producers
Data publication with DOI
persistent
citable
with metadata
DataCite and Digital Object Identifiers (DOI) for Data
STD DOI: "Publikation und Zitierbarkeit von Primärdaten" (Publication and Citability of Primary Data; DFG project 2004-2009, partners: TIB, DKRZ, PANGAEA, DLR, GFZ)
DOI for research data
DataCite
What is a DOI?
Digital Object Identifier
A unique and permanent identifier for digital objects
Signpost to the URL with the dataset and its description (= landing page)
Persistent = long-term data access guaranteed by the publisher
With metadata
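The "signpost" idea can be illustrated with a short Python sketch that turns a DOI string into the URL of the public doi.org resolver, which then redirects to the landing page. This is a minimal illustration, not part of the original slides: the function name `doi_to_url`, the simplified syntax check, and the example DOI `10.1234/example.dataset.2015` are all assumptions made up for the example.

```python
import re
from urllib.parse import quote

# Simplified syntax check (an assumption for this sketch): "10.", a registrant
# code of 4-9 digits, a slash, then a non-empty suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes people commonly paste in front of the bare DOI.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org/ resolver URL for a DOI string."""
    cleaned = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"not a recognisable DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keep the slash.
    return "https://doi.org/" + quote(cleaned, safe="/")


# Hypothetical DOI, used only to show the call.
print(doi_to_url("doi:10.1234/example.dataset.2015"))
```

The dataset itself can move to a new server without breaking citations, because only the target registered for the DOI has to be updated, not the DOI.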
Metadata and Metadata
Metadata for data discovery: example DOI landing page
Title, citation, description/abstract, download of data files, keywords, standardised metadata, related work, spatial coverage
Metadata and Metadata
Metadata for data discovery: author, title, description, keywords, spatial/temporal domain, ...
Structural metadata (for reuse): formats, methodology, sources
Definition of data labels
Metadata and Metadata
Metadata for data discovery: author, title, description, keywords, spatial/temporal domain, ...
Structural metadata (for reuse): formats, methodology, sources, processing steps
Administrative metadata: metadata related to the use, management, and encoding processes of digital objects over a period of time. Includes technical metadata: versions, checksum, timestamp, ...
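The three groups of metadata named above can be sketched as one record in Python. This is only an illustration under stated assumptions: the function `describe_dataset`, the dictionary keys, the choice of SHA-256 as the checksum, and all example values are hypothetical and do not follow any particular repository's schema.

```python
import hashlib
from datetime import datetime, timezone


def describe_dataset(data: bytes, discovery: dict, structural: dict,
                     version: str = "1.0") -> dict:
    """Bundle discovery, structural and administrative metadata for one file."""
    administrative = {
        "version": version,
        # The checksum lets a later user verify that the file is unchanged.
        "checksum_sha256": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
        # Time at which this description was generated (UTC, ISO 8601).
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return {
        "discovery": discovery,
        "structural": structural,
        "administrative": administrative,
    }


# Made-up example content and values.
csv_bytes = b"station,temp_c\nA,-3.2\n"
record = describe_dataset(
    csv_bytes,
    discovery={
        "author": "Doe, Jane",
        "title": "Example air temperature series",
        "keywords": ["temperature", "example"],
    },
    structural={
        "format": "text/csv",
        "columns": {"station": "station code", "temp_c": "air temperature in deg C"},
    },
)
print(record["administrative"]["checksum_sha256"])
```

The `columns` entry plays the role of the "definition of data labels": without it, a column named `temp_c` is hard to reuse safely.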
A comprehensive data description is essential for data reuse and should always be available before a DOI is registered.
There are different possibilities for data publication.
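The rule "description first, DOI second" can be sketched as a simple completeness check that runs before registration. The list of required fields below is an assumption for illustration, loosely modelled on the kind of mandatory properties that metadata schemas such as DataCite's define; a real repository will have its own, usually longer, list.

```python
# Hypothetical minimum set of fields; adjust to the repository's requirements.
REQUIRED_FIELDS = (
    "creators",
    "title",
    "publisher",
    "publication_year",
    "resource_type",
    "description",
)


def missing_fields(metadata: dict) -> list:
    """Return the required fields that are absent or empty, in a fixed order."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


# Made-up draft record with two gaps.
draft = {
    "creators": ["Doe, Jane"],
    "title": "Example air temperature series",
    "publication_year": 2015,
    "resource_type": "Dataset",
    "description": "",
}

gaps = missing_fields(draft)
if gaps:
    print("Not ready for DOI registration, missing:", ", ".join(gaps))
else:
    print("Metadata complete, ready for DOI registration")
```

A check like this only tests that fields are filled in; whether the description is good enough for reuse still needs a human reviewer.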
Examples of data publication 1: Data supplements to scientific articles
Links to datasets
Link to original article with data description
Examples of data publication 2: Data Journals
Peer-reviewed articles with the description of datasets or collections, etc.
3. Data Reports
GFZ examples
Institutional report series have a long tradition as important sources of information.
Today: persistently accessible online and citable with DOI
GFZ: Data Reports
Flexible format
Enhanced data description
Standardised templates for each discipline, internal review
Project-specific design if required
Coalition on Publishing Data in the Earth and Space Sciences
GOAL: OPEN DATA in the EARTH and SPACE SCIENCES
STATEMENT OF COMMITMENT
To promote metadata information and domain standards, [ ], to help simplify and standardize deposition and reuse.
To promote referencing of data sets using the Joint Declaration of Data Citation Principles, in which citations of data sets should be included within reference lists.
To include in research papers concise statements indicating where data reside and clarifying availability.
To promote and implement links to data sets in publications and corresponding links to journals in data facilities via persistent identifiers.
(January 2015)
SIGNATURES (Nov 2015) additional signatures welcome
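The commitment above to include data sets in reference lists presupposes that a dataset can be cited like an article. The Python sketch below builds such a reference string from a few metadata fields. The layout "Creators (Year): Title. Publisher. DOI link" is one common convention chosen here as an assumption; journals and repositories prescribe their own styles, and the function name and all values are made up for illustration.

```python
def format_data_citation(creators, year, title, publisher, doi):
    """Build a reference-list entry for a dataset from basic metadata."""
    authors = "; ".join(creators)
    return f"{authors} ({year}): {title}. {publisher}. https://doi.org/{doi}"


# Hypothetical dataset, used only to show the resulting string.
citation = format_data_citation(
    creators=["Doe, J.", "Roe, R."],
    year=2015,
    title="Example air temperature series",
    publisher="Example Data Services",
    doi="10.1234/example.dataset.2015",
)
print(citation)
```

Because the entry ends in a resolvable DOI link, the same string serves both as credit for the data producers and as a working pointer to the landing page.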
Conclusions
Data are increasingly recognized as part of the scholarly record; data citation is coming of age.
Data publications with assigned DOI provide citable and persistent access to research data.
There is a growing number of data repositories to store and access data (institutional, domain-specific, general).
Data description is essential for reuse.
Next step International Geo Sample Number IGSN unique identifier for physical objects