Every Bit Counts: Publication and Citation of Data in the Earth Sciences. MG&G Data Systems Advisory Committee Meeting 2009. Jens Klump et al.
Authors: Jens Klump (1), Robert Huber (2), Jan Brase (3), Michael Diepenbroek (2), Hannes Grobe (4), Beate Hildenbrand (5), Heinke Höck (6), Michael Lautenschlager (6), Uwe Schindler (2), Irina Sens (3) and Joachim Wächter (1). Affiliations: 1. GFZ Potsdam (proposed WDC-TERRA); 2. WDC-MARE, Univ. Bremen; 3. TIB Hannover (Nat. Lib. Sci. et Tech., Germany); 4. WDC-MARE, AWI Bremerhaven; 5. WDC-RSAT, DLR-DFD Oberpfaffenhofen; 6. WDC-Climate, MPI-MET Hamburg
Why Data Publication? Data-driven research has become an important part of science. Scientific communication still emulates paper-based media. Most data remain inaccessible and are at risk of being lost.
Data publication today
Use of Published Data: No citation of the data source. The data source needs to be deduced from the paper. No metadata. Often, the source of data is not acknowledged.
Data in the publication process today [Figure: diagram with the elements Library, Publication, Private Files, Manuscript, Data, Metadata. After Helly et al. (2003).]
The consequences: Most data remain underutilised because they are not accessible. Unnecessary duplication. Research results cannot be verified. Falsification of results. Calls to make data accessible and to share data were welcomed but produced no results.
Why are data not made accessible? Data publication is hampered by structural barriers in the publication process: Journals do not devote space to data tables due to economic constraints and have no interest in archiving data. Authors do not receive professional recognition for publishing data because the datasets cannot be cited in a reliable way. Data are not cited because their location (URL), in many cases, is transient.
Necessary steps: Data need to be citable to be valuable. Reputation is the currency of science. Authors will only prepare data for publication if the effort is worthwhile. Data publication is labour intensive. Data must be accessible. Access through persistent identifiers and long-term archives. Intellectual property rights need to be secured. Authors need full control of their publications.
Project "Publication and Citation of Scientific Primary Data": Funded by the German Science Foundation. Project partners: WDC-MARE (Bremen/Bremerhaven), WDC-Climate (Hamburg), GFZ Potsdam (proposed WDC-TERRA), WDC-RSAT (Oberpfaffenhofen). Implementation of services for the publication of data. DOI registration agency at German National Library for Science and Technology (TIB Hannover). To date 18 DOI registration agents. Inclusion of data publications into library catalogues.
STD-DOI System Architecture
Digital Preservation & Trust: Creation of digital information continues to accelerate! Practical digital preservation/curation efforts are just starting. Who can guarantee the long-term availability, authenticity and integrity of digital information? Who is trustworthy? Which institutions, approaches and technologies can be trusted? Evaluation and audit methods have been developed and are now in an ISO standardisation process.
Data Publication as a Supplement to Literature: TIBORDER catalogue of the German National Library of Science and Technology; doi:10.1594/gfz.sddb.1043 at the ICDP Scientific Drilling Database.
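A DOI such as the one cited on this slide is usually followed through a resolver. The sketch below is a minimal illustration, assuming the public doi.org proxy (not named on the slides) and a hypothetical helper `doi_to_url`; it only builds the link and does not contact any service.

```python
from urllib.parse import quote

# Assumption: the public DOI proxy at doi.org is used as the resolver.
RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a DOI string (with or without a 'doi:' prefix) into a resolver URL."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    # Every DOI starts with the directory indicator "10."
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keep the slash.
    return RESOLVER + quote(doi, safe="/")


url = doi_to_url("doi:10.1594/gfz.sddb.1043")
print(url)
```

Because the citation carries the identifier rather than a location, the link stays valid even if the data centre moves the landing page.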
Data Publication as Independent Work
DOI metadata: The STD-DOI metadata are mainly Dublin Core elements, plus data-specific elements. The metadata are transmitted to the National Library via web service (HTTP/SOAP) and incorporated into the library catalogue. The metadata may contain references to other objects. Element <RelatedIdentifier>: iscited, isparent, ischild, isduplicate,
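What such a record might look like can be sketched in a few lines. This is an illustration only, not the STD-DOI schema: apart from `RelatedIdentifier` and the four relation names taken from the slide, every element name, attribute name, and value below is a hypothetical placeholder, and the SOAP transmission step is left out.

```python
import xml.etree.ElementTree as ET

# Relation types named on the slide; the list there is cut off,
# so this set may be incomplete.
RELATION_TYPES = {"iscited", "isparent", "ischild", "isduplicate"}


def build_record(doi, creator, title, publisher, year, related=()):
    """Build an illustrative metadata record as an XML element tree.

    Element and attribute names other than RelatedIdentifier are assumptions.
    """
    root = ET.Element("resource")
    ET.SubElement(root, "identifier").text = doi
    # Dublin Core style descriptive elements
    ET.SubElement(root, "creator").text = creator
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "publisher").text = publisher
    ET.SubElement(root, "date").text = str(year)
    # References to other objects (literature, samples, related datasets)
    for relation, target in related:
        if relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {relation}")
        ET.SubElement(root, "RelatedIdentifier", {"relation": relation}).text = target
    return root


# Hypothetical example values, not a real dataset
record = build_record(
    doi="10.1234/example.data.1",
    creator="Doe, J.",
    title="Example borehole temperature log",
    publisher="Example Data Centre",
    year=2009,
    related=[("iscited", "10.1234/example.paper.1")],
)
print(ET.tostring(record, encoding="unicode"))
```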
Links to other sources: The element <RelatedIdentifier> can be used to point to other electronic objects: the literature where the data set is interpreted; the samples from which the data were derived; other datasets that belong to the same collection of datasets. These links can be used by machines (e.g. data portals) to make search suggestions and thus aid discovery of data, literature and samples, or to support other added-value services.
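How a portal could turn such links into search suggestions is sketched below. It is a minimal illustration, assuming records are already parsed into dictionaries; the record layout, the identifiers, and the helper names `build_link_index` and `suggest` are hypothetical, and relation names are treated as plain labels.

```python
from collections import defaultdict


def build_link_index(records):
    """Index links in both directions so either end can be used as an entry point."""
    index = defaultdict(set)
    for rec in records:
        for relation, target in rec["related"]:
            index[rec["id"]].add((relation, target))
            # Store the reverse direction so the target also leads back to the record.
            index[target].add(("inverse:" + relation, rec["id"]))
    return index


def suggest(index, identifier):
    """Return the objects linked to an identifier, sorted for stable output."""
    return sorted(index.get(identifier, ()))


# Hypothetical records: two datasets in one collection, one tied to a paper
records = [
    {"id": "doi:10.1234/example.data.1",
     "related": [("iscited", "doi:10.1234/example.paper.1"),
                 ("ischild", "doi:10.1234/example.collection")]},
    {"id": "doi:10.1234/example.data.2",
     "related": [("ischild", "doi:10.1234/example.collection")]},
]

index = build_link_index(records)
for relation, other in suggest(index, "doi:10.1234/example.collection"):
    print(relation, other)
```

Starting from the collection identifier, the lookup returns both member datasets; starting from the paper, it returns the dataset linked to it.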
Information Discovery [Figure labels: Link to publication; Citation of data; IGSN points to sample]
Technical Questions: Granularity: What size dataset should be included in the catalogue? Which child elements of a collection of data objects should be identifiable through DOI? Quality Control: Technical QC (syntactic); Peer-review (semantic). Persistence: Data must be available forever.
How can I participate? Point of contact: TIB Hannover. Roles: Data Creator -> Author; Data Centre -> Data Publication Agent; Library -> Publication Agent for Grey Literature. The STD-DOI Service is part of the TIB Hannover infrastructure. The STD-DOI consortium is open to new members.
Summary: Data publication is a good idea, but it needs persistent identifiers, long-term archives, and incentives for authors to publish their data. Data publications can, and should, be cited. Links pointing to literature, samples, and related data can be used to aid discovery. The challenge is to integrate this into our scientific culture; the technology is there. You are welcome to participate!