Data Management Plans: Are They Working for the Australian Antarctic Program?
Dave Connell
Australian Antarctic Data Centre

The Australian Antarctic Division
Established 1947
Coordinates Australian scientific involvement in the Antarctic, sub-Antarctic and Southern Ocean
Has responsibility for management of the Australian Antarctic program (AAp)
Maintains three Antarctic stations, a sub-Antarctic station, an intercontinental air system and a shipping system
Conducts broad-themed, multi-disciplinary science
Approximately 60 science projects running each season

The Australian Antarctic Data Centre
Established 1995
Has responsibility for the data management of the Australian Antarctic program (AAp)
Fulfils Australia's obligations under Article III(1)(c) of the Antarctic Treaty: "Scientific observations and results from Antarctica shall be exchanged and made freely available."

The AAp Project Cycle
Project application
Project approval
Data Management Plan
Conduct scientific research
Catalogue, archive and publish data (DOIs)
Write papers
Underpinned by the AAp Data Policy: http://data.aad.gov.au/aadc/about/data_policy.cfm

The AAp Project Cycle: Review Process
Scientists are scored on their data management practices
These scores can affect the approval of future projects

MyScience Application
DMPs are part of the MyScience application
MyScience and DMPs were introduced to the AAp in 2012
Designed for scientists and AADC staff to keep track of the data management progress of each project
Links to the metadata catalogue
Can instantly see what data have been archived for each project
Can instantly see the data status of each project (e.g. in progress, complete)

Why We Implemented DMPs
Designed to cover a knowledge gap in data management
Previously, the AADC had no idea what data to expect from each project
Was one Excel spreadsheet really the whole output from a ten-year project?
Unnecessary pestering of scientists (often not the right scientists): "When will the data be ready?"

Why We Implemented DMPs (continued)
DMPs answered those questions for us:
We knew what data to expect
We knew when to expect the data
We knew who to expect the data from
We knew how much storage space we would need (estimate)
We knew when to stop asking for the data
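
To make the "what, when, who, how much, when to stop asking" list concrete, here is a minimal sketch of a dataset-level DMP record in Python. It is not the MyScience data model: the class names (PlannedDataset, DataManagementPlan), the fields, the project ID and the second dataset title are all hypothetical, chosen only to show how each answer above could map onto one field and how "when to stop asking" falls out of a simple check.

```python
from dataclasses import dataclass, field
from datetime import date


# Hypothetical record for one dataset promised in a DMP (not the MyScience schema).
@dataclass
class PlannedDataset:
    title: str              # what data to expect
    custodian: str          # who to expect the data from
    expected_by: date       # when to expect the data
    est_size_gb: float      # storage estimate
    archived: bool = False  # set once a matching catalogue record exists


# Hypothetical plan grouping the datasets promised by one project.
@dataclass
class DataManagementPlan:
    project_id: str
    datasets: list = field(default_factory=list)

    def total_storage_gb(self) -> float:
        """Sum of the storage estimates for every planned dataset."""
        return sum(d.est_size_gb for d in self.datasets)

    def overdue(self, today: date) -> list:
        """Datasets past their expected date that are still not archived."""
        return [d for d in self.datasets if not d.archived and d.expected_by < today]

    def complete(self) -> bool:
        """True once every planned dataset is archived: time to stop asking."""
        return all(d.archived for d in self.datasets)


# Usage with made-up values.
plan = DataManagementPlan("PROJ-0001", [
    PlannedDataset("Meteorological data from Casey Station", "investigator_a", date(2024, 6, 30), 12.5),
    PlannedDataset("Acoustic survey data", "investigator_b", date(2025, 6, 30), 0.5),
])
print(plan.total_storage_gb())
print([d.title for d in plan.overdue(date(2025, 1, 1))])
print(plan.complete())
```

Keeping one record per dataset rather than per parameter mirrors the Version 2 approach described later: each entry is something a scientist recognises and something that can be matched against a single archived catalogue record.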

Data Management Plans: Version 1
Data collections
Data products
Where the data would be archived
Data sharing networks (e.g. GBIF, OBIS, SOOS)
Physical samples
Mapping, GIS and data assistance
(Also asks about DOIs, but that has largely become irrelevant)
Reviewed by AADC staff

Data Management Plans: Version 1 (continued)
Data collection was the problem: version 1 used individual parameter vocabularies (externally sourced and user driven), for example:
Mean air temperature at Casey Station automatic weather station
Mean air temperature at Casey Station direct observation
Mean air pressure at Casey Station automatic weather station
Mean air pressure at Casey Station direct observation
and so on

Data Management Plans: Version 1 (continued)
The intention was to use parameters as the basis of a powerful search tool (eventually)
But:
Not logical to scientists
A lot of back and forth between AADC staff and scientists
Very labour intensive and time consuming
At the data archival end, often difficult to match the planned parameters to the data actually submitted

Data Management Plans: Version 2
Focus brought back to datasets, not parameters (everything else the same), e.g. "Meteorological data from Casey Station including air temperature, pressure, wind speed, etc."
Far simpler to complete

Data Management Plans: Version 2 Response
Anecdotal evidence from scientists was that it was far easier and less onerous
From the AADC staff perspective: much quicker, with less back and forth
Likely to be easier to match up when data archival takes place

Data Management Plans: What's Next
Still need some kinks ironed out
Can't tick off datasets as they come in
Better presentation needed (a lot of text)
Need better reporting tools

Data Management Plans: What's Next (continued)
Require greater integration with data submission tools
The data submission tool should know what datasets a scientist is likely to submit
May reintroduce vocabularies at the data submission stage

Data Management Plans: General Thoughts and Conclusions
DMPs have been a definite success from a data management perspective
They allow the AADC to report back to the funding office on how each project is tracking
Scientists probably see the DMP as just another form, hence simple is probably better

Data Management Plans: General Thoughts and Conclusions (continued)
Estimates of when data will be archived are usually unrealistic (sometimes size estimates are too)
Some DMPs are not thoroughly completed, or not taken seriously
A small fraction of data is not archived at the AADC
Policy support
The simplified version is likely to be much better to use from go to whoa (currently still go-ing)