RADAR Project: Data Archival and Publication as a Service
RESEARCH DATA REPOSITORIUM
Matthias Razum, FIZ Karlsruhe
The RADAR Project in a Nutshell
  RADAR = Research Data Repository
  Goal: Establish an interdisciplinary research data repository
  Project website:
  Project duration: September 2013 to August 2015; potential extension for one more year
  Funded by
Focus of the Project
  Archival of research data as a generic service
  Long tail of research data (not "Big Data")
  Offerings
    Basic service: interdisciplinary data archival
    Extended service: data publication
  Operational research data management is out of scope
RADAR and the Domain Model
  Domains: 1. Private Domain; 2. Collaborative Domain; 3. Public Domain; 4. Dissemination Domain
  Diagram labels: Researchers Workplace; Institutional Infrastructure; RADAR; Archive; Portals, Researchers; DataCite, Publishers
  RADAR: 2 offerings
    1. Archival
    2. Archival + Publication
  Topics: Data Selection; Data Documentation; Data Types / Data Formats; Business Model; Infrastructure; Software; Metadata Standards; Persistent Identifiers; Contracts; Interfaces; Re-use
  Based on: Treloar, A., Harboe-Ree, C. (2008) Data management and the curation continuum. How the Monash experience is informing repository relationships. VALA2008 14th Biennial Conference, Melbourne, and Klump, J. (2009) Managing the Data Continuum. Online: http://oa.helmholtz.de/fileadmin/user_upload/data_continuum/klump.pdf
Envisioned Scope of Services /1
  Reliable storage space for research data
  Generic metadata schema
  Managing license metadata
  Managing access rights
    Access may be restricted to the institution providing the data (or another authorized party) and the service operator
Envisioned Scope of Services /2
  Regular fixity checks
  Assign persistent identifiers (e.g., DOI or Handle) on data set or file level
  Management of storage quotas
  Bitstream preservation
    No functional long-term preservation!
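The slides do not say how RADAR implements its regular fixity checks. As a minimal sketch of the general idea, assuming SHA-256 checksums recorded at ingest in a simple manifest (the names `sha256_of`, `check_fixity` and the manifest layout are illustrative assumptions, not part of RADAR), a check could recompute each file's checksum and compare it with the stored value:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest: dict[str, str], root: Path) -> dict[str, str]:
    """Compare stored checksums with the files under root.

    manifest maps a relative path to the checksum recorded at ingest
    (hypothetical layout). The result maps each path to 'ok',
    'changed' or 'missing'.
    """
    report = {}
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file():
            report[relative_path] = "missing"
        elif sha256_of(target) == expected:
            report[relative_path] = "ok"
        else:
            report[relative_path] = "changed"
    return report
```

A report of this kind only detects changes at the bit level, which matches the bitstream-preservation scope stated on the slide; it says nothing about whether a file format remains usable.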
Target Audience
  Researchers
    Archive (and publish) project-based research data
  Libraries and Research Institutions
    Institutional data archival
    Integration with existing institutional portals
  Cultural Heritage Organizations
    Long-term preservation of digitized materials
    Online access to web derivatives
  Publishers
    Infrastructure for providing access to research data linked to publications
STEPS TO DATA PRESERVATION
Partners and Roles
  Partners: FIZ, IPB, LMU, SCC, TIB
  Roles shown in the diagram: Business model; SW development; Scientific requirements; Operation of data center; Bitstream preservation; Contacts to publishers and learned societies; Data publication
RADAR Work Packages
  WP 1: Project Management (TIB/FIZ)
  WP 2: Requirements Analysis (IPB/LMU)
  WP 3: Metadata Profiles (IPB/LMU, TIB)
  WP 4: Data Management (FIZ/SCC)
  WP 5: Data Publication (TIB)
  WP 6: Business Model and Legal Framework (FIZ, SCC)
  WP 7: Evaluation (IPB/LMU)
RADAR Architecture: Schematic Overview
  Components: User Interface; API; Management; Repository; Storage API; Data Center (storage interface)
  The repository (metadata store and business logic) is separated from the data center via the Storage API
  Aim: use of more than one data center
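The separation described above can be illustrated with a small sketch. It assumes a hypothetical two-method storage interface (`put`/`get`) and an in-memory stand-in for a data center; none of these names come from the RADAR software, and replicating every object to all configured data centers is an assumption made only for illustration:

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical storage interface a data center would implement."""

    @abstractmethod
    def put(self, object_id: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, object_id: str) -> bytes: ...


class InMemoryDataCenter(StorageBackend):
    """Stand-in for a real data center, used only for this sketch."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: dict[str, bytes] = {}

    def put(self, object_id: str, data: bytes) -> None:
        self._objects[object_id] = data

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id]  # raises KeyError if absent


class Repository:
    """Keeps metadata and business logic; payloads go through StorageBackend only."""

    def __init__(self, backends: list[StorageBackend]) -> None:
        if not backends:
            raise ValueError("at least one storage backend is required")
        self._backends = list(backends)
        self._metadata: dict[str, dict] = {}

    def deposit(self, object_id: str, data: bytes, metadata: dict) -> None:
        # Assumption: every object is copied to every configured data center.
        for backend in self._backends:
            backend.put(object_id, data)
        self._metadata[object_id] = dict(metadata)

    def retrieve(self, object_id: str) -> bytes:
        # Try the data centers in order and return the first copy found.
        for backend in self._backends:
            try:
                return backend.get(object_id)
            except KeyError:
                continue
        raise KeyError(object_id)
```

Because the repository depends only on the interface, adding a further data center would mean adding another `StorageBackend` implementation rather than changing the repository logic.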
RADAR Architecture: Detailed View
Two services
  A: Archiving
  B: Data publication
SERVICE TYPE A: Archiving/Preservation
  Aim: Trustworthy data preservation
  For whom?
    Completed research projects
    Internal resources, not part of a publication
  Properties:
    Minimum metadata set (9 parameters)
    Handle
    Variable retention period: 5 to 15 years
    Bitstream preservation for storage period
    Regular reports on data integrity
    Access rights for selected groups/users
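As a small worked example of the variable retention period, the following sketch computes the date on which a retention period ends. It assumes the period is chosen in whole years and counted from the deposit date; both are assumptions for illustration, as the slide only gives the range of 5 to 15 years, and the function name `retention_end` is hypothetical:

```python
from datetime import date

MIN_YEARS = 5   # lower bound stated on the slide
MAX_YEARS = 15  # upper bound stated on the slide


def retention_end(deposited: date, years: int) -> date:
    """Return the last day of the retention period for a deposit."""
    if not MIN_YEARS <= years <= MAX_YEARS:
        raise ValueError(f"retention must be {MIN_YEARS} to {MAX_YEARS} years")
    try:
        return deposited.replace(year=deposited.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year.
        return deposited.replace(year=deposited.year + years, day=28)
```

For a deposit on 1 September 2015 with a 10-year period, `retention_end(date(2015, 9, 1), 10)` yields 1 September 2025.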
SERVICE TYPE B: Data publication with integrated preservation
  Aim: Trustworthy preservation & traceable publication
  Diagram labels: DOI; DOI API
  For whom?
    Projects: Data basis for scientific papers
    Independent data publications (e.g. negative data)
    Digital representations
  Properties:
    Expanded metadata set for discipline-specific data
    DOI
    Unlimited storage period
    Regular reports on downstream use to data provider
    Access management (embargo & publisher services)
METADATA SCHEMA: Mandatory properties
  1. identifier*: Handle, DOI*
  2. creator*: Persons involved in producing the data
  3. title*: Study/Data title
  4. publisher*: Corporate/Institutional or personal name
  5. production year: Year in which the data was created or to which it refers
  6. subject area: Scientific fields appropriate for the resource
  7. resource: Resource's content (e.g., dataset, model, software)
  8. rights*: Rights management statement (e.g., CC BY)
  9. rightsholder: Institution/Person holding rights
  * Identical to properties
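A check for the nine mandatory properties could look like the following sketch. The dictionary keys are snake_case renderings of the property names on the slide, and both they and the function name `missing_mandatory` are assumptions for illustration; the actual RADAR schema may name and structure these fields differently:

```python
# Hypothetical keys derived from the nine mandatory properties on the slide.
MANDATORY_FIELDS = (
    "identifier",
    "creator",
    "title",
    "publisher",
    "production_year",
    "subject_area",
    "resource",
    "rights",
    "rightsholder",
)


def missing_mandatory(record: dict) -> list[str]:
    """Return the mandatory fields that are absent or blank, in schema order."""
    missing = []
    for field in MANDATORY_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing
```

A record would be accepted for archival only if `missing_mandatory(record)` returns an empty list; the optional properties are not checked here.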
METADATA SCHEMA: Optional properties (for discipline-specific data descriptions)
  10. additional title: Complementary textual information
  11. description: Further information (e.g., abstract)
  12. keyword: Keywords describing the subject focus
  13. contributor: Associated institution/person (e.g., funder)
  14. language*: Main language used or relevant to resource
  15. alternate identifier*: Unique string within its domain of issue (e.g., local identifier)
  16. related identifier*: Identifiers of related resources
  17. geo location*: Region/Place where the resource originated or to which it refers
  18. data source: Data origin (e.g., instrument, observation, trial)
  19. software type: Software used for data production/processing/viewing
  20. data processing: Specifies further processing (e.g., statistics)
  21. related information: Further information (e.g., database number)
  * Identical to properties
RADAR Roadmap
  Software development
    1. Middleware infrastructure
    2. Archival service
    3. Publication service
  DSA certification
  Roll-out to further disciplines
  Workflows and interfaces to data providers
Questions?