Implementation of OpenAIRE Guidelines for CRIS Managers to Finnish VIRTA Publication Information Service

Joonas Nikkanen, CSC - IT Center for Science, Finland - https://orcid.org/0000-0002-5036-6444
Dragan Ivanović, University of Novi Sad, Serbia - https://orcid.org/0000-0002-9942-5521
Hanna-Mari Puuska, CSC - IT Center for Science, Finland - https://orcid.org/0000-0001-5532-9274

euroCRIS Strategic Membership Meeting, November 28, 2018, WUT, Warsaw, Poland

CSC: Finnish research, education, culture and public administration ICT knowledge center

Agenda
- Background: VIRTA Publication Information Service
- Context: Finland from OpenAIRE's point of view
- Case: Implementation of OpenAIRE Guidelines for CRIS Managers
- Conclusions

VIRTA Publication Information Service

Background: Publication information collection in Finland
- The Ministry of Education and Culture collects bibliographic information on scientific publications annually from
  o 14 universities and 5 university hospital districts (since 2011)
  o 23 universities of applied sciences (since 2012)
  o 12 state research institutes (gradually since 2014)
- Used as criteria in the performance-based funding model of higher education institutions
- For universities, 13 % of core funding (~200 million euros) is allocated via publication points
- Publication points are calculated based on publication types and their level (evaluated by national scholarly panels - Publication Forum - http://www.julkaisufoorumi.fi/en)

Publication type                             Level 3   Level 2   Level 1   Level 0
Peer-reviewed monograph (C1)                 16        12        4         0.4
Peer-reviewed article in journal (A1-2)      4         3         1         0.1
Peer-reviewed article in book (A3)           4         3         1         0.1
Peer-reviewed article in proceedings (A4)    4         3         1         0.1
Peer-reviewed edited work (C2)               4         3         1         0.1

Not-peer-reviewed monographs: 0.4
Not-peer-reviewed articles: 0.1
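The point table above can be read as a simple lookup. The following is a minimal sketch of that lookup, not the Ministry's actual calculation: the function names are hypothetical, and expanding the row "A1-2" into the two codes A1 and A2 is an assumption.

```python
# Publication points per publication type and Publication Forum level,
# copied from the table on the slide.
# Assumption: the row "A1-2" covers the two type codes A1 and A2.
POINTS = {
    "C1": {3: 16.0, 2: 12.0, 1: 4.0, 0: 0.4},
    "A1": {3: 4.0, 2: 3.0, 1: 1.0, 0: 0.1},
    "A2": {3: 4.0, 2: 3.0, 1: 1.0, 0: 0.1},
    "A3": {3: 4.0, 2: 3.0, 1: 1.0, 0: 0.1},
    "A4": {3: 4.0, 2: 3.0, 1: 1.0, 0: 0.1},
    "C2": {3: 4.0, 2: 3.0, 1: 1.0, 0: 0.1},
}

# Not-peer-reviewed publications carry a single value on the slide.
# The keys used here are hypothetical labels, not VIRTA codes.
NON_PEER_REVIEWED_POINTS = {"monograph": 0.4, "article": 0.1}


def publication_points(pub_type: str, level: int) -> float:
    """Return the points for one peer-reviewed publication."""
    return POINTS[pub_type][level]


def total_points(publications) -> float:
    """Sum points over an iterable of (publication type, level) pairs."""
    return sum(publication_points(t, lvl) for t, lvl in publications)


if __name__ == "__main__":
    # One level 2 journal article plus one level 3 monograph.
    print(total_points([("A1", 2), ("C1", 3)]))
```

Keeping the table as data rather than as branching logic makes it easy to compare against the slide row by row.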

Background: Use of publication information in Finland
- Organizations provide a copy of their publication information to VIRTA, making it a national data warehouse (or data hub) from which other services can use the publication metadata
- In total some 60 000 publications per year: books, journal articles, conference papers, non-scholarly publications, dissertations, artistic publications
- All scientific fields are covered
- The data can be
  o examined on bibliographic level: http://www.juuli.fi/
  o pivoted on statistical level: www.vipunen.fi/en-gb/
  o queried via authenticated REST API (XML, JSON) and OAI-PMH API (Dublin Core, XML)

[Diagram: data flows into and out of VIRTA]

Annual publication data (+60k publications annually) comes from:
- 14 universities, 5 university hospitals: Pure / Converis / SoleCRIS
- 23 universities of applied sciences: mostly JUSTUS, some manually
- 12 state research institutes: varies from Pure to manual

VIRTA Publication Information System: validating, de-duplicating, REST and OAI-PMH APIs available, +350k publications

The data is used by:
- National funding model
- Vipunen: statistical portal for analysis
- Juuli: portal for examining publications
- Academy of Finland: funding calls and reporting
- Organisations' systems: for master data purposes
- JUSTUS: for prefilling co-publications
- Research Information Hub: wider linking and interoperability
- OpenAIRE (national aggregator)

VIRTA in short
- Data sources: Original metadata in local CRISes (Pure, Converis, SoleCRIS) or other publication databases of HEIs, university hospitals and state research institutes.
- Data format: XML files (an XML-CSV converter is provided for small organizations).
- Data contents: The data must include required fields and fulfill certain technical criteria.
- Data transfer: From organizations via a secure and certified connection, using the SFTP protocol and SSH authentication keys.
- Updates: New publications and corrections in local systems can be updated to VIRTA e.g. once a day. The frequency depends on the organization, the minimum being once a year.
- Temporal coverage: All data from previous years to the present can be transferred. Statistics are compiled once a year.
- Data validation: Duplicates, faults and inter-organizational co-publications are identified automatically and in real time. Errors are reported to research organizations both in an online service and in email reports.
- Data use and availability: All metadata is synced once per day and can be examined in the JUULI portal: www.juuli.fi. Yearly statistical data is available in the Vipunen portal: www.vipunen.fi. The REST API provides metadata in XML and JSON formats, the OAI-PMH API in XML and Dublin Core.
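To illustrate what a client of an OAI-PMH endpoint serving Dublin Core has to do, here is a minimal sketch that parses an oai_dc ListRecords response with the Python standard library. The namespaces are the standard OAI-PMH 2.0 and Dublin Core ones; the sample record, its identifier and the function name are invented for illustration and do not come from VIRTA.

```python
import xml.etree.ElementTree as ET

# Standard namespaces of OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response, trimmed to the parts used below.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:0001</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An invented example title</dc:title>
          <dc:creator>Example, Author</dc:creator>
          <dc:date>2018</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text: str) -> list:
    """Extract identifier, titles, creators and dates from each record."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [e.text for e in rec.iterfind(".//dc:title", NS)],
            "creators": [e.text for e in rec.iterfind(".//dc:creator", NS)],
            "dates": [e.text for e in rec.iterfind(".//dc:date", NS)],
        })
    return records


if __name__ == "__main__":
    for record in parse_records(SAMPLE_RESPONSE):
        print(record)
```

A real harvester would also follow the resumptionToken element to page through large result sets; that is left out here to keep the sketch short.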

Finland from OpenAIRE's point of view

Finnish repositories supporting OpenAIRE harvesting
- Includes
  o The repositories from 7 universities (+ 2 as sub-repositories)
  o Common repository for universities of applied sciences (Theseus)
  o One research institution (VTT) (+ 4 as sub-repositories)
  o Self-archived (green OA) publications
  o Theses
- Missing
  o Repositories from 5 universities + most research institutions
  o Publications not archived in repositories, e.g.
    o Aaltodoc repository 2011-2017: 3049 publications
    o AaltoCRIS 2011-2017: 35 886 publications
    o Aalto in VIRTA 2011-2017: 27 257 publications

Survey for Finnish OpenAIRE providers by Finnish NOAD in 2018
- The survey was carried out by the Finnish NOAD, the University of Helsinki
- Responses from 7 (out of 9) current repositories that are harvested by OpenAIRE and from 7 non-OpenAIRE repositories
- 6/7 current providers mentioned the work on data models and on supporting a harvestable API endpoint as the biggest issues regarding OpenAIRE
  o In some cases repositories are not connected to CRIS systems
  o DSpace (used by most repositories) needs work (on publication forms etc.) to be compliant with OpenAIRE specifications
  o Harvested metadata is rather poor after mapping (repositories include richer metadata)
- 6/7 of the repositories not harvested by OpenAIRE are planning to implement OpenAIRE support
  o Half of them aim to be harvestable in 2018/2019, others have no schedule yet
  o 5/7 mention that the technical implementation and the work on metadata are too resource intensive for them to support the OpenAIRE specifications yet

Summary of results (in Finnish) at: https://blogs.helsinki.fi/openaire2020/2018/08/23/kansallisestakoordinaatiosta-toivotaan-tukea-julkaisuarkistoille-openaire-kyselyn-tulosten-yhteenveto/

Implementation of OpenAIRE Guidelines for CRIS Managers

Why?
- Centralized solution, to save time and resources
  o implementation and possible updates to OpenAIRE specifications have to be done in one system only
- Better quality and more complete metadata for OpenAIRE
  o repositories only include a fraction of all publications in Finnish organizations

Timeline (2018 Q1 to 2019 Q1): Planning; Presenting to organizations; Mapping data models; Procedures for CERIF-XML; OAI-PMH work; OAI-PMH endpoint validating; Permissions from organizations; OpenAIRE beta; First harvests to OpenAIRE?

Guidelines for CRIS Managers
- The Guidelines provide orientation for CRIS managers to expose their metadata in a way that is compatible with the OpenAIRE infrastructure.
- By implementing the Guidelines, CRIS managers support the inclusion, and therefore the reuse, of metadata in their systems within the OpenAIRE infrastructure.
- Version 1.1 of the OpenAIRE Guidelines for CRIS Managers was released in June 2018.

Main prerequisites
1. Metadata representation in CERIF XML
2. OAI-PMH endpoint for harvesting

What to do?
1. Map the VIRTA data model to the CERIF data model
2. Create a procedure for converting data from VIRTA to CERIF-XML, with the necessary customising
3. Validate that the OAI-PMH endpoint returns data as required by the Guidelines
4. Agree with organizations on what information they want to make available for harvest
5. Discuss with OpenAIRE the details of how and when to do the harvest

VIRTA Architecture

[Diagram: VIRTA Publication Information Service]

Processing steps, as listed in the diagram:
- Load XML files to SA (SSIS)
- Validate data (SQL procedure)
- Find Jufo_IDs (C#)
- Find co-publications (C#)
- Find duplicates (C#)
- Send validation email to organisation
- Transfer data to DW (SQL procedure)
- Create VIRTA-XML (SQL procedure)
- Create CERIF-XML (SQL procedure)
- Create Dublin Core (SQL procedure)
- Update API tables (SQL procedure)
- Update reports (SQL procedure)

The steps are run by a PowerShell script. Data is served through the API (REST, OAI-PMH), with connections labelled GET, GET/POST and PUT/DELETE, or as a data flow via database connection.

Connected services:
- Vipunen statistical portal: https://vipunen.fi
- Juuli portal: http://juuli.fi
- Organisations' systems: CRIS, DW, master data etc.
- JUSTUS: https://justus.csc.fi
- Academy of Finland: funding calls/reporting
- OpenAIRE: https://www.openaire.eu
- Research information hub: https://research.fi

Legend: ready / to do

Data models
- Many similarities between the VIRTA and CERIF data models on publications and on what elements are included
- Key differences:
  o publication type classification
  o case of IDs as national aggregator
  o open access classification
  o national classifications (e.g. field of science)
- Chosen not to be included in mapping:
  o Publication Forum levels (national scholarly panels)
  o artistic publications

Data models
- VIRTA - CERIF mapping table available: https://wiki.eduuni.fi/pages/viewpage.action?pageid=80941717
- VIRTA data model for reference: https://tietomallit.suomi.fi/model/julkaisu/

CERIF-XML
- SQL procedures are already used in VIRTA for multiple purposes
- New SQL procedure based on the VIRTA-CERIF mapping
- Run the procedure to populate a database table with CERIF-XML data
  o populated with publications that originate from organizations which have granted permission for OpenAIRE harvest
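VIRTA produces CERIF-XML with a SQL procedure that is not shown in the slides. Purely to illustrate what the conversion step amounts to, here is a minimal Python sketch that turns one publication row into a simplified CERIF-XML Publication element. The row keys (publication_id, title, language, year, doi) are hypothetical, only a handful of elements are produced, and the namespace and element names follow the OpenAIRE CERIF profile 1.1 as understood here, so they should be checked against the Guidelines and the VIRTA-CERIF mapping table before any real use.

```python
import xml.etree.ElementTree as ET

# Namespace of the OpenAIRE CERIF profile 1.1 (assumption: verify against the Guidelines).
CERIF_NS = "https://www.openaire.eu/cerif-profile/1.1/"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize CERIF elements without a prefix (default namespace).
ET.register_namespace("", CERIF_NS)


def to_cerif_publication(row: dict) -> str:
    """Build a simplified CERIF-XML Publication element from one row.

    The row keys are hypothetical stand-ins for VIRTA fields.
    """
    pub = ET.Element(f"{{{CERIF_NS}}}Publication", {"id": str(row["publication_id"])})

    title = ET.SubElement(pub, f"{{{CERIF_NS}}}Title")
    title.set(f"{{{XML_NS}}}lang", row["language"])
    title.text = row["title"]

    date = ET.SubElement(pub, f"{{{CERIF_NS}}}PublicationDate")
    date.text = str(row["year"])

    # DOI is optional: emit the element only when a value is present.
    if row.get("doi"):
        doi = ET.SubElement(pub, f"{{{CERIF_NS}}}DOI")
        doi.text = row["doi"]

    return ET.tostring(pub, encoding="unicode")


if __name__ == "__main__":
    example_row = {  # invented example data
        "publication_id": "0000000001",
        "title": "An invented example title",
        "language": "en",
        "year": 2018,
        "doi": "10.1234/example",
    }
    print(to_cerif_publication(example_row))
```

Building the element tree instead of concatenating strings leaves escaping and namespace declarations to the library, which is where hand-written XML most often goes wrong.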

Organizational level
- Coordination with the Finnish OpenAIRE NOAD and discussions with organizations based on the plans and the data model mapping
- Organizations are the registrars of the data - VIRTA only stores a copy
  o written permission is needed before data is released to external services or uses
  o organizations were asked:
    1) Can OpenAIRE harvest their data?
    2) Which publication years can be included in the harvest?
    3) Are there other limitations for the harvest (e.g. publication types)?
- These can be implemented in the CERIF-XML procedure as they come in, and thus exposed via the OAI-PMH endpoint
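The three questions put to organizations translate naturally into a per-organization filter. The sketch below shows one way such a filter could look; it is not the VIRTA SQL procedure, and the class, field and function names as well as the example organizations and type codes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class HarvestPermission:
    """Answers of one organization to the three questions (hypothetical model)."""
    allowed: bool                        # 1) may OpenAIRE harvest the data?
    first_year: Optional[int] = None     # 2) earliest publication year allowed
    last_year: Optional[int] = None      #    latest publication year allowed
    excluded_types: Set[str] = field(default_factory=set)  # 3) other limitations


def may_expose(pub: dict, permissions: dict) -> bool:
    """Decide whether a publication may be exposed for OpenAIRE harvesting."""
    perm = permissions.get(pub["organization"])
    # No recorded written permission means the publication stays private.
    if perm is None or not perm.allowed:
        return False
    if perm.first_year is not None and pub["year"] < perm.first_year:
        return False
    if perm.last_year is not None and pub["year"] > perm.last_year:
        return False
    return pub["type"] not in perm.excluded_types


if __name__ == "__main__":
    permissions = {  # invented example answers
        "Org A": HarvestPermission(True, first_year=2016, excluded_types={"E1"}),
        "Org B": HarvestPermission(False),
    }
    pubs = [
        {"organization": "Org A", "year": 2017, "type": "A1"},
        {"organization": "Org A", "year": 2015, "type": "A1"},
        {"organization": "Org B", "year": 2017, "type": "A1"},
    ]
    print([may_expose(p, permissions) for p in pubs])
```

Defaulting to "not exposed" when no answer is recorded matches the point on the slide that written permission is needed before data goes to external services.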

OAI-PMH
- OAI-PMH was already implemented in VIRTA for both Dublin Core and VIRTA-XML metadata
  o used as the basis for implementing the OpenAIRE specifications
- New Java implementation for OAI-PMH to support the Guidelines
  o metadata prefix: oai_cerif_openaire
  o extension of supported sets: openaire_cris_publications
  o make sure that data is retrievable if the above are requested
  o add description for Identify request
  o write tests
  o validate the implementation by using the OpenAIRE CRIS validator
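From a harvester's side, the metadata prefix and set named above end up as query parameters of standard OAI-PMH requests. The sketch below builds such request URLs with the Python standard library. The base URL is a placeholder, not the real VIRTA endpoint, and the function names are hypothetical; the verbs and parameter names (verb, metadataPrefix, set, from) are those of OAI-PMH 2.0.

```python
from urllib.parse import urlencode

# Placeholder endpoint, NOT the real VIRTA address.
BASE_URL = "https://virta.example.org/oai"


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_cerif_openaire",
                     set_spec: str = "openaire_cris_publications",
                     from_date: str = "") -> str:
    """Build an OAI-PMH ListRecords request for the CERIF publications set."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": set_spec,
    }
    if from_date:
        # Optional selective harvesting by datestamp, e.g. "2018-01-01".
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


def identify_url(base_url: str) -> str:
    """Build the Identify request, whose response carries the endpoint description."""
    return f"{base_url}?{urlencode({'verb': 'Identify'})}"


if __name__ == "__main__":
    print(identify_url(BASE_URL))
    print(list_records_url(BASE_URL))
    print(list_records_url(BASE_URL, from_date="2018-01-01"))
```

Requests like these are also a convenient starting point for the tests mentioned on the slide, since each combination of prefix and set should return records rather than an OAI-PMH error.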

OAI-PMH
- Tests were run with the local CRIS Guidelines validator against the endpoint
- Issues found by running it:
  o element issues (Type, Access, PublishedIn)
  o set issues (Events, OrgUnits)
  o format issues (too long ID, wrong order, missing tags etc.)
  o work in progress

Conclusion
1. The implementation process is highly dependent on the source system architecture and technologies
2. Aiming for high-quality metadata means more work and more complex mapping (plus upkeep)
3. Following the Guidelines is straightforward, but further support and best practices would be useful for implementers

Thank you!

Save the date: 21st to 25th of October 2019, Poznan, Poland
ENRESSH Training school on working with national bibliographic databases. Follow-up to the workshop held in Antwerp in September 2018.
More information: ecoom@uantwerp.be
http://blogs.lse.ac.uk/impactofsocialsciences/2018/11/13/towards-more-consistent-transparent-andmulti-purpose-national-bibliographic-databasesfor-research-output/

Joonas Nikkanen
Project Manager, Research Information Management and Interoperability
Tel. +358 50 381 80 92
linkedin.com/in/joonas-nikkanen

facebook.com/cscfi
twitter.com/cscfi
youtube.com/cscfi
linkedin.com/company/csc---it-center-for-science
github.com/cscfi

Images: CSC archive and Thinkstock