The Metadata Challenge:

Similar documents
Part 2: Current State of OAR Interoperability. Towards Repository Interoperability Berlin 10 Workshop 6 November 2012

Software Requirements Specification for the Names project prototype

Registry Interchange Format: Collections and Services (RIF-CS) explained

RDDS: Metadata Development

JISC WORK PACKAGE: (Project Plan Appendix B, Version 2 )

Practical experiences with Research Quality Exercises DAG ITWG - August 18, David Groenewegen ARROW Project Manager

Phase 1 RDRDS Metadata

The OAIS Reference Model: current implementations

For Attribution: Developing Data Attribution and Citation Practices and Standards

Functional Requirements - DigiRepWiki

Research Data Edinburgh: MANTRA & Edinburgh DataShare. Stuart Macdonald EDINA & Data Library University of Edinburgh

Infrastructure for the UK

OpenAIRE. Fostering the social and technical links that enable Open Science in Europe and beyond

Long-term digital preservation of UNSWorks

Institutional Repository using DSpace. Yatrik Patel Scientist D (CS)

Edinburgh DataShare: Tackling research data in a DSpace institutional repository

ScienceDirect. Multi-interoperable CRIS repository. Ivanović Dragan a *, Ivanović Lidija b, Dimić Surla Bojana c CRIS

For those of you who may not have heard of the BHL let me give you some background. The Biodiversity Heritage Library (BHL) is a consortium of

Towards repository interoperability

Slide 1 & 2 Technical issues Slide 3 Technical expertise (continued...)

From Open Data to Data- Intensive Science through CERIF

Jisc Research Data Shared Service

Metadata and Encoding Standards for Digital Initiatives: An Introduction

Laying the Foundations for Repository Preservation Services

BPMN Processes for machine-actionable DMPs

COAR Interoperability Roadmap. Uppsala, May 21, 2012 COAR General Assembly

Update on the TDL Metadata Working Group s activities for

Digitisation Standards

Content Management for the Defense Intelligence Enterprise

Data is the new Oil (Ann Winblad)

Facilitate Open Science Training for European Research

Showing it all a new interface for finding all Norwegian research output

Comparing Open Source Digital Library Software

is an electronic document that is both user friendly and library friendly

OAI-ORE. A non-technical introduction to: (

An Introduction to PREMIS. Jenn Riley Metadata Librarian IU Digital Library Program

Transferring vital e-records to a trusted digital repository in Catalan public universities (the iarxiu platform)

If you build it, will they come? Issues in Institutional Repository Implementation, Promotion and Maintenance

University of Bath. Publication date: Document Version Publisher's PDF, also known as Version of record. Link to publication

Dictionary Driven Exchange Content Assembly Blueprints

Metadata Workshop 3 March 2006 Part 1

Digital repositories as research infrastructure: a UK perspective

Show me the data. The pilot UK Research Data Registry. 26 February 2014

Research Data Repository Interoperability Primer

Collection Policy. Policy Number: PP1 April 2015

A Dublin Core Application Profile for Scholarly Works (eprints)

Metadata Catalogue Issues. Daan Broeder Max-Planck Institute for Psycholinguistics

Online Catalogue and Repository Interoperability Study (OCRIS)

Jisc Research Data Shared Service. Looking at the past, looking to the future

OpenAIRE Guidelines Promoting Repositories Interoperability and Supporting Open Access Funder Mandates

How to contribute information to AGRIS

DIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM

SNHU Academic Archive Policies

Non-text theses as an integrated part of the University Repository

Open Archives Initiatives Protocol for Metadata Harvesting Practices for the cultural heritage sector

Opus: University of Bath Online Publication Store

Persistent identifiers, long-term access and the DiVA preservation strategy

Adding OAI ORE Support to Repository Platforms

Current JISC initiatives for Repositories

Open Access compliance:

Increasing access to OA material through metadata aggregation

3. Technical and administrative metadata standards. Metadata Standards and Applications

EUDAT-B2FIND A FAIR and Interdisciplinary Discovery Portal for Research Data

SciX Open, self organising repository for scientific information exchange. D15: Value Added Publications IST

Open Access, RIMS and persistent identifiers

Global ebusiness Interoperability Test Beds (GITB) Test Registry and Repository User Guide

Workshop B: Application Profiles Canadian Metadata Forum September 28, 2005

SobekCM. Compiled for presentation to the Digital Library Working Group School of Oriental and African Studies

Research Data Management and Institutional Repositories

A service-oriented national e-theses information system and repository

EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT

B2SAFE metadata management

DataFlow and VIDaaS Workshop

ERMS Folder Development and Access Process

Greenstone Publications

Preserving repository content: practical steps for repository managers

The National Bibliographic Knowledgebase

The European Repositories Landscape - The view from 20,000 feet

RDM through a UK lens - New Roles for Librarians?

Introduction SCONE and IRIScotland Scaling General retrieval Interoperability between SCONE and IRIScotland Context

Helping Journals to Upgrade Data Publications for Reusable Research

CORE: Improving access and enabling re-use of open access content using aggregations

The Oxford DMPonline Project

EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal

Metadata: The Theory Behind the Practice

Why CERIF? Keith G Jeffery Scientific Coordinator ERCIM Anne Assserson eurocris. Keith G Jeffery SDSVoc Workshop Amsterdam

FLAT: A CLARIN-compatible repository solution based on Fedora Commons

A Standards-Based Registry/Repository Using UK MOD Requirements as a Basis. Version 0.3 (draft) Paul Spencer and others

SciENCV - Putting the Pieces Together VIVO

OpenAIRE Guidelines for Data Archive Managers 1.0 December 2012

RN Workshop Series on Innovations in Scholarly Communication: plementing the Benefits of OAI (OAI3)

Long-term preservation for INSPIRE: a metadata framework and geo-portal implementation

Towards a joint service catalogue for e-infrastructure services

The Case for Interoperability for Open Access Repositories

Managing Image Metadata

Metadata Standards and Applications

James Hardiman Library. Digital Scholarship Enablement Strategy

Libraries, Repositories and Metadata. Lara Whitelaw, Metadata Development Manager, The Open University. 22 Oct 2003

Data Exchange and Conversion Utilities and Tools (DExT)

Illinois Data Bank Metadata Documentation

Transcription:

The Metadata Challenge: Determining local and global needs and expectations for your metadata Gareth Knight, Kultivate Metadata workshop 24 th May 2011 Centre for e-research (CeRch), King s College London

Presentation structure 1. Role of metadata in a digital repository 1. How do you determine fitness for purpose? 2. User 1: Local researchers 1. What do they wish to achieve? What do they require? 2. Do you currently fit their needs? 3. How can you better address needs? 3. User 2: 3 rd party harvesters 1. What questions may they have? 2. What do they require? Balancing local & global needs 4. User 3: Staff involved with REF preparation 1. What do they wish to achieve? What questions may they have? 2. Approaches to addressing questions work performed in Readiness 4 Ref project 5. Conclusions 2

Metadata and its evolving role metadata is developed by people for a purpose or a function Coyle, Karen, 'Understanding Metadata and its Purpose', Journal of Academic Librarianship, v. 31, n. 2, pp 160-163 The traditional library The digital library data metadata and in the wider world 3

Is your metadata fit for purpose? 1. Who and where are the stakeholders? Who are the audience who will use your metadata? People: staff, students, researchers, etc. Machine: Repository registries 2. What do the users wish to achieve when using the metadata? What function does it need to perform? Do they have specific questions that they wish to address. 3. How will your metadata fulfil these functions? Implications for quality assurance: Is it good enough for the intended purpose? In what format do they expect a response to be provided? 4. If do not currently meet their needs, how do you intend to provide them with the information they require? Metadata creation - implications of time / resources Machine vs. user creation 4

What are the stakeholders of a Creative Arts repository? Researchers of different types at different levels Artists, Performers, Undergraduate and postgraduate students Staff preparing for REF submission, Funding agencies who wish to access research 3 rd party subject repositories and value-added services that wish to obtain and use your metadata and others 5

User 1: What do researchers expect from your metadata? Discovery What resources exist on topic? What resources were created by this person? Identification What is it? Who created it? What does it represent? What does it contain? When was the physical original created? When was digital edition created? Use: Can I use it in my own work? How do I cite it? Is your metadata fit for purpose? Does it answer researcher questions? 6

Adolphe Appia collection metadata 7? Original? Possibly. Contact details?? Q1. What is it? Q2. Who created it? Q3. What does it contain? Q4. When was the physical original created? Q5. When was digital edition created? Q6. Can I use it in my own work? Q7. How do I cite it? Answers some, but not all questions Q8. What other resources exist on topic? Q9. What other resources were created by the author?

Implications Metadata answers some, but not all questions Questions: 1. How can we make better use of existing metadata? Perform NLP to extract information from large body of text Reformat data elements into consistent format, e.g. creator names Map depositor free text md to structured subject terms 2. What additional metadata is required to fit evolving needs? How will metadata be extended and enhanced? Do you have the resources to enhance catalogue metadata 8

User 2. What do 3 rd parties expect? A 3 rd party wishing to harvest your MD may have several questions: 1. How do I identify the MD records that repository X has published? OAI-PMH, RSS, OAI-ORE 2. Can I obtain all metadata for an item? 3. Are we allowed to reuse and republish the records (rights)? 4. Is it possible to combine MD records from multiple repositories to provide value added features? Must determine exposed formats OAI-PMH Anyone have a machineprocessable MD policy? Reuse of MD can prove to be a challenge 9

Metadata 101 Metadata is composed of three components: 1. Semantics 2. Content rules 3. Syntax The meanings of the What values go into the metadata elements elements How they are encoded William Shakespeare <dc:contributor>contributor Title Mandato Bill (dc Shakespeare element)</dc:contributor><dc:coverage>coverage (dc ry The name given to the resource. element)</dc:coverage><dc:creator>creator (dc element)</dc:creator><dc:date>date William Geoffrey DC (dc element)</dc:date><dc:description>description (dc Creator Mandato element)</dc:description><dc:format>format Shakespeare ry The topic of (dc the content of the resource element)</dc:format><dc:identifier>http://dublincore.org/</dc:identifier><dc:language>l Shakespeare, William VRA anguage (dc element)</dc:language><dc:publisher>publisher Description Optional Shakespeare, An account William (dc element)</dc:publisher><dc:relation>relation (dc of the content of the resource element)</dc:relation><dc:rights>rights Geoffrey (dc element)</dc:rights><dc:source>source (dc element)</dc:source><dc:subject>subject Shakespeare, Bill (dc element)</dc:subject><dc:title>title Metadata standards address resolve these issues, (dc element)</dc:title><dc:type>type Shakespeare, (dc B. element)</dc:type> but only Shakespeare, if used W. correctly Shakespeare, W. G MODS

Interoperability challenge: Balancing local & global needs (1) Scenario: Repository have local requirements and wish to catalogue information not found in existing schema (e.g. death date). Shoehorn information into existing element, Claim that local implementation conforms to standard Potential implications: Worse case scenario - MD originating from multiple sources and combined into single scheme will contain information that has different semantic meaning and content rules user may be misled by information Each repository has different control rules, resulting in variation in vocabulary restrictions, differences in handling of names, subject terms, rights AHDS, SHERPA DP preservation repository examples. Some elements may be omitted when converting between metadata formats using standard crosswalks (e.g. http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html) 11

Interoperability challenge: Balancing local & global needs (2) Possible Solutions: Application profile Declare implementation is an (DC, VRA, CDWA, MODS, etc.) application profile with some local elements Packaging format, incorporating multiple schemas Utilise MD packaging format (METS, MPEG21) and combine two schemes 12

New and emerging purpose/function: Research Excellence Framework Research support staff preparing information for REF submission may be considered a new type of stakeholder that must be considered when creating & enhancing metadata 13

Potential approaches 1. Repository plus in-house research database (KCL approach) 2. Repository and CRIS in concert 3. Standalone repository 14

Research Information systems at King s HRMS Name, job title, start date, etc Feed automatic, overnight RG Academic Interface Manual entry: Qualifications Esteem Factors Publications not in WoK feed Feedback workflow Web interfaces RG Admin Interface Manual entry: Qualifications Esteem Factors Publications not in WoK feed Feedback workflow School/College Managers Research Strategy SITS PG(R) details Feed automatic, overnight APTOS Grant monthly expenditure Feed automatic, overnight Research Gateway Database School web pages (potential) Experts database (potential) Staff Profiles (Web pages) Res. Grants & Contracts Awards, amounts, dates Feed manual, 2 weeks Direct link RAE/REF outputs IoP Res. Grants Awards, amounts, dates Feed manual, 1 month AKORD Repository Full text of publications, publication metadata AKORD publications database (web interface) Web of Knowledge Publications in journals Feed automatic, 1 week InCites -bibliometric information 15 15

Why use a digital repository to create metadata for REF? Benefits: Established infrastructure - core part of institutions, trusted publisher Accustomed to handling information flows OAI-PMH, SWORD, RSS Sustainable over time not just one-off submissions to REF Challenges: Many institutions do not have all information required for REF Repository metadata insufficient for REF requirements Many repository data models are too simple for REF REF entities require more than just a data element in an eprints object record VRA, CDWA, MODS, etc. do not support element set required by REF 16

User 3. What do staff involved with REF preparation expect? 1. Where do I locate information on research structures, staff, output and other entities? 2. Can I identify and obtain all metadata of relevance from the repository (in order to analyse and merge it at a later date)? (Similar to user 2) 3. Is metadata provided by digital repository (and other systems) of sufficient quality and completeness for REF submission? 4. Is it possible to relate research artefacts referenced in IR to metadata stored elsewhere? 5. Can I ensure that REF metadata is stored and exported in manner consistent with other UK research institutions? 17

Readiness 4 Ref Readiness 4 Ref (R4R) funded to address some of these questions. 1. Survey existing university systems 2. Analyse REF requirements 3. Apply CERIF to REF CERIF provides many of the descriptive elements for capturing REF requirements. Devise CERIF4REF XML application profile 4. Case studies in HEIs, funders, publishers How easy is it for institutions to provide their data using the CERIF4REF format? 5. Developed CERIF4REF plug-ins for eprints (Southampton), DSpace (Edinburgh) & Fedora (King s) Generate xml in CERIF4REF format to allow import into other repositories or CRISs 6. Demonstrator IRs at King s and Southampton Can we assemble, supply and exchange such data? 18

CERIF and CERIF4REF schemas Common European Research Information Format (CERIF) dev. in 1980s & maintained by eurocris, a not-for-profit organisation Intended to enable data exchange between research information systems Available in SQL and XML versions CERIF4REF application profile Simplified version of CERIF tailored for REF Less elements than full CERIF scheme Cardinality changed to reflect REF requirements Guidance on information to provide in each field (e.g. controlled vocabulary, suggested terms) XML scheme Single XML schema containing all elements 19 19

Mapping data to CERIF4REF Worked with 5 institutions to trial CERIF4REF Goldsmiths, Kingston, Reading, Leicester, University of the Arts, Ulster Asked to map their currently-held data to the CERIF4REF data dictionary Data obtained from: Staff HR, finance, student information, research database, publication system (usually eprints), pre-award db, grant db, and 3 rd parties (usually web of science) 20

Mapping data to CERIF4REF: Experience & Challenges (1) General concerns: Concerns regarding the use of EPrints to store sensitive data Quality of data: Some data elements require cleansing or re-keying prior to export to C4R Citation style for author/contributor in repository was inconsistent with requirements Some information held by institution was incomplete, e.g. sponsor data & funding source in student record db Manual intervention was required to map some repository values to different C4r fields depending on output type May not be one-to-one mapping of repository to REF output types 21

Mapping data to CERIF4REF: Experience & Challenges (2) Interoperability: Very little interoperability exists between corporate systems Inconsistency of metadata labelling between systems different IDs used for staff in some systems Not all data in a correct or transferable format, so must be restructured 22

Further changes to EPrints University of Southampton has performed additional work to enhance data model: Pre-CERIF EPrints Traditionally, EPrints has possessed a flat structure - EPrints object Metadata & 1+ optional files. Project and funding information held as MD elements in schema CERIFied EPrints Repository Objects Project and Funding Organisation are objects in own right Able to create relationships between an EPrint object and its parent Project / Funder 23

Conclusions General recommendations: Consider the end goal: Who: Who currently uses your metadata? What audience is being to emerge? What: What functions will the metadata achieve? How will it help your audience? How: What can you achieve using resources available? What are the implications for how you capture metadata? CERIF4REF and the Creative Arts: Determine the research outputs that will be included in your REF submission Review the REF requirements and determine information sources from which metadata may be obtained If planning to use digital repository as a basis for your REF submission, visit the R4R web site to find the CERIF4REF data model & schema, EPrints/DSpace/Fedora plug-ins, and case studies http://r4r.cerch.kcl.ac.uk/ 24

Thank You for your attention QUESTIONS? Gareth Knight gareth.knight@kcl.ac.uk 25