B2FIND: EUDAT Metadata Service. Daan Broeder, et al. EUDAT Metadata Task Force

Similar documents
EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT

EUDAT-B2FIND A FAIR and Interdisciplinary Discovery Portal for Research Data

B2FIND and Metadata Quality

EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal

EUDAT Towards a Collaborative Data Infrastructure

CLARIN s central infrastructure. Dieter Van Uytvanck CLARIN-PLUS Tools & Services Workshop 2 June 2016 Vienna

CLARIN for Linguists Portal & Searching for Resources. Jan Odijk LOT Summerschool Nijmegen,

EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture. Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members

EUDAT- Towards a Global Collaborative Data Infrastructure

EUDAT. Towards a pan-european Collaborative Data Infrastructure

Building metadata components

Registry Interchange Format: Collections and Services (RIF-CS) explained

Metadata and DCR. <CMD_Component /> Dieter Van Uytvanck. Max Planck Institute for Psycholinguistics

The Virtual Language Observatory!

EUDAT. Towards a Collaborative Data Infrastructure. Ari Lukkarinen CSC-IT Center for Science, Finland NORDUnet 2012 Oslo, 18 August 2012

EUDAT. Towards a pan-european Collaborative Data Infrastructure. KNMI Workshop, Utrecht, Netherlands

The EUDAT Collaborative Data Infrastructure

Metadata for Data Discovery: The NERC Data Catalogue Service. Steve Donegan

Open Archives Initiatives Protocol for Metadata Harvesting Practices for the cultural heritage sector

B2SAFE metadata management

Working towards a Metadata Federation of CLARIN and DARIAH-DE

Data management and discovery

Metadata Catalogue Issues. Daan Broeder Max-Planck Institute for Psycholinguistics

Curation module in action - its preliminary findings on VLO metadata quality

EUDAT. Towards a pan-european Collaborative Data Infrastructure

The OpenAIREplus Project

Component Metadata Infrastructure Best Practices for CLARIN

A distributed network of digital heritage information

The Sunshine State Digital Network

Specific requirements on the da ra metadata schema

Bringing Europeana and CLARIN together: Dissemination and exploitation of cultural heritage data in a research infrastructure

Long-term preservation for INSPIRE: a metadata framework and geo-portal implementation

ACDH AUSTRIAN CENTRE FOR DIGITAL HUMANITIES

EUDAT - Open Data Services for Research

(Some) Standards in the Humanities. Sebastian Drude CLARIN ERIC RDA 4 th Plenary, Amsterdam September 2014

The DART-Europe E-theses Portal

EUDAT & AAI. Daan Broeder MPI for Psycholinguistics

(Geo)DCAT-AP Status, Usage, Implementation Guidelines, Extensions

escidoc-based Virtual Research Environments Matthias Razum Frank Schwichtenberg

1. General requirements

DOIs for Research Data

Inge Van Nieuwerburgh OpenAIRE NOAD Belgium. Tools&Services. OpenAIRE EUDAT. can be reused under the CC BY license

OpenAIRE. Fostering the social and technical links that enable Open Science in Europe and beyond

University of Bath. Publication date: Document Version Publisher's PDF, also known as Version of record. Link to publication

From Open Data to Data- Intensive Science through CERIF

The DL.org Quality Working Group

Title Version Author(s) Date Status Distribution 1 Introduction 2 OpenSKOS community 2.1 OpenSKOS user stories

EUDAT Common data infrastructure

Data Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013

The Design of a DLS for the Management of Very Large Collections of Archival Objects

RADAR. Establishing a generic Research Data Repository: RESEARCH DATA REPOSITORY. Dr. Angelina Kraft

Comparing Curricula for Digital Library. Digital Curation Education

Integrating Access to Digital Content

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Damien Lecarpentier CSC-IT Center for Science, Finland EUDAT User Forum, Barcelona

Nuno Freire National Library of Portugal Lisbon, Portugal

Reducing Consumer Uncertainty

Building for the Future

Reducing Consumer Uncertainty Towards a Vocabulary for User-centric Geospatial Metadata

Multi-disciplinary Interoperability: the EuroGEOSS Operating Capacities

Metadata and Encoding Standards for Digital Initiatives: An Introduction

EarthCube and Cyberinfrastructure for the Earth Sciences: Lessons and Perspective from OpenTopography

EUDAT. Towards a pan-european Collaborative Data Infrastructure

Relation between Geospatial information projects related to GBIF

Semantic challenges in sharing dataset metadata and creating federated dataset catalogs

A Repository of Metadata Crosswalks. Jean Godby, Devon Smith, Eric Childress, Jeffrey A. Young OCLC Online Computer Library Center Office of Research

AGGREGATIVE DATA INFRASTRUCTURES FOR THE CULTURAL HERITAGE

The Modeling and Simulation Catalog for Discovery, Knowledge, and Reuse

2nd Technical Validation Questionnaire - interim results -

Data Staging and Data Movement with EUDAT. Course Introduction Helsinki 10 th -12 th September, Course Timetable TODAY

SEAD Data Services. Jim Best Practices in Data Infrastructure Workshop. Cooperative agreement #OCI

Managing very large Multimedia Archives and their Integration into Federations

Technical documentation. D2.4 KPI Specification

Metadata Issues in Long-term Management of Data and Metadata

Ontology Servers and Metadata Vocabulary Repositories

New EuroVO registry. architecture and status as of May Menelaus Perdikeas, ESAC Neuropublic.

Data publication and discovery with Globus

Component Registry, Browser and Editor Reference Manual

CMDI and granularity

Improving data quality at Europeana New requirements and methods for better measuring metadata quality

WEB-BASED COLLECTION MANAGEMENT FOR LIBRARIES

Appendix REPOX User Manual

The DataCite Metadata Schema. Frauke Ziedorn Workshop: Metadata and Persistent Identifiers for Social and Economic Data 7th May 2012

European Cloud Initiative: implementation status. Augusto BURGUEÑO ARJONA European Commission DG CNECT Unit C1: e-infrastructure and Science Cloud

Ways for a Machine-actionable Processing Chain for Identifier, Metadata, and Data

BRAHMS v8. An update on development JANUARY 2016

Metadata: The Theory Behind the Practice

Institutional Repository using DSpace. Yatrik Patel Scientist D (CS)

Managing Learning Objects in Large Scale Courseware Authoring Studio 1

Best practices in the design, creation and dissemination of speech corpora at The Language Archive

Wendy Thomas Minnesota Population Center NADDI 2014

MINT METADATA INTEROPERABILITY SERVICES

The MEG Metadata Schemas Registry Schemas and Ontologies: building a Semantic Infrastructure for GRIDs and digital libraries Edinburgh, 16 May 2003

Wittenburg, Peter; Gulrajani, Greg; Broeder, Daan; Uneson, Marcus

EUDAT. Towards a pan-european Collaborative Data Infrastructure - A Nordic Perspective? -

Data Discovery - Introduction

National Documentation Centre Open access in Cultural Heritage digital content

Interoperability Patterns in Digital Library Systems Federations. Paolo Manghi, Leonardo Candela, Pasquale Pagano

The geospatial metadata catalogue. FOSS4G Barcelona. Jeroen Ticheler. Founder and chair. Director

Enabling Interaction and Quality in a Distributed Data DRIS

KDD 10 Tutorial: Recommender Problems for Web Applications. Deepak Agarwal and Bee-Chung Chen Yahoo! Research

Transcription:

B2FIND: EUDAT Metadata Service Daan Broeder, et al. EUDAT Metadata Task Force

EUDAT Joint Metadata Domain of Research Data Deliver a service for searching and browsing metadata across communities Appropriate terminology for users of all disciplines when specifying queries possibly adaptive? Access to the data when allowed single auth/autz? Useful visualization of results community provided? Commenting facility to exchange experiences Use existing technologies: OAI-PMH, SOLR/Lucene, etc. Expected challenges Suitable catalog and indexing system for >> 1M records Semantic interoperability problems Granularity issues

Overall plan Import metadata from other EUDAT services: B2SHARE, B2SAFE Look for stable metadata providers from communities EUDAT core communities: ENES, CLARIN, EPOS other interested communities: GBIF, CESSDA, BBMI, other projects aggregating metadata: DataOne, DataCite, Europeana community input: What are useful dimensions for searching & browsing? What are useful metadata collections? Also outreach to emerging communities Help setup a metadata infrastructure, harvest their metadata

EUDAT Metadata Catalog version II Using CKAN as catalog software Open Knowledge Foundation software Choice made after some appraisals: large community, available documentation, proven track record All should be modular & pluggable as much as possible Scalability testing is still in progress 2M records seems ok for searching, but not for importing! EUDAT will still be investigating other catalog technologies Working on adapting CKAN to our needs: Better GUI: accurate temporal search specification, taxonomies,... Priorities: Increase user experience -> metadata quality + Include more communities

B2FIND Architecture WWW CLARIN ENES EUDAT xml community GBIF DataCite OAI- PMH server OAI- OAI- PMH PMH server server OAI- PMH server OAI- PMH server OAI Harvester Mapper mapped data CKAN SOLR/Luce ne PostGreSQL Browsing limited set of facets Keyword search B2SHARE OAI- PMH server rules

B2FIND Faceted Browser Facets: title, author, discipline, organization, publication year, format, language Geospatial search interface Full text search on whole metadata record Current Communities: B2SHARE: EUDAT simple store CLARIN: linguistics ENES: Climatology GBIF: Bio Diversity DataCite: registry for DOI identified data

Faceted browsing Most faceted browsing implementations use SOLR/Lucene Community metadata B2FIND facets Requires translation of information like: <Creator>Tom Mueler</Creator> into facetname=author, value= Tom Mueler

Metadata Quality Problematic quality encoding of values even within one community is not always coherent e.g. even clarin->language No single static mapping will give a good user experience Sparsely filled in records Facets need to be filled or records become invisible e.g. Author in CLARIN metadata is difficult to fill and needs to derive from actor information in different roles Therefore if;then;else constructs are tried

Flexible Mapping in JMD Objectives Extensible None of mapping semantics is hardcoded Editing does not require advanced programming skills Implementation Java based engine Mappings defined by simple XML files Mainly based on XPath expressions Evaluated in a chain: try matching until a non-empty result is achieved

Example Mapping Types Most mappings simply extract an element Empty if element is undefined, so proceed to next Complex join operations e.g. to generate value of author facet, join values of author and originator in the source same person(s) may be listed in both, so remove duplicates Conditional operations For example, used to skip unneeded values like Unspecified in source

B2Find EUDAT Metadata Portal

B2Find communities

B2Find communities

B2FIND Future New communities EPOS is a EUDAT core community, we are waiting on their OAI metadata provider BBMRI (Bioinformatics) Sweden (considering using OAI), still refining their schema other BBMRI members are using other approaches. CESSDA (Social Sciences) probably included in collaboration with DASISH project Commenting function CKAN GUI elements Better more specific temporal search Hierarchical taxonomy based search Ever better mapping rules, but there is a limit!

Thank you for your attention

Next Steps for mapping Improve mapping quality We track coverage (i.e. percentage of all metadata records where a value is mapped for a specific facet) Ranges from around 50% to 100% due to heterogeneity of sources Target over 90% for every facet for every community: No insurance for correctness Add other mapping types Component-based metadata (e.g. CMDI) is not well suited to XPath based mappings Concept registry based mapping type is planned

Who is responsible for metadata quality? In shared research infrastructures this is especially challenging: center -> community infra -> EUDAT infra Community metadata providers are first responsible We get often VERY bad metadata How to improve this? For fast progress no other course than do some curation at service provider (EUDAT) side For proper curation & mapping expertise is needed. Who is interested in doing this? Is there a business model possible to make this work sustainable