Metadata and DCR. <CMD_Component /> Dieter Van Uytvanck. Max Planck Institute for Psycholinguistics

<CMD_Component /> Metadata and DCR. Dieter Van Uytvanck, Max Planck Institute for Psycholinguistics, Dieter.VanUytvanck@mpi.nl

Overview: Traditional metadata; Component metadata; Data categories; The big picture; In practice: building components, using components.

Traditional Metadata [figure: project 1, project 2, project 3]

Traditional Metadata: problems. Lack of flexibility: too many fields... but not the ones I am looking for! Lack of interoperability: my metadata does not work with your infrastructure! Nederland? Netherlands? The Netherlands? Holland? NL?

Context. Other metadata infrastructures in our domain: IMDI, OLAC/DC, TEI. Problems: inflexible, with too many (IMDI) or too few (OLAC) fields; limited interoperability; problematic (unfamiliar) terminology for some subcommunities; etc.

CLARIN Project - CMDI: a metadata infrastructure based on a Component Metadata Model. Aims. Flexibility: researchers should decide for themselves what metadata fits their needs; offer ready-made metadata components; allow creation of new metadata components where needed. Interoperability built in. Complete infrastructure: software for editing, harvesting, exploitation. Compatibility with existing frameworks: OLAC, IMDI.

Component Metadata [figure: project 1, project 2, project 3]

Some terminology. Element = atomic unit (a "field"), e.g. recording date. Component = set of elements, e.g. Actor. Profile = set of components, e.g. OLAC profile. Schema = technical (formal) grammar describing a profile, e.g. olac.xsd. Instance = one metadata description, e.g. myresource.xml.
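The terminology above can be pictured as a small nested data structure. This is only an illustrative sketch: the class and field names are assumptions made for this example and are not part of any CLARIN software; the Actor example reuses the names from the "Building a component" slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    """Atomic unit (a "field"), e.g. a recording date."""
    name: str
    value_scheme: str = "string"
    concept_link: Optional[str] = None  # reference to a data category, if any


@dataclass
class Component:
    """A set of elements; components may also nest other components."""
    name: str
    elements: List[Element] = field(default_factory=list)
    components: List["Component"] = field(default_factory=list)

    def element_names(self) -> List[str]:
        """Collect element names from this component and all nested ones."""
        names = [e.name for e in self.elements]
        for sub in self.components:
            names.extend(sub.element_names())
        return names


@dataclass
class Profile:
    """A set of components that together describe one kind of resource."""
    name: str
    components: List[Component] = field(default_factory=list)


actor_language = Component(
    "ActorLanguage",
    elements=[
        Element("languagecode"),
        Element("languagename",
                concept_link="http://www.isocat.org/datcat/DC-1766"),
    ],
)
actor = Component(
    "Actor",
    elements=[Element("firstname"), Element("lastname")],
    components=[actor_language],
)
# "ExampleProfile" is a hypothetical profile name used only here.
profile = Profile("ExampleProfile", components=[actor])

print(actor.element_names())
```

A schema would then be the formal grammar generated from such a profile, and an instance one concrete description that follows it.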

Metadata components? [diagram] A metadata profile ("components à la carte") is converted by XSLT into an XML schema (the "grammar", <xs:schema>...</xs:schema>); an XML validator and an XML editor use that schema to check and create the metadata instance (<CMD>...</CMD>, the real resource description).

Communist Metadata Infrastructure? Are we all forced to use the same components? No! (Although re-use is generally a good idea.) But how to guarantee interoperability while using different components? [image: Metadata for dummies]

Data Categories [figure: project 1, project 2, project 3]

Data Categories: Age, Last Name, First Name, ...
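How shared data categories can give interoperability between projects that use different components may be easier to see in a toy example. The sketch below assumes two projects with their own field names, each linked to a data category URI. Only DC-1766 comes from these slides; the "last name" URI, the field names and the sample records are hypothetical.

```python
# Data category URIs. DC-1766 appears in the slides (language name);
# the other URI is a made-up placeholder for illustration.
LANGUAGE_NAME = "http://www.isocat.org/datcat/DC-1766"
LAST_NAME = "http://example.org/datcat/last-name"  # hypothetical

# Each project keeps its own field names but links them to shared concepts.
FIELD_CONCEPTS = {
    "project 1": {"surname": LAST_NAME, "taal": LANGUAGE_NAME},
    "project 2": {"lastname": LAST_NAME, "languagename": LANGUAGE_NAME},
}

# Illustrative records only, not real catalogue data.
RECORDS = [
    ("project 1", {"surname": "Couperus", "taal": "Dutch"}),
    ("project 2", {"lastname": "Couperus", "languagename": "Dutch"}),
    ("project 2", {"lastname": "Vondel", "languagename": "Dutch"}),
]


def search(concept, value):
    """Find (project, field) pairs whose field is linked to `concept`
    and whose value equals `value`, whatever the field is called locally."""
    hits = []
    for project, record in RECORDS:
        for field_name, field_value in record.items():
            linked = FIELD_CONCEPTS[project].get(field_name)
            if linked == concept and field_value == value:
                hits.append((project, field_name))
    return hits


print(search(LAST_NAME, "Couperus"))
```

The search never looks at the local field names, only at the concept each field points to, which is the role the data categories play here.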

The big picture [figure: Data Category]

Metadata creation flow

CLARIN MD life-cycle. (1) A metadata component profile is selected from the metadata component registry: create a metadata schema from a selection of existing components, and allow creation of new components if they have references to ISOcat. (2) Metadata descriptions are created and stored in metadata repositories. (3) Metadata is harvested by the OAI protocol into the joint metadata repository. (4) Search and browsing are performed on the metadata catalog, using the ISO DCR, other concept registries and the CLARIN relation registry. [Diagram components: Search Service, Semantic Mapping, Relation Registry, ISOcat Concept Registry, DCMI Concept Registry, other Concept Registry, CLARIN Component Registry, Metadata Repositories, Joint Metadata Repository]
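The harvesting step relies on OAI-PMH, which is driven by plain HTTP requests. As a rough sketch, the helper below only builds a ListRecords request URL and does not contact any server. The base URL is a placeholder, and the metadata prefix "cmdi" is an assumption; a real repository announces its supported prefixes through the ListMetadataFormats verb.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when continuing
    a partial list it is sent on its own, without metadataPrefix.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    elif metadata_prefix is not None:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    else:
        raise ValueError("metadata_prefix or resumption_token is required")
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and prefix, for illustration only.
BASE = "https://repository.example.org/oai"
first_request = list_records_url(BASE, metadata_prefix="cmdi")
next_request = list_records_url(BASE, resumption_token="abc123")

print(first_request)
print(next_request)
```

A harvester would repeat the second form for as long as the responses keep returning a resumption token.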

Building a component. [diagram: Actor (firstname, lastname) containing ActorLanguage (languagecode, languagename)]
<CMD_Component name="Actor">
  <CMD_Element name="firstname" ValueScheme="string"/>
  <CMD_Element name="lastname" ValueScheme="string"/>
  <CMD_Component name="ActorLanguage">
    <CMD_Element name="languagecode" ValueScheme="string"/>
    <CMD_Element name="languagename" ValueScheme="string" ConceptLink="http://www.isocat.org/datcat/DC-1766"/>
  </CMD_Component>
</CMD_Component>
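To check that a component definition like this one is well-formed and to see its structure at a glance, it can be read with a standard XML parser. The sketch below uses Python's xml.etree.ElementTree on a copy of the slide's component, with the quoting repaired and the names taken from the slide's diagram labels; the outline function is a helper invented for this example.

```python
import xml.etree.ElementTree as ET

# Copy of the slide's component definition (quoting repaired).
SPEC_XML = """
<CMD_Component name="Actor">
  <CMD_Element name="firstname" ValueScheme="string"/>
  <CMD_Element name="lastname" ValueScheme="string"/>
  <CMD_Component name="ActorLanguage">
    <CMD_Element name="languagecode" ValueScheme="string"/>
    <CMD_Element name="languagename" ValueScheme="string"
                 ConceptLink="http://www.isocat.org/datcat/DC-1766"/>
  </CMD_Component>
</CMD_Component>
"""


def outline(component, depth=0):
    """Return an indented outline of a component, one line per node.

    Elements that carry a ConceptLink show the linked data category.
    """
    lines = ["  " * depth + component.get("name")]
    for child in component:
        if child.tag == "CMD_Element":
            link = child.get("ConceptLink")
            suffix = f" -> {link}" if link else ""
            lines.append("  " * (depth + 1) + child.get("name") + suffix)
        elif child.tag == "CMD_Component":
            lines.extend(outline(child, depth + 1))
    return lines


# fromstring raises ParseError if the XML is not well-formed,
# e.g. when a closing quote is missing on an attribute.
spec = ET.fromstring(SPEC_XML.strip())
print("\n".join(outline(spec)))
```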

Using a component. [diagram: Actor with firstname: Louis, lastname: Couperus; ActorLanguage with languagecode: nld, languagename: Dutch]
[...]
<Actor>
  <firstname>Louis</firstname>
  <lastname>Couperus</lastname>
  <ActorLanguage>
    <languagecode>nld</languagecode>
    <languagename>Dutch</languagename>
  </ActorLanguage>
</Actor>
[...]
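Because XML names are case-sensitive, the tags in an instance have to match the names declared in the component exactly. The following sketch is a deliberately simplified check written for this example; it is not how CMDI validation actually works (that is done against the generated XML schema), and the function name is hypothetical. It reports instance tags that the component definition does not declare.

```python
import xml.etree.ElementTree as ET

SPEC = ET.fromstring("""
<CMD_Component name="Actor">
  <CMD_Element name="firstname" ValueScheme="string"/>
  <CMD_Element name="lastname" ValueScheme="string"/>
  <CMD_Component name="ActorLanguage">
    <CMD_Element name="languagecode" ValueScheme="string"/>
    <CMD_Element name="languagename" ValueScheme="string"/>
  </CMD_Component>
</CMD_Component>
""".strip())


def undeclared(spec, instance, path=""):
    """Return paths of instance tags that the component does not declare."""
    here = f"{path}/{instance.tag}"
    if instance.tag != spec.get("name"):
        return [here]
    declared = {child.get("name"): child for child in spec}
    problems = []
    for child in instance:
        decl = declared.get(child.tag)
        if decl is None:
            problems.append(f"{here}/{child.tag}")
        elif decl.tag == "CMD_Component":
            problems.extend(undeclared(decl, child, here))
    return problems


good = ET.fromstring(
    "<Actor><firstname>Louis</firstname><lastname>Couperus</lastname>"
    "<ActorLanguage><languagecode>nld</languagecode>"
    "<languagename>Dutch</languagename></ActorLanguage></Actor>"
)
# Same description, but with a differently cased tag name.
bad = ET.fromstring(
    "<Actor><firstname>Louis</firstname><lastname>Couperus</lastname>"
    "<ActorLanguage><LanguageCode>nld</LanguageCode>"
    "<languagename>Dutch</languagename></ActorLanguage></Actor>"
)

print(undeclared(SPEC, good))
print(undeclared(SPEC, bad))
```

The second instance is flagged only because of the capitalisation of one tag, which is the kind of mismatch a schema-based validator would also reject.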

New metadata datcats? (1) Register at isocat.org. Send a mail with your name to dieter.vanuytvanck@mpi.nl; you will be added to the CLARIN-NL group. Then: My Workspace > button "create new data category"; My Workspace > Private > CLARIN-NL > MD proposal > button "edit this data category selection"; My Workspace > Private > "+" (add this data category to selection); click the icon to save the selected data categories.

New metadata datcats? (2) After inspection, the new datcats can be moved to the Metadata thematic view. The datcats in the Metadata thematic view can then be submitted to the Thematic Domain Group for official approval.

Important points: re-use components and profiles. See e.g. the CLARIN-NL components: http:///cmd/components/clarin-nl/

Component Registry + ISOcat: lookup function, see the demo at http://lux16.mpi.nl:8080/ds/componentregistry/#

Metadata workshop May 27, more information soon...

Conclusions. Building your own components and profiles is already possible, and so is creating CLARIN metadata descriptions. Both require some technical (XML) skills. This is not the final infrastructure, but the format will be supported in the future. To be expected: a user-friendly editor, a browser, a component registry, a search engine.

Where to get the toolkit? http:///toolkit

Thank you for your attention. CLARIN has received funding from the European Community's Seventh Framework Programme under grant agreement n° 212230.

Backup slides