META-SHARE: An Open Resource Exchange Infrastructure for Stimulating Research and Innovation

Similar documents
META-SHARE : the open exchange platform Overview-Current State-Towards v3.0

On the way to Language Resources sharing: principles, challenges, solutions

clarin:el an infrastructure for documenting, sharing and processing language data

META-SHARE metadata: Overview of the schema & Interoperability with other schemas

Language Resources. Khalid Choukri ELRA/ELDA 55 Rue Brillat-Savarin, F Paris, France Tel Fax.

Introduction

Inge Van Nieuwerburgh OpenAIRE NOAD Belgium. Tools&Services. OpenAIRE EUDAT. can be reused under the CC BY license

Google Search Appliance

1.1 Create a New Survey: Getting Started. To create a new survey, you can use one of two methods: a) Click Author on the navigation bar.

EUDAT. Towards a pan-european Collaborative Data Infrastructure

Transfer Manual Norman Endpoint Protection Transfer to Avast Business Antivirus Pro Plus

Basic Requirements for Research Infrastructures in Europe

Developing a Research Data Policy

Machine Translation Research in META-NET

QUICK REFERENCE GUIDE: SHELL SUPPLIER PROFILE QUESTIONNAIRE (SPQ)

EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal

New CEPIS Mission

Striving for efficiency

Guide & User Instructions

CEF e-invoicing. Presentation to the European Multi- Stakeholder Forum on e-invoicing. DIGIT Directorate-General for Informatics.

How to Take Your Company Global on a Shoestring A low risk path to increasing your international revenues

Transfer Manual Norman Endpoint Protection Transfer to Avast Business Antivirus Pro Plus

Open Language Resources & Meta-Resources: a Treasure and a Challenge for Linked Data

ECHA -term User Guide

RDA DEVELOPMENTS OF NOTE P RESENTED TO CC:DA B Y KATHY GLENNAN, ALA REPRESENTATIVE TO THE RSC J ANUARY 21, 2017

Memorandum of Understanding

Making Open Data work for Europe

GSCB-Workshop on Ground Segment Evolution Strategy

EU Terminology: Building text-related & translation-oriented projects for IATE

Filr 3.4 Desktop Application Guide for Mac. June 2018

CEF Automated Translation version 18 June 2018

From Open Data to Data- Intensive Science through CERIF

Coupled Computing and Data Analytics to support Science EGI Viewpoint Yannick Legré, EGI.eu Director

Toward Horizon 2020: INSPIRE, PSI and other EU policies on data sharing and standardization

DocuSign Service User Guide. Information Guide

Intel USB 3.0 extensible Host Controller Driver

Building a Digital Repository on a Shoestring Budget

Open Archives Initiatives Protocol for Metadata Harvesting Practices for the cultural heritage sector

GLOBAL NETFLIX PREFERRED VENDOR (NPV) RATE CARD:

European Open Science Cloud Implementation roadmap: translating the vision into practice. September 2018

How can CLARIN archive and curate my resources?

EUDAT. Towards a pan-european Collaborative Data Infrastructure

Microsoft Academic Select Enrollment

Foundation Standards for recordkeeping

Science Europe Consultation on Research Data Management

ESI Packaging Registration Tutorial

Using Joinup as a catalogue for interoperability solutions. March 2014 PwC EU Services

The Madrid System Goods and Services Database. Ernesto Rubio Senior Director Advisor, WIPO Geneva, July 7, 2010

The Research Data Alliance (RDA): building global data connections CC BY-SA 4.0

EUDAT- Towards a Global Collaborative Data Infrastructure

Cisco WebEx Cloud Connected Audio

Impacts of the GDPR in Afnic - Registrar relations: FAQ

INSPIRE status report

Nuno Freire National Library of Portugal Lisbon, Portugal

Localizing Intellicus. Version: 7.3

The Photon and Neutron Data Initiative PaN-data

case study The Asset Description Metadata Schema (ADMS) A common vocabulary to publish semantic interoperability assets on the Web July 2011

INSPIRE & Environment Data in the EU

Data Management Checklist

Data management and discovery

Implementation of the Data Seal of Approval

Smart Events Cloud Release February 2017

Complete Messaging Solution

Data Management Plans. Sarah Jones Digital Curation Centre, Glasgow

The DART-Europe E-theses Portal

Archiving the Web: What can Institutions learn from National and International Web Archiving Initiatives

Getting Started with BarTender

Research Infrastructures and Horizon 2020

LiveEngage System Requirements and Language Support Document Version: 5.0 February Relevant for LiveEngage Enterprise In-App Messenger SDK v2.

Filr 3.3 Desktop Application Guide for Linux. December 2017

The European Commission s science and knowledge service. Joint Research Centre


H2020 WP Cybersecurity PPP topics

Evolution of INSPIRE interoperability solutions for e-government

PADOR HELP GUIDE FOR CO-APPLICANTS

Implementation of the Data Seal of Approval

Creative Barcode User Guide Upholding the value of creativity, innovation and entrepreneurship

Pascal Gilles H-EOP-GT. Meeting ESA-FFG-Austrian Actors ESRIN, 24 th May 2016

Promoting semantic interoperability between public administrations in Europe

EC Horizon 2020 Pilot on Open Research Data applicants

Complete Messaging Solution

EGI federated e-infrastructure, a building block for the Open Science Commons

Push button sensor 3 Plus - Brief instructions for loading additional display languages Order-No , , 2042 xx, 2043 xx, 2046 xx

Share.TEC System Architecture

Data driven transformation of the public sector Tallinn, Estonia Head of unit 22 September 2016 European Commission

Detailed analysis + Integration plan

The Multilingual Language Library

Interoperability and transparency The European context

Nuance Licensing Opportunities: Power PDF and Dragon

HF Markets SA (Pty) Ltd Protection of Personal Information Policy

Work Package 2.4. (Public) Procurement Expert Group on the security and resilience of communication networks and information systems for Smart Grids

USB Link Adapter. User s Manual

Scientific Data Policy of European X-Ray Free-Electron Laser Facility GmbH

OpenAIRE From Pilot to Service

OpenAIRE From Pilot to Service The Open Knowledge Infrastructure for Europe

DATA MANAGEMENT PLANS Requirements and Recommendations for H2020 Projects. Matthias Razum April 20, 2018

FS-1100 / FS-1300D Product Library

Rolf Engelbrecht, Claudia Hildebrand, Hans Demski

Towards a joint service catalogue for e-infrastructure services

Checklist and guidance for a Data Management Plan, v1.0

Transcription:

META-SHARE: An Open Resource Exchange Infrastructure for Stimulating Research and Innovation Stelios Piperidis Athena RC, Greece spip@ilsp.athena-innovation.gr Solutions for Multilingual Europe Budapest, Hungary, June 27/28 Co-funded by the 7th Framework Programme and the ICT Policy Support Programme of the European Commission through the contracts T4ME, CESAR, METANET4U, META-NORD (grant agreements no. 249119, 271022, 270893, 270899).

Outline META-SHARE mission and principles Architecture META-SHARE metadata model Legal framework META-SHARE towards version 1 http://www.meta-net.eu 2

Introduction No matter what technology or application one intends to build, a substantial, bulky data set together with the associated basic processing tools/services is indispensable (Statistical) machine translation, speech recognition/synthesis, Information extraction and higher level text and media analysis and annotation (e.g. sentiment, persuasion, etc) But...data collection, cleaning, annotation, curation and maintenance is a very costly business As evidence from other domains (e.g. biotechnology, geodata) shows data and tools become valuable through opening and sharing. http://www.meta-net.eu 3

META-SHARE: objectives META-SHARE tries to treat problems and fill in gaps related to visibility, documentation, identification, availability, preservation, interoperability, etc. in the area of language data and (basic language processing) tools It launches a rather long-term endeavour by which language resources are recognised as important assets that can boost research, technology and innovation through wide availability, pooling, openness and sharing http://www.meta-net.eu 4

META-SHARE: what it is META-SHARE is an open, integrated, secure, and interoperable exchange infrastructure for language data and tools for the Human Language Technologies domain A marketplace where language data and tools are documented, uploaded and stored in repositories, catalogued and announced, downloaded, exchanged, discussed, aiming to support a data economy (free and for-a-fee LRs/LTs and services) It brings together several organisations and initiatives http://www.meta-net.eu 6

META-SHARE architecture META-SHARE is implemented as a network of distributed repositories Local (organisation-based), and Non-local (central) repositories Local repos store and maintain the organisation s LRs (data sets and tools) Non-local repos act as storage and documentation facilities for LRs of organisations not wishing to set up their own repository, or donated or orphan LRs, etc. http://www.meta-net.eu 9

META-SHARE architecture (2) Actual LRs and their metadata (MD) reside in the local repositories. Rights of use and related restrictions under the control and responsibility of LR owners and the repository where the LR resides Each repository maintains an inventory (a local inventory) with all MD of their LRs exports MD allows their harvesting. Harvested MD are stored in the META-SHARE central servers, with synchronised inventories at all times Central servers create, host and maintain a central inventory with all MD descriptions of all LRs available in the distributed network. http://www.meta-net.eu 10

Architecture META-SHARE portal Registration authentication - authorisation Search / browse licence download statistics E xt er n al mappings META-SHARE inventory reporting recommenders META-SHARE inventory Billing / payment META-SHARE inventory re p os metadata harvesting Inventory Inventory Inventory Inventory LR repo LR repo LR repo LR repo http://www.meta-net.eu 11

META-SHARE Metadata Model http://www.meta-net.eu 12

Main features 1/2 Descriptions of LRs, encompassing both data (textual, multimodal/multimedia and lexical) and tools/technologies used for their processing related objects (reference documents, actors, activities etc.) structure: "components are used to group together elements and relations (and other components) elements linked to ISOcat Data Categories "profiles" for each LR type are built upon components, incl. components common to all LR types (e.g. identification, contact, distribution etc.) LR type-specific components (e.g. annotation) http://www.meta-net.eu 13

Main features 2/2 distinguishes: minimal schema: minimum set of obligatory elements and relations required for effective LR search, identification and retrieval e.g.: identification info (title, unique identifier), contact details, technical info (language, text encoding, size etc.), distribution info maximal schema all elements and relations required for the description of LRs (i.e. additional set of recommended and optional elements and relations for the whole lifecycle of LR production and usage) e.g.: provenance information, creation details, validation/evaluation information, intended and actual use(s) etc. http://www.meta-net.eu 14

Metadata upper level mandatory recommended http://www.meta-net.eu 15

Content component mandatory condition-based http://www.meta-net.eu 17

Corpus Component mandatory recommended mandatory under conditions http://www.meta-net.eu 18

Annotation Component mandatory recommended optional mandatory under conditions http://www.meta-net.eu 19

META-SHARE legal framework http://www.meta-net.eu 20

Language Resources Sharing Charter Aim: to give a clear signal to Language Resource providers and users, the Language Technology community, market players, policy makers and the public that in the digital world LRs should be shared and further re-used with the minimum possible transaction costs and efforts and under clear and easy to understand rules META-SHARE fully endorses and supports this vision. http://www.meta-net.eu 21

Guidelines 1/2 Infrastructure LRs described with metadata standardized, ideally open to fair harvesting, with web interfaces allowing search and browsing. LRs persistently stored in open and documented format. Standardisation and interoperability Open Standards and best practices should be used for data sets, tools and metadata, if available. Data formats have to be standardized and ideally open. Software has to use open interfaces and be ideally Open Source. http://www.meta-net.eu 22

Guidelines 2/2 Availability / Licensing All necessary Intellectual Property Rights have to be cleared. If commercial licenses are used, they have to be standardized. Copyright limitations /exceptions need to be harmonised across the EU. Restrictions to use or re-use LRs should be the minimum possible. Public Domain content no additional restrictions. If LRs are produced entirely with public funding open or shared at least for research purposes. Any LR infrastructure should support all possible business models without imposing restrictions to market entry. Ethical / privacy issues Adopt effective privacy policies for anonymisation and consent management. http://www.meta-net.eu 23

META-SHARE Legal Domain META-SHARE model licenses support for Opening Creative Commons based licences (starting with Creative Commons Zero (CC-0) licences and all possible combinations) Sharing META-SHARE network licences (templates to be used by members when they wish to make available their resources to other network members) Building on CC concepts Re-deposition, as the main driver for e.g. collaborative annotation recommended standardized commercial licences http://www.meta-net.eu 24

CC and MSC, duties/rights of use Attribution Non Commercial No Derivative ShareAlike

Module Combinations Attribution Attribution Share Alike Attribution No Derivatives Attribution Non Commercial Attribution Non Commercial - ShareAlike Attribution Non Commercial No Derivatives

META-SHARE version 1 www.meta-share.eu, www.meta-share.org www.meta-net.eu/meta-share http://www.meta-net.eu 29

Regular Content Slide Item 1 Item 2 Item 3 Single Sign On enabling hopping from repository to repository http://www.meta-net.eu 30

Regular Content Slide Item 1 Item 2 Item 3 http://www.meta-net.eu 31

Regular Content Slide Item 1 Item 2 Item 3 http://www.meta-net.eu 32

Regular Content Slide Item 1 Item 2 Metadata-based descriptions of LRs, Currently completing META-SHARE metadata schema Item 3 Mapping services from well-established schemas to META-SHARE http://www.meta-net.eu 33

Regular Content Slide Item 1 Item 2 Item 3 http://www.meta-net.eu 34

Regular Content Slide Item 1 Item 2 Item 3 http://www.meta-net.eu 35

Regular Content Slide Item 1 Item 2 Item 3 http://www.meta-net.eu 36

licence Regular Content Slide Item 1 Item 2 Item 3 Standardised Rights of use Agree and download Five clicks to get to download what you need http://www.meta-net.eu 37

Population round 0 Distribution per Language 173 23 23 26 26 26 427 27 30 32 33 114 33 79 93 113 36 39 English French German Spanish Italian Chinese Swedish Dutch Portuguese Finnish Hungarian Estonian Japanese Icelandic Danish Arabic Greek Distribution per Language 12% 2% 2% 2% 2% 2% 2% 2% 2% 2% 2% 3% 3% 5% 6% 7% 8% 30% 8% English Fr ench German Spanish Italian Chinese Swedish Dutch Portuguese Finnish Hungarian Estonian Japanese Icelandic Danish Arabic Greek 72 Russian Other Ru ssian Other As of 23/6/2011 : 1425 LR packages catering for 40 Languages Distribution per language, # of LR packages >=20 http://www.meta-net.eu 38

From now on 1/3 Implementation Level META-SHARE Version 1: July 2011 Stable, working version of META-SHARE to be rolled out within the META-NET network. Focus on laying the distributed repository infrastructure repository by making available free sw for repository setup to help organise, document LRs META-SHARE Version 2: February 2012 Stable version, ready for production use. Focus on providing services to LR providers, consumers and aggregators (statistics, reporting, recommendations) http://www.meta-net.eu 39

From now on 2/3 Implementation Level Further future versions will provide services to alleviate any manual effort (through scrapers, mapping tools, harvesting, interoperability / conversion programs) Just show us where the relevant information exists and we will do the rest!! http://www.meta-net.eu 40

From now on 3/3 Principles, Operation and Governance Level Gathering community feedback on sharing principles Testing the applicability and suitability of recommended licencing instruments in an as wide as possible range of resource types, countries and jurisdictions Populating repositories with well documented LRs http://www.meta-net.eu 41

Today and tomorrow Come and visit the META-SHARE booth in Room Margit, Desk 8 http://www.meta-net.eu 42

Thank you! 43