Global Agricultural Concept Scheme The collaborative integration of three thesauri

Similar documents
Chinese Agricultural Thesaurus and its application on data sharing & interoperability

Agricultural bibliographic data sharing & interoperability in China

VocBench v2.0 User Manual

VocBench v2.1 User Manual

The Agricultural Ontology Server: A Tool for Knowledge Organisation and Integration

Metadata Standards and Applications. 6. Vocabularies: Attributes and Values

Semantic challenges in sharing dataset metadata and creating federated dataset catalogs

> Semantic Web Use Cases and Case Studies

MedDRA Update. MedDRA Industry User Group Meeting. 28 September 2018

2. bizhub Remote Access Function Support List

RDA DEVELOPMENTS OF NOTE P RESENTED TO CC:DA B Y KATHY GLENNAN, ALA REPRESENTATIVE TO THE RSC J ANUARY 21, 2017

VocBench v2.3 User Manual

Transfer Manual Norman Endpoint Protection Transfer to Avast Business Antivirus Pro Plus

IBM Full Width Quiet Touch Keyboards now available on IBM System p servers and selected IntelliStation workstations

A Dublin Core Application Profile in the Agricultural Domain

Intel USB 3.0 extensible Host Controller Driver

The UNESCO Thesaurus

RELEASE NOTES UFED ANALYTICS DESKTOP SAVE TIME AND RESOURCES WITH ADVANCED IMAGE ANALYTICS HIGHLIGHTS

Introduction and Welcome. Basel, Switzerland 16 April 2018

FAO TERM PORTAL User s Guide

How to Take Your Company Global on a Shoestring A low risk path to increasing your international revenues

The AGROVOC Concept Scheme - A Walkthrough

Transfer Manual Norman Endpoint Protection Transfer to Avast Business Antivirus Pro Plus

DINO-LITE SOFTWARE FOTO SCHERM MET SOFTWARE DINO-LITE SOFTWARE

Reducing Consumer Uncertainty

Interactive Whiteboard Module ViewSync vtouch

This bulletin was created to inform you of the release of the new version 4.30 of the Epson EMP Monitor software utility.

Ontology-based Navigation of Bibliographic Metadata: Example from the Food, Nutrition and Agriculture Journal

Using Linked Data and taxonomies to create a quick-start smart thesaurus

The role of vocabularies for estimating carbon footprint for food recipies using Linked Open Data

Microsoft Academic Select Enrollment

Local Metadatamanagement in a global environment

RPM International Inc. Hotline Instructions


Guide & User Instructions

Google Search Appliance

QuestionPoint has two types of reports: Activity Statistics and Counts of Current Data.

Microsoft Store badge guidelines. October 2017

American Philatelic Society Translation Committee. Annual Report Prepared by Bobby Liao

SAP Crystal Reports for Eclipse Product Availability Matrix (PAM)

DDC in Germany. Recent Developments and Current Activities. Michael Panzer. January 20, 2007

Organizing Economic Information

Talk2You User Manual Smartphone / Tablet

QuickSpecs. HPE Insight Online. Overview. Retired

A comparison of open source or free GIS software packages. Acknowledgements:

Terminologies, Knowledge Organization Systems, Ontologies

Google indexed 3,3 billion of pages. Google s index contains 8,1 billion of websites

Building the CIARD Framework for Data and Information Sharing : Preliminary results of an International Expert Consultation

The Madrid System Goods and Services Database. Ernesto Rubio Senior Director Advisor, WIPO Geneva, July 7, 2010

Fiery Command WorkStation 5.8 with Fiery Extended Applications 4.4

SKOS. COMP62342 Sean Bechhofer

Reducing Consumer Uncertainty Towards a Vocabulary for User-centric Geospatial Metadata

KYOCERA Quick Scan v1.0

Ontologies SKOS. COMP62342 Sean Bechhofer

PAGE 1 SYSTRAN. PRESENTER: GILLES MONTIER

Report from the W3C Semantic Web Best Practices Working Group

Licensed Program Specifications

krones Academy - media suite User guide

Establishing basic infrastructural and methodological background of Semantic Web in the domain of food and agriculture in Hungary"

Release Notes MimioStudio Software

Linked data implementations who, what, why?

One-screen Follow-up Reference Guide. Merlin.net Patient Care Network (PCN)

IBM xseries options assigned new part numbers

DocuSign Service User Guide. Information Guide

Introduction

COLLABORATIVE EUROPEAN DIGITAL ARCHIVE INFRASTRUCTURE

THE GETTY VOCABULARIES TECHNICAL UPDATE

GLOBAL NETFLIX PREFERRED VENDOR (NPV) RATE CARD:

Profiling Web Archive Coverage for Top-Level Domain & Content Language

Striving for efficiency

GV-Center V2 INTRODUCTION GV CENTER V2 VS. GV CENTER V2 PRO

1.1 Create a New Survey: Getting Started. To create a new survey, you can use one of two methods: a) Click Author on the navigation bar.

IBM System z part numbers available for the IBM WebSphere Business Modeler V products

Localizing Intellicus. Version: 7.3

MaintSmart. Enterprise. User. Guide. for the MaintSmart Translator. version 4.0. How does the translator work?...2 What languages are supported?..

Semantic MediaWiki. Tutorial on Semantic Wikis June 1, 2008 Tenerife, Spain Denny Vrandecic

Terminology Management Platform (TMP)

SM40: Measuring Maturity and Preparedness

WP6: Socio-Constructivism & Language

INSITE Features Notes

LiveEngage System Requirements and Language Support Document Version: 5.0 February Relevant for LiveEngage Enterprise In-App Messenger SDK v2.

AutoFocus, an Open Source Facet-Driven Enterprise Search Solution

WASHINGTON COUNTY, OREGON COOLING CENTER COORDINATION PROCEDURES October 19, 2017

How to contribute information to AGRIS

QT8716. NVR Technology IP Channels 16 (32) Recording Resolution 720p / 1080p / 3MP / 4MP Live Viewing Resolution 1080p. Live Display FPS.

Innovation in Thesaurus Management

Smart Organization of Agricultural Knowledge The example of the AGROVOC Concept Server and Agropedia

PUBLICATION OF INSPIRE-BASED AGRICULTURAL LINKED DATA

ADOBE READER AND ACROBAT 8.X AND 9.X SYSTEM REQUIREMENTS

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.

Data Foundations & Terminology (DFT) WG/IG & VSIG

Configuring an SAP Business Warehouse Resouce in Metadata Manager 9.5.0

Updated metadata governance tools report

Models HP Engage One Top Mount 2x20 CFD (Black) HP Engage One Top Mount 2x20 CFD (White)

PROFICIENCY TESTING IN FOREIGN LANGUAGES

Installation Guide Command WorkStation 5.6 with Fiery Extended Applications 4.2

EBSCOhost User Guide Searching. Basic, Advanced & Visual Searching, Result List, Article Details, Additional Features. support.ebsco.

1.1 February B

Library Technology Conference, March 20, 2014 St. Paul, MN

'Mixed Methods' Indexing: Building-Up a Multi-Level Infrastructure for Subject Indexing

Transcription:

Global Agricultural Concept Scheme The collaborative integration of three thesauri Prof Dr Thomas Baker [1] Dr Osma Suominen [2] Dini Jahrestagung Linked Data Vision und Wirklichkeit Frankfurt, 28. Oktober 2015 [1] Sungkyunkwan University (Korea) and Dublin Core Metadata Initiative [2] National Library of Finland

Three big thesauri in agriculture Three thesauri of terms and concepts related to agriculture -- concepts like rice, ricefield aquaculture, and plant pests. FAO Food and Agriculture Organization of the United Nations CABI Centre for Biosciences and Agriculture International (UK) NAL National Agricultural Library (US)

Separate thesauri Separate databases

Create GACS as glue linking them together

Global Agricultural Concept Scheme (GACS) 1. Improve semantic interoperability of the thesauri 2. Provide core concepts. 3. Achieve efficiencies through cooperative maintenance.

Requirements 1. Integrated view 2. Reuse of work, such as translations 3. Compatibility with existing databases 4. Based on RDF technologies: URIs, SKOS... 5. Available as Linked Open Data Based on, mapped to, but independent of, its three source thesauri.

AGROVOC CAB Thesaurus NAL Thesaurus 32,000 concepts, >1.2M labels 140,000 concepts, >1.4M labels 53,000 concepts, >200k labels English, Spanish, Portuguese, German, Czech, Persian, Polish, Hindi, French, Italian, Russian, Japanese, Hungarian, Chinese, Slovak, Thai, Lao, Turkish, Korean, Arabic, Telugu... English, Spanish, Portuguese, Dutch + many languages with lower coverage English, Spanish First step: represent all three in SKOS

First rough estimate Obtained via automatic mappings created using AgreementMakerLight

Long tail distribution (in AGRIS) 10,000 concepts cover nearly 99% of occurrences in metadata

Top 10,000 concepts from each Each partner organization provided the 10,000 concepts most frequently used in their respective databases. Added: all countries all higher-level organisms

Automated mappings Created using AgreementMakerLight software between the full thesauri, for completeness

Human evaluation of mappings Created Google Docs spreadsheets using the lists of selected concepts and the auto-generated mappings. Three sheets with circa 10,700 rows each. Mappings manually evaluated by staff of partner organizations. Evaluated 60 to 150 rows/hour. Evaluation took 500 to 600 hours for GACS Beta.

Starting point October 2014

30,000 mappings later... January 2015

4,689 mappings later... February 2015

5,522 mappings later... March 2015

Forming GACS concepts by merging the source concepts and aggregating their information agrovoc:c_1474 cabt:26247 exactmatch cereals UF feed cereals UF small grain cereals (grain) Oryza UF Padia UF rice (plant) exactmatch agrovoc:c_5435 cabt:82917 nalt:56271 agrovoc:c_6599 cabt:101613 nalt:56293 exactmatch rice UF paddy UF paddy rice Oryza sativa UF Oryza glutinosa UF Oryza indica UF Oryza japonica UF Oryza sativa (subsp, var etc.) exactmatch agrovoc:c_5438 cabt:82935 nalt:56277 (Note: GACS uses SKOS, not traditional thesaurus tags)

Lumps clusters of concepts mapped one-to-several, several-to-one, or in spirals

15,090 concepts; 972 lumps March 2015 Lumps

15,278 concepts; 339 lumps April 2015 Lumps

15,411 concepts; 84 lumps October 2015 Lumps

15,406 concepts; no lumps Last week

Polyhierarchy? sciences countries development socioeconomic development economic development social sciences economics developing countries Argentina Buenos Aires

Concept types?

Concept types!

Towards GACS roll-out (2016) Concept scheme as Linked Data. Own publication and editorial platform. Quality improvements. Inconsistencies in hierarchy, choice of labels, scope notes and definitions. Own semantic structure. Common vs scientific names, custom relationships, concept types.

VocBench for editing

Skosmos for display and browsing

Size of GACS GACS GACS Beta 1.1 15,406 concepts 398,216 labels in 28 languages

Extension module for what remains?

AGROVOC and NALT may be phased out Extension module? GACS CABT

Agrisemantics http://aims.fao.org/sites/default/files/report_workshop_agrisemantics.pdf

Global food security and climate change GACS as hub for agricultural code lists, taxonomies, statistical indicators... Simplify data normalization and integration More coherent datasets and research results Help farmers become more efficient

Reports available on the FAO AIMS site http://aims.fao.org/community/agrovoc/blogs/phase-one-gacs-approved-read-reports http://aims.fao.org/sites/default/files/report_workshop_agrisemantics.pdf osma.suominen@helsinki.fi tom@tombaker.org

Abstract The Food and Agricultural Organization of the United Nations (FAO), CAB International (CABI), and the USDA National Agricultural Library (NAL), maintainers of three large thesauri of agricultural terminology that largely overlap in scope, have partnered to create a shared Global Agricultural Concept Scheme (GACS). Duplication of effort has proven to be both inefficient and a barrier to users wishing to search across databases indexed with their terms. Expressing AGROVOC, CAB Thesaurus, and NAL Thesaurus in RDF and SKOS, as Linked Data, facilitates mappings, but mappings among three large, continually moving targets are difficult to maintain. Starting with algorithmically generated mappings among three sets of the terms most frequently used to index the AGRICOLA, CAB Abstracts, and AGRIS databases, thesaurus managers in the GACS Working Group have manually vetted the mappings for quality and are currently correcting logical inconsistencies. In a final iteration, these mappings will be used to generate a Global Agricultural Concept Scheme with its own identifiers, and GACS will be moved into its own distributed editorial environment and jointly maintained by the three partners. Targeted for beta release in early 2016, GACS aggregates the complementary strengths of its sources, such as expertise in particular areas and labels in twenty languages. Formulating consistent policies for GACS on issues such as scientific versus common names for organisms requires balancing scientific, commercial, educational, and mass-market perspectives. The challenge of global food security under conditions of climate change will require the integration of data at all levels. GACS can serve as a focal point in the broader ecosystem of vocabularies, code lists, database schemas, ontologies, statistical indicators, and taxonomies required to drive agricultural research and innovation.