Linked Open Data and Semantic Technologies for Research in Agriculture and Forestry

Platform Linked Data Nederland, 2 April 2015
Rob Lokers, Alterra, Wageningen UR

Contents
- Data-related challenges in agricultural (and forestry) research
- Exploiting Linked Data and semantic technologies
- Trees4Future: a Research Infrastructure for forestry research
- SemaGrow: a LOD infrastructure supporting the agricultural community
- Some examples and lessons learned

Challenges in Agricultural & Forestry Research
- Research data is only partially available to the whole (research) community. Data is:
  - stored locally or privately, in silos
  - not accessible
  - not documented, and metadata is not generated
- There is no incentive or sense of urgency to share data actively or automatically, other than through networks and personal contacts.
- Thus, valuable research data is hard to find if you don't know the right people.
- However, there is increasing pressure to document data and to publish research results as open data in a comprehensible and usable manner!

Challenges in Agricultural & Forestry Research
- Agricultural & forestry researchers require data from different domains, usually with very detailed specifications.
  - Example domain, meteorology: many ways exist to measure, predict, aggregate and post-process temperature or precipitation data. You need considerable technical expertise on climate to select the most appropriate data for your job.
- The required data is stored at different locations and documented by different institutions working in different domains, with different objectives and at different quality levels.
- Metadata often does not provide or easily reveal the relevant details, lacks the required depth and structure, and does not reveal the characteristics of the data itself.

Challenges in Agricultural & Forestry Research
"However, most of the complexity we are struggling with is caused above all by structural insufficiencies due to the networked nature of our society. The specialist nature of many enterprises and experts is not yet mirrored well enough in the way we manage information and communicate. Instead of being findable and linked to other data, much information is still hidden. With its clear focus on high quality metadata management, Linked Data is key to overcoming this problem. The value of data increases each time it is being re-used and linked to another resource. Re-usage can only be triggered by providing information about the available information. In order to undertake this task in a sustainable manner, information must be recognised as an important resource that should be managed just like any other."
Linked Open Data: The Essentials. A Quick Start Guide for Decision Makers, Florian Bauer & Martin Kaltenböck

Linked Open Data in agricultural research
- Research projects: Trees4Future, SemaGrow
  - Trees4Future: EU (Forestry) Research Infrastructure project
  - SemaGrow: EU ICT Research project
- Closing the gap between data supply and data demand in agricultural & forestry research:
  - improving data availability, harmonization and discoverability
  - supporting researchers and research processes (data processing & data analytics)
  - using Linked Open Data and semantic technologies

Trees4Future Research Infrastructure
- Setting up a European knowledge network
- Explaining the benefits of data sharing
- Organizing activities to collect, structure and harmonize forestry data
- Setting up a Clearinghouse as an operational forestry metadata repository, making European datasets discoverable and accessible for the whole community:
  - using open standards to register and harvest metadata into a centralized metadata repository
  - using open standards to access data (services, datasets)
  - using LOD and semantic technologies to improve discoverability of datasets

Trees4Future Clearinghouse
[Architecture diagram. Components: User Interface, Semantic Search Layer, external ontologies, Metadata repository (NLP, ontology linking), Catalogue Harvester, two metadata catalogues, and data repositories 1 and 2. Each data repository provides access to datasets; each metadata catalogue describes those datasets and is harvested by the Catalogue Harvester.]
- User request: "Find rainfall data". Semantic search: also look for data on precipitation, rain...
- Using ontologies: link precipitation to rainfall.
- The Catalogue Harvester collects metadata of distributed datasets.
- The metadata catalogue describes the datasets, e.g. keyword = precipitation.

Example semantic tagging & search
[Diagram linking a dataset in the Trees4Future Clearinghouse to AGROVOC concepts through SKOS relations.]
- Climate dataset: the metadata contains the term "rain".
- Semantic tagging: the Clearinghouse harvester analyses the metadata (NLP) and links the dataset with a T4F concept (skos:related). Shown: T4F Concept 287, skos:prefLabel "Rain".
- The external ontology knows the concept "rain": the T4F concept is connected to AGROVOC through skos:exactMatch.
- AGROVOC concepts shown: 6435, 6439, 6156 and 2360, connected by skos:narrower, with skos:altLabel values including "Rainfall", "Rain", "Precipitation" and "Snow".
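The tagging and search behaviour on this slide can be illustrated with a minimal sketch. It is not the Clearinghouse implementation: the thesaurus below is a hypothetical three-concept stand-in for AGROVOC, the `ex:` identifiers, the catalogue entries and all function names are invented for illustration, and simple label matching takes the place of the NLP step.

```python
# Hypothetical mini-thesaurus in the spirit of SKOS. The ids and labels are
# illustrative only; they are not real AGROVOC or Trees4Future identifiers.
THESAURUS = {
    "ex:precipitation": {"prefLabel": "precipitation", "altLabel": [],
                         "narrower": ["ex:rain", "ex:snow"]},
    "ex:rain": {"prefLabel": "rain", "altLabel": ["rainfall"], "narrower": []},
    "ex:snow": {"prefLabel": "snow", "altLabel": ["snowfall"], "narrower": []},
}


def concept_for(term):
    """Return the concept id whose preferred or alternative label equals term."""
    needle = term.strip().lower()
    for cid, concept in THESAURUS.items():
        if needle == concept["prefLabel"] or needle in concept["altLabel"]:
            return cid
    return None


def descendants(cid):
    """All concepts reachable from cid by following narrower links."""
    found = set()
    stack = list(THESAURUS[cid]["narrower"])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(THESAURUS[current]["narrower"])
    return found


def ancestors(cid):
    """All concepts that have cid somewhere below them."""
    return {other for other in THESAURUS if cid in descendants(other)}


def tag_dataset(keywords):
    """Semantic tagging: map free-text metadata keywords to concept ids."""
    return {cid for cid in map(concept_for, keywords) if cid is not None}


def semantic_search(term, catalogue):
    """Return datasets tagged with the term's concept or a broader/narrower one."""
    cid = concept_for(term)
    if cid is None:
        return []
    wanted = {cid} | descendants(cid) | ancestors(cid)
    return sorted(name for name, keywords in catalogue.items()
                  if tag_dataset(keywords) & wanted)


# Hypothetical harvested metadata: dataset name -> keywords.
CATALOGUE = {
    "climate-grid": ["Precipitation", "temperature"],
    "station-obs": ["rain"],
    "snow-depth": ["snow"],
    "soil-map": ["soil type"],
}

print(semantic_search("rainfall", CATALOGUE))
```

A search for "rainfall" resolves to the rain concept through its alternative label and then also reaches the dataset that is only tagged with the broader term precipitation, which is the effect the slide describes.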

Some lessons learned
- Metadata is not always perfect and unambiguous
  - limited or no metadata available for (research) datasets
  - different spellings, use of abbreviations, etc.
  - use of metadata fields by editors is not consistent
  - links to vocabularies are often absent
  - metadata is supplied by non-experts, post-project
- Automatic semantic tagging (using NLP) is not an easy job
  - available ontologies are often very specialized and/or incomplete
  - a lot of potential ambiguity
  - most applications in this area work on bibliographic information, where in general much more context is available
- Querying and reasoning over a semantic network is a challenge
  - performance issues require undesirable optimizations
  - no generic recipe to determine relevance
- Awareness is growing

SemaGrow - Objectives
- Problem statement: the LOD network is growing and data is getting interconnected, but it is still not easy to transparently access this distributed cloud of heterogeneous data sources.
- Extend LOD capabilities by setting up an infrastructure that:
  - allows transparent, federated access to heterogeneous, distributed (big) data sources through one federated (SPARQL) endpoint
  - is efficient, real-time responsive and scalable
  - is flexible and robust enough to allow data providers to publish in the manner and form that best suits their purposes, and data consumers to query in the manner and form that best suits theirs

SemaGrow - Objectives
- Test and evaluate the infrastructure through implementation and evaluation of (agricultural) use cases
  - develop agricultural use cases
  - design & implement demonstrators
  - test & evaluate performance
- SemaGrow application showcases
  - FAO: Information Management (AGRIS, AGROVOC)
  - Alterra, Wageningen UR: Agricultural & Forestry Modelling
  - AgroKnow: Agricultural Education

SemaGrow - Use case agricultural research
An example: Kenneth
- is an agricultural modeller in Kenya
- wants to assess the consequences of climate change on agricultural yields
- needs input for his models:
  - temperature, precipitation
  - soil
  - (available) crop trial data
- knows about AgMIP, a global community on agricultural modelling
- knows that Joe is active in AgMIP

SemaGrow - Use case agricultural research
- Can data analytics and data processing be improved by describing not only the semantics of the datasets but also of the contained data?
- Can we build an infrastructure that supports semantic querying of big linked datasets?
- Can these improvements be integrated in existing applications to better support research data requirements?
Example query: identify available crop experimental data for the Mediterranean area where the crop is sunflower and the soil type is sandy soil.
- Data sources: various crop trial databases, European soil map
- Evaluates the system's ability to perform semantic searches over the dataset metadata, by matching "Mediterranean" with crop trial spatial characteristics and "sandy soil" with crop trial soil characteristics.
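As a rough illustration of the matching the example query asks for, the sketch below resolves "Mediterranean" to a bounding box and "sandy soil" to a set of soil texture classes before filtering crop trials. Everything in it is an assumption made for illustration: the bounding box is a coarse approximation, the soil grouping is one possible choice, and the trial records, class and function names are hypothetical rather than taken from SemaGrow or its data sources.

```python
from dataclasses import dataclass

# Hypothetical gazetteer: region name -> (min_lon, min_lat, max_lon, max_lat).
# The box is a coarse, illustrative approximation of the Mediterranean area.
REGIONS = {"mediterranean": (-6.0, 30.0, 36.5, 46.0)}

# Hypothetical soil vocabulary: search term -> texture classes counted as a match.
SOIL_CLASSES = {"sandy soil": {"sand", "loamy sand", "sandy loam"}}


@dataclass(frozen=True)
class Trial:
    trial_id: str
    crop: str
    lon: float
    lat: float
    soil_texture: str


def in_region(trial, region):
    """Match a region name against the trial's spatial characteristics."""
    min_lon, min_lat, max_lon, max_lat = REGIONS[region.lower()]
    return min_lon <= trial.lon <= max_lon and min_lat <= trial.lat <= max_lat


def find_trials(trials, crop, region, soil):
    """Return ids of trials matching crop, region and soil type."""
    textures = SOIL_CLASSES[soil.lower()]
    return [
        t.trial_id
        for t in trials
        if t.crop.lower() == crop.lower()
        and in_region(t, region)
        and t.soil_texture.lower() in textures
    ]


# Invented records standing in for entries from crop trial databases.
TRIALS = [
    Trial("T1", "sunflower", 12.5, 41.9, "sandy loam"),  # inside the box
    Trial("T2", "sunflower", 5.7, 52.0, "sand"),         # north of the box
    Trial("T3", "maize", 23.7, 38.0, "loamy sand"),      # wrong crop
    Trial("T4", "Sunflower", -3.7, 40.4, "clay"),        # wrong soil
]

print(find_trials(TRIALS, "sunflower", "Mediterranean", "sandy soil"))
```

The point of the sketch is that neither "Mediterranean" nor "sandy soil" appears literally in the trial records; both have to be translated into the spatial and soil characteristics that the records do carry.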

SemaGrow data sizes

SemaGrow architecture
[Architecture diagram, shown as a step-by-step build. Components: Consumer, Client, Parameters, Resource Discovery, Decomposition, Transformer, Federated Manager. Flows between them: candidate sources, measurements, strategy, results, contents and schema metadata.]
Questions highlighted in the build steps:
- POWDER: Where should I look? How many results will I get and how fast?
- What's the best strategy for the requested query?
- Ontology alignment: Are there data in an aligned schema?
- Is my strategy working? Can I improve overall performance?
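The flow in the diagram (discover candidate sources, decide on a strategy, collect and combine results) can be sketched in a few lines. This is a toy model under stated assumptions, not SemaGrow's actual design: sources are in-memory lists of triples, the source metadata is a plain count of triples per predicate, the strategy simply evaluates the pattern with the fewest candidate triples first, and all names are hypothetical.

```python
# Hypothetical sources, each holding (subject, predicate, object) triples.
SOURCES = {
    "crop_trials": [
        ("trial1", "crop", "sunflower"),
        ("trial2", "crop", "maize"),
        ("trial3", "crop", "maize"),
        ("trial1", "site", "siteA"),
        ("trial2", "site", "siteB"),
    ],
    "soil_map": [
        ("siteA", "soil", "sandy"),
        ("siteB", "soil", "clay"),
    ],
}


def describe(sources):
    """Source metadata: for each source, the number of triples per predicate."""
    meta = {}
    for name, triples in sources.items():
        counts = {}
        for _, predicate, _ in triples:
            counts[predicate] = counts.get(predicate, 0) + 1
        meta[name] = counts
    return meta


def discover(pattern, meta):
    """Resource discovery: candidate sources that hold the pattern's predicate."""
    return [name for name, counts in meta.items() if pattern[1] in counts]


def plan(patterns, meta):
    """Crude strategy: evaluate patterns with the fewest candidate triples first."""
    def cost(pattern):
        return sum(meta[s][pattern[1]] for s in discover(pattern, meta))
    return [(p, discover(p, meta)) for p in sorted(patterns, key=cost)]


def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):          # variable: bind or check consistency
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:               # constant: must be equal
            return None
    return extended


def execute(patterns, sources):
    """Answer a conjunctive query by joining results across candidate sources."""
    bindings = [{}]
    for pattern, candidates in plan(patterns, describe(sources)):
        next_bindings = []
        for binding in bindings:
            for source in candidates:
                for triple in sources[source]:
                    extended = match(pattern, triple, binding)
                    if extended is not None:
                        next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Sunflower trials on sandy soil; trial data and soil data live in different sources.
QUERY = [("?t", "crop", "sunflower"), ("?t", "site", "?s"), ("?s", "soil", "sandy")]
print(execute(QUERY, SOURCES))
```

The caller never names a source: which source answers which part of the query is decided from the metadata, which is the transparency the federated endpoint is meant to provide. A real system would also need the schema alignment and performance feedback steps that the diagram highlights and this sketch leaves out.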

Some lessons learned (up till now...)
- Many technical pitfalls exist...
  - building on rather immature technology (e.g. RDF databases) and semantic networks
- Federated access could work, but:
  - many potential points of failure
  - ontology alignment is complicated, even within one domain
- Big Linked Data is an enormous challenge, maybe not realistic?
  - querying RDF data structures does not perform well yet, even with all kinds of optimizations (which sometimes go against the LOD principles)
  - will we ever really triplify Gbyte datasets?
- On the other hand: this is ICT research... Even if only small steps can be made, the benefits can be high!

Thanks for your attention!