D6.1.2 Piloting Plan

ICT Seventh Framework Programme (ICT FP7)
Grant Agreement No:
Data Intensive Techniques to Boost the Real Time Performance of Global Agricultural Data Infrastructures
D6.1.2 Piloting Plan
Project Reference No. ICT FP

Deliverable Form
Deliverable No.: D6.1.2
Relevant Workpackage: WP6: Real-life Deployment and User Evaluation
Nature: O = Other
Dissemination Level: PU = Public
Document version: 1.0
Date: 16/06/2014
Authors: FAO, DLO, AK, UAH
Document description: The document describes the demonstrators that will be implemented and evaluated in the project.

Page 1 of 61

Document History

Version  Date        Author (Partner)    Remarks
v0.1     18/10/2013  FAO                 Initial version
v0.2     10/03/2014  FAO                 Intro, Chapter 1
v0.3     01/04/2014  FAO                 Chapter 2
v0.4     05/04/2014  FAO                 Chapter 3
v0.5     04/05/2014  AK                  Chapter 4
v0.6     08/05/2014  DLO                 Inclusions of latest contributions from DLO
v0.7     09/05/2014  AK                  Re-structure the deliverable and update the evaluation parts
v0.8     05/06/2014  FAO, UAH, DLO, AK   Revision by FAO and inclusion of updates and contributions by FAO, UAH, DLO, AK resulting from 4th Plenary Meeting discussions
v0.9     10/06/2014  NCSR-D              Internal review
v1.0     23/06/2014  FAO, UAH, DLO       Final updates in response to review and delivery as D

EXECUTIVE SUMMARY

This deliverable is an updated version of deliverable D. It describes the design of the end-user applications that will be used in the SemaGrow pilot trials, the piloting plan, and the evaluation methodology that will use the pilots. It presents the final plan for the implementation of the applications selected as demonstrators of SemaGrow technologies, including more details about the evaluation methods and tools for the SemaGrow demonstrators.

Section 1 introduces the deliverable. Section 2 introduces the evaluation context for the SemaGrow demonstrators. Section 3 introduces the layered evaluation framework that we adopt. Section 4 describes the SemaGrow evaluation methods, tools, metrics, and evaluation experiments. Section 5 covers the implementation of the pilot trials. Finally, Section 6 presents the conclusions of the deliverable.

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
INTRODUCTION
Purpose and Scope
Approach
Relation to Other Work Packages and Deliverables
Big Data Aspects
SEMAGROW DEMONSTRATORS
Trees4Future/AgMIP
Rationale for selecting Trees4Future/AgMIP as a demonstrator
Technical description
Demonstrator development
Evaluation objectives for stakeholders
AGRIS
Rationale for selecting AGRIS as a demonstrator
Current architecture of AGRIS
AGRIS demonstrator
Foreseen evaluation
Agricultural Discovery Space (ADS)
Rationale for selecting ADS as a demonstrator
Current architecture and status of the Agricultural Data Discovery Space
Foreseen architecture & implementation
Foreseen evaluation
Overview of the evaluation stakeholders
SEMAGROW EVALUATION APPROACH
Layered evaluation approach
Towards a layered decomposition for SemaGrow Use Cases
EVALUATION METHODS AND TOOLS FOR SEMAGROW DEMONSTRATORS
Review of evaluation methods and tools
Evaluation methods and tools selected for SemaGrow demonstrators
Overview of the SemaGrow evaluation
Controlled pilot trials
Hackathons
Potential implementation of controlled pilot trials
Introduction to Open Data in Agriculture
Hands-On workshop
4.3.3 Hackathon event
DEMONSTRATORS DEVELOPMENT AND EVALUATION PLAN
Evaluation timeframes and deliverables
Trees4Future-Agmip demonstrator
Development plan
Evaluation plan
AGRIS
Development plan
Evaluation plan
Agricultural Discovery Space
Development plan
Evaluation plan
CONCLUSIONS
REFERENCES
ANNEX
Annex A: Example of user satisfaction questionnaire

LIST OF FIGURES

Figure 1-1: Dependencies between D6.1.2 and other deliverables
Figure 2-1: Trees4Future architecture
Figure 2-2: Trees4Future-AgMIP demonstrator user interface
Figure 2-3: Trees4Future application infrastructure
Figure 2-4: SemaGrow Trees4Future-AgMIP application infrastructure
Figure 2-5: AGRIS current architecture
Figure 2-6: AGRIS data flow
Figure 2-7: Agricultural Data Platform
Figure 2-8: Metadata Aggregation Workflow
Figure 2-9: Agricultural Data Platform supported by SemaGrow software stack
Figure 3-1: The SemaGrow Architecture and the evaluation layers
Figure 4-1: Overview of tasks for pilot evaluations, partners responsible, and evaluation activities
Figure 4-2: Piloting trials stages
Figure 4-3: Piloting trials stage
Figure 4-4: Piloting trials Stage
Figure 4-5: Piloting trials stage
Figure 5-1: Steps of the overall validation approach
Figure 5-2: Example of cohort analysis for a specific period

LIST OF TABLES

Table 2-1: List of properties used in AGRIS
Table 3-1: Mapping the evaluation layers with the SemaGrow Architecture [26]
Table 3-2: Mapping the evaluation layers with Trees4Future-AgMIP
Table 3-3: Mapping the evaluation layers with AGRIS
Table 3-4: Mapping the evaluation layers with ADS
Table 4-1: Evaluation methods (source: USINACTS, 1999)
Table 4-2: Mapping Trees4Future-AgMIP with evaluation methods and metrics
Table 4-3: Mapping AGRIS with evaluation methods and metrics
Table 4-4: Mapping ADS with evaluation methods and metrics
Table 5-1: Summary of planning, implementation and evaluation of SemaGrow demonstrators
Table 5-2: List of SemaGrow deliverables related to demonstrators evaluation
Table 5-3: Implementation planning for the Trees4Future-AgMIP demonstrator
Table 5-4: SemaGrow Trees4Future-AgMIP 1st Pilot Trial Evaluation
Table 5-5: SemaGrow Trees4Future-AgMIP 2nd Pilot Trial Evaluation
Table 5-6: Implementation planning for the AGRIS demonstrator
Table 8-1: Generic User satisfaction questionnaire

1. INTRODUCTION

1.1 Purpose and Scope

This deliverable presents a revised plan for the implementation of the service demonstrators for SemaGrow. It also describes the methodology for carrying out pilot trials in which the SemaGrow Stack supports applications that address real-life data problems, and for using these trials as the basis for evaluating user satisfaction with the reactivity of the SemaGrow Stack. Implementation of the demonstrators is carried out within task T6.2 (which started at M16 of the project), while their evaluation is carried out within T6.3 (starting at M19 of the project).

Following the terminology adopted in WP2, we use Use Case Categories to refer to the main areas of focus of WP6: Heterogeneous Data Collections & Streams (led by DLO); Reactive Data Analysis (led by FAO); and Reactive Resource Discovery (led by AK). Each partner has identified one or more relevant applications to serve as a basis for the project demonstrators. The demonstrators discussed in this deliverable are: Trees4Future/AgMIP (identified by DLO); AGRIS (FAO); ADS (AK).

By use case we mean a sequence of actions meant to address a given information goal. Use cases, as referred to in this deliverable, serve as a basis for the evaluation of the demonstrators' features. By pilot trials we refer to the evaluation of the developed demonstrators. We call them pilots to emphasise that they will be performed with real users, as opposed to the technical evaluation that is performed using the automated methods and tools developed in WP4 (Rigorous Experimental Testing).

Besides this introductory section, this document presents the context and current status of the applications that serve as a basis for the pilots, the envisaged updates and a plan for their implementation, and their relevance to the purpose of the SemaGrow pilots (Section 2).
The document then introduces the layered evaluation framework that we adopt (Section 3), presents the evaluation methods and tools (Section 4) and the development and evaluation plan for the demonstrators (Section 5), and concludes (Section 6).

1.2 Approach

The implementation of the demonstrators started at M16 of the project. UAH will provide technical support for the backend of the applications that will be used in the pilots, while the changes required for applying SemaGrow technology in existing applications and/or interfaces will be made by the partner responsible for each pilot (FAO, DLO and AK). The first implementation of the demonstrators will be delivered at M21, followed by a first evaluation phase to be concluded by M24. A second iteration of implementation and evaluation will then follow.

Figure 1-1: Dependencies between D6.1.2 and other deliverables. The figure relates WP2 (Use Cases & Architecture), with Task 2.1: Envisaged Applications and Use Cases (D2.1.2) and Task 2.2: Data Streams and Collections (D2.2.2), to WP6 (Real-life Deployment and User Evaluation), with Task 6.1: Piloting Plan (D6.1.1 and D6.1.2), Task 6.2: Pilot Deployment (D6.2.1) and Task 6.3: Pilot Trials (D6.3.1).

1.3 Relation to Other Work Packages and Deliverables

This document builds on deliverable D6.1.1, where the first plan was presented. Among other updates, this version also incorporates the project's reaction to the recommendations of the first year project review. These are the main changes with respect to the previous version of the plan:

1. The number of demonstrators is reduced to three, one for each of the three use case categories.
2. Each demonstrator is described in more detail, stating which big data challenges it addresses, the technologies and architectures of the current versions of the applications selected to serve as demonstrators, and the future demonstrators.
3. Trees4Future and AgMIP have been merged, while the AGRIS focus changed to include the automatic discovery of Web resources related to the AGRIS domain. At the same time the AgLR Toolkit was removed, and the repercussions of SemaGrow deployment for data management back-ends are demonstrated in ADS instead.

The demonstrators described in this deliverable are based on the work done in WP2 regarding the features requested by users and stakeholders (D2.1.2: Envisaged Applications and Use Cases) and the data to use (D2.2.2: Data Streams and Collections). This document refines the applications and use cases described in D2.1 from the perspective of evaluating user satisfaction and system reactivity in real-life deployments. This design and planning is used to drive the deployment (D6.2.1) and execution (D6.3.1) of the first round of SemaGrow pilot trials (Figure 1-1).

Initial provisions for the second piloting round are also described, but will be refined and finalized as an outcome of the first piloting round.

1.4 Big Data Aspects

The piloting plan aims at designing pilots that will provide the user interactions necessary to evaluate user experience from the perspective of the different stakeholders. The big data aspects of this design are discussed in Section "Evaluation objectives for stakeholders" for the Trees4Future/AgMIP demonstrator and in Section "The AGRIS Demonstrator" for queries that join AGRIS data against Web crawl data.

2. SemaGrow Demonstrators

In this section we present the SemaGrow demonstrators, explaining the rationale behind their selection and the big data challenge each one addresses. For each demonstrator, we describe its underlying architecture, the technologies adopted, its main functionalities, the lifecycle of the data used in it, and the users involved. We also indicate briefly which functionalities we select for evaluation, why we select them, and how their evaluation contributes to the project and to the partners developing the demonstrators. We also highlight the user groups involved in the evaluation.

2.1 Trees4Future/AgMIP

Trees4Future develops a research infrastructure for European forestry research. As part of the project, a geo-spatial Clearinghouse is being developed that allows forestry research stakeholders (researchers, modellers, policy makers, sector organisations etc.) to effectively discover relevant data for their work. The main mechanism for increasing the effectiveness of searches is the exploitation of semantic technologies on top of (harvested) metadata.

AgMIP is a global network of researchers who work on agricultural modelling and the intercomparison of different crop and agro-economic models. They run into similar issues as Trees4Future stakeholders regarding the discovery, selection and (pre)processing of the data required for their modelling exercises, and regarding the size, amount and complexity of the data involved.

Rationale for selecting Trees4Future/AgMIP as a demonstrator

The Trees4Future and AgMIP communities share many issues related to the discovery of large amounts of heterogeneous data sources. First of all, the discovery of targeted datasets is generally an issue. Discovery is usually based on the dataset's metadata, but the use of semantics is generally limited.
Besides, a lot of valuable information that resides in the data content itself is not accessible. Trees4Future therefore extends the classical approach of spatial dataset discovery by including knowledge of dataset attributes where possible (as an extension of the metadata, so still not using the data itself). Triplifying this extended metadata and enriching the triples with semantics allows for more efficient, semantically driven discovery of geo-datasets. This increases the effectiveness of searches by filtering out irrelevant results on the one hand and by discovering otherwise undiscovered datasets on the other. It also allows for a better valuation of search results, e.g. through more effective relevance scoring.

AgMIP faces the same type of issues in the discovery of agronomic data sources for agricultural modelling exercises. The AgMIP community is expected to benefit highly from the Trees4Future approach simply by adopting the concepts and principles used by Trees4Future. This might start with something as straightforward as adopting the Trees4Future application as a search mechanism on top of triplified and semantically enriched AgMIP datasets. Moreover, being able to analyse datasets based on their (data) contents should also allow querying for sub-selections of data.

Figure 2-1: Trees4Future architecture

Technical description

The Trees4Future Clearinghouse is a knowledge portal offering search and discovery functions on top of datasets related to forestry research. The backend of the system is a semantic data store containing triplified metadata, which is harvested from registered catalogues and semantically enriched through linkage to relevant ontologies. Figure 2-1 shows the architecture of the current Trees4Future Clearinghouse.

Metadata is harvested from a range of metadata catalogues supporting different metadata standards (OAI-PMH, OGC-CSW, Thredds (NetCDF) etc.). Harvested metadata is subsequently triplified and stored as RDF triples in a triple store, following the structure of a custom-developed ontology. After triplification, the concepts identified in the metadata are automatically linked to a set of relevant (external) ontologies.

The Trees4Future user interface provides a search interface on top of the triple store. The query interface between the GUI and the backend components is implemented as a SOAP interface, communicating with a server component that transforms the SOAP requests into SPARQL queries.
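The triplification step described above can be illustrated with a minimal sketch. It is not the Trees4Future harvester or its custom ontology: the record layout, the `http://example.org/` base URI and the choice of Dublin Core predicates are assumptions made only for this illustration. The function turns one harvested metadata record into N-Triples lines.

```python
# Hypothetical mapping from harvested metadata fields to predicates.
# The real Clearinghouse uses a custom ontology that is not reproduced here.
FIELD_TO_PREDICATE = {
    "title": "http://purl.org/dc/terms/title",
    "description": "http://purl.org/dc/terms/description",
}
SUBJECT_PREDICATE = "http://purl.org/dc/terms/subject"


def escape_literal(value):
    """Escape a string so it can be written as an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def triplify(record, base_uri="http://example.org/dataset/"):
    """Return the N-Triples lines for one metadata record (a dict)."""
    subject = "<" + base_uri + record["id"] + ">"
    triples = []
    # Plain text fields become literals.
    for field, predicate in FIELD_TO_PREDICATE.items():
        if field in record:
            literal = escape_literal(record[field])
            triples.append('%s <%s> "%s" .' % (subject, predicate, literal))
    # Links to ontology concepts become URI-valued triples.
    for concept_uri in record.get("concepts", []):
        triples.append("%s <%s> <%s> ." % (subject, SUBJECT_PREDICATE, concept_uri))
    return triples
```

For example, `triplify({"id": "ds1", "title": "Beech plots", "concepts": ["http://example.org/concept/beech"]})` yields one title triple and one concept link, which could then be loaded into a triple store.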

Figure 2-2: Trees4Future-AgMIP demonstrator user interface

The technical specifications of the Trees4Future components are as follows:

- Graphical user interface: PHP, HTML/JavaScript/jQuery
- Query protocols: SOAP (GUI to server) + SPARQL (server to triple store)
- Server components: J2EE, Java
- Triple store: Sesame OpenRDF
- Catalogue harvester: GI-CAT, customized OAI-PMH harvester

Figure 2-2 shows a sketch of the user interface of the Trees4Future-AgMIP demonstrator, derived from the Trees4Future GUI. It shows the different dimensions that can be used to query the current Trees4Future metadata.

Search terms

The free search terms textbox allows users to perform queries based on one or more search terms (implicitly combined with the AND operator). The search terms are matched with concepts (skos:Concept) linked to the datasets in the Trees4Future triple store using semantic search mechanisms. The relevance of a dataset is derived from the distance between the dataset and the concept in the ontology.
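One way to read "distance in the ontology" is as the number of links between the concept matched by the search term and the concepts attached to a dataset. The sketch below assumes exactly that, using breadth-first search over an undirected concept graph and a score of 1 / (1 + distance). The graph layout and the scoring formula are assumptions for illustration; the source does not state how Trees4Future computes relevance.

```python
from collections import deque


def concept_distance(graph, start, targets):
    """Fewest links from `start` to any concept in `targets`, or None.

    `graph` maps each concept to the concepts it is directly linked to
    (for example via broader/narrower relations, stored in both directions).
    """
    targets = set(targets)
    if start in targets:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in seen:
                continue
            if neighbour in targets:
                return dist + 1
            seen.add(neighbour)
            queue.append((neighbour, dist + 1))
    return None


def relevance(graph, query_concept, dataset_concepts):
    """Assumed scoring: 1.0 for an exact match, decreasing with distance."""
    distance = concept_distance(graph, query_concept, dataset_concepts)
    return 0.0 if distance is None else 1.0 / (1 + distance)
```

With a small graph in which "beech forest" and "pine forest" are both linked to "forest", a dataset annotated with "pine forest" is two links away from the query concept "beech forest" and scores 1/3, while a dataset annotated with "beech forest" scores 1.0.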

As the user types, the system offers suggestions based on the first 3 characters and the matching contents of the triple store.

Spatial search

The spatial search allows entering spatially explicit search terms, using either geographical names or the option to select a spatial extent on a map. Geographical names are converted to a geographical extent, so these queries are essentially similar to the extent-based queries. For the metadata/full dataset search, the application returns all datasets that overlap with the selected feature or extent. When retrieving data subsets, queries return only the subset of data that lies inside the selected feature or extent. The resolution refers to the spatial scale required by the end user. The application will return all data that has either the requested resolution or a resolution that can be transformed to the requested resolution (that is, a higher resolution: a request for 25x25 km will also return 10x10 km data).

Temporal search

Through the temporal search option, a user can define a start and/or end time point of interest for the queried datasets. The start and end dates can be exact dates or only years. For the metadata/full dataset search, the application returns all datasets that overlap with the selected period or, if only one time point is specified, the datasets containing data that (partially) concerns the period after the specified start date or before the specified end date respectively. For actual data selection, queries return only the subset of data that lies inside the selected period, after the selected start date or before the selected end date respectively. The resolution refers to the temporal scale required by the end user.
The application will return all data that has either the requested resolution or a resolution that can be transformed to the requested resolution (that is, a higher resolution: a request for weekly data will also return datasets with daily data).

Query results

The application returns references to datasets as search results. If available, the results include URLs to view services (e.g. WMS) and data services (e.g. WFS, HTTP-based file access). For the SemaGrow demonstrator a two-step process is foreseen, in which the data services retrieve data (sub)sets through the SemaGrow SPARQL endpoint. In the first step, a query is fired to retrieve the dataset references (metadata + links, as shown in Figure 2-2). In the second step, the user can click on the data access link to retrieve the data content. This fires a second query, which retrieves the (subset of) data matching the query criteria through the SemaGrow endpoint.

Demonstrator development

The Trees4Future AgMIP demonstrator will combine the forestry research data of the T4F Clearinghouse (such as forestry genetics and forest management data) with the AgMIP data required to run agronomic and agro-economic models (e.g. climate data on both past climate and future estimations, soil data, crop trials). All data will be triplified and stored in triple stores accessible by the SemaGrow Stack, along with triple stores containing the triplified metadata of this data.
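The matching rules given for the spatial and temporal search options (overlap with the selected extent, overlap with a possibly open-ended period, and equal-or-higher resolution) can be sketched as follows. The `Dataset` fields, the bounding-box representation and the assumption that every finer spatial resolution can be transformed to a coarser one are illustrative choices, not the demonstrator's data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class Dataset:
    """Hypothetical metadata summary of one dataset."""
    name: str
    bbox: Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)
    start: date
    end: date
    cell_km: float  # spatial cell size; a smaller value means a higher resolution


def bbox_overlaps(a, b):
    """True if two bounding boxes share at least a boundary point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def period_overlaps(ds, start: Optional[date], end: Optional[date]):
    """Overlap test where either end of the requested period may be open."""
    if start is not None and ds.end < start:
        return False
    if end is not None and ds.start > end:
        return False
    return True


def resolution_ok(ds, requested_cell_km):
    """Assumption: any equal or finer resolution can be transformed."""
    return ds.cell_km <= requested_cell_km


def matches(ds, bbox, start, end, requested_cell_km):
    """Combine the three rules for the metadata/full dataset search."""
    return (bbox_overlaps(ds.bbox, bbox)
            and period_overlaps(ds, start, end)
            and resolution_ok(ds, requested_cell_km))
```

In this sketch a request for 25 km cells accepts a dataset with 10 km cells, while a request for 5 km cells rejects it, mirroring the 25x25 km and 10x10 km example in the text.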

Figure 2-3: Trees4Future application infrastructure

More specifically, the architecture of the Trees4Future AgMIP demonstrator comprises:

- The Trees4Future AgMIP user interface (GUI / demonstrator), where users can fill in the parameters of their search and details such as resolution.
- The SemaGrow Stack, accessed by the GUI through a single SPARQL endpoint, where the query components are processed against metadata information, semantic alignments of pre-stored ontologies and data stored in external triple stores.
- The data repositories, accessed by the SemaGrow Stack through SPARQL endpoints, one for each dataset.

The Trees4Future AgMIP semantic user interface (GUI) will provide fields to state the search parameters (thematic, statistical, spatial, temporal and combinations of these). Besides search terms, the user can also specify the required resolution of the data. When the user-defined query parameters are submitted, an initial query is executed on a metadata repository, containing metadata harvested from dataset headers (e.g. the headers of NetCDF files) and compiled from external catalogues describing individual datasets or groups of datasets. The procedures and guidelines prepared in Task 5.1 for adding new datasets and the associated metadata to a SemaGrow federation also apply to NetCDF files, provided the SemaGrow triplifiers have been applied and the resulting triple store is accessible to the SemaGrow Stack through a SPARQL endpoint.

The results of the initial search on the metadata are displayed to the user, who can then refine the initial search criteria as well as adjust the desired resolution of the parameters, in order to perform

adapted queries and get more specific or relevant results. The results of the queries are displayed as datasets relevant to the search, along with descriptive metadata and (automatically produced) estimates of the number of results contained in them. The user can also state the required resolution for the results. Where applicable re-scaling mechanisms exist, this acts as a request to apply such mechanisms in the final post-processing step below; otherwise it acts as a filter. Finally, the user selects the appropriate resolution and datasets among the displayed results and issues the last query.

It should be noted that rescaling and merging will not always be possible, so care will be taken to author the query templates that will be filled by the GUI in such a way that they only retrieve datasets with matching specifications and dimensions. In other words, some prior domain knowledge about what can reasonably be merged will be encoded in the query templates, relieving the user of the burden of wading through long lists of meaningless suggestions.

The final query is performed on the actual data contained in the datasets the user selected. This time the SemaGrow Stack queries the datasets containing the actual data, and not their metadata as in the previous queries, through their SPARQL endpoints. First, the SemaGrow Stack creates a local, temporary dump containing all the data matching the search criteria. A final post-processing step (the NetCDF Creator, Figure 2-4) merges and (if possible) re-scales data collected from different datasets. The output of the procedure is a NetCDF file that contains the requested combination of variables at the requested resolution and dimension constraints and can be used directly as input for research models.

As an example, a user interested in temperature and precipitation data in Spain might first query for data for a region in Spain for a specific time period.
If no suitable data is available, the user might extend the query to search for datasets covering the whole of Spain or even Europe. Subsequently the user can choose the relevant datasets and select the desired spatial and temporal resolution (e.g. monthly temperature and rainfall at a resolution of 25 x 25 km or higher). With this final query the user can request the actual data, which will be a file containing only the data for the parameters the user stated, in association with the rest of the criteria (space, time etc.).

To develop the Trees4Future AgMIP demonstrator from the Trees4Future system, a range of development activities is required to migrate from the current Trees4Future application to the Trees4Future-AgMIP demonstrator. To support the understanding of the required changes, Figure 2-3 and Figure 2-4 show the current Trees4Future application infrastructure and the foreseen implementation of the Trees4Future-AgMIP demonstrator for SemaGrow respectively.
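The temporal re-scaling mentioned for the post-processing step (for example, delivering monthly values from daily data) can be illustrated with a small aggregation sketch. This is not the NetCDF Creator: the input format, a list of (date, value) pairs for one variable at one location, and the use of the arithmetic mean are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import date


def monthly_means(daily):
    """Aggregate (date, value) pairs into {(year, month): mean value}."""
    sums = defaultdict(lambda: [0.0, 0])  # (year, month) -> [total, count]
    for day, value in daily:
        bucket = sums[(day.year, day.month)]
        bucket[0] += value
        bucket[1] += 1
    # Sort the keys so that the result is in chronological order.
    return {key: total / count for key, (total, count) in sorted(sums.items())}


daily_temperature = [
    (date(2014, 1, 1), 2.0),
    (date(2014, 1, 2), 4.0),
    (date(2014, 2, 1), 10.0),
]
print(monthly_means(daily_temperature))
```

A mean is a natural choice for temperature; for rainfall a monthly total would usually be more appropriate, which would replace the division by a plain sum. Spatial re-scaling (for example from 10x10 km to 25x25 km cells) follows the same grouping idea but needs a grid definition that the source does not specify.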

Figure 2-4: SemaGrow Trees4Future-AgMIP application infrastructure. Components shown: Trees4Future/AgMIP Demonstrator GUI, Data services, SPARQL Proxy, ClearingHouse Server (SOAP), NetCDF Creator (NetCDF), SemaGrow Stack, and SPARQL endpoints for the Metadata Repository, the Triplified Trees4Future Repository, the Triplified AgMIP Repository and further triplified datasets.

A stepwise approach will be followed to migrate from the current Trees4Future implementation to the full SemaGrow Trees4Future-AgMIP demonstrator. The steps are defined as follows:

1) Transform the Trees4Future interface into a Trees4Future-AgMIP demonstrator working on top of the SemaGrow infrastructure, serving Trees4Future data and metadata:
- Adapt the Trees4Future server component to query the SPARQL endpoint of the SemaGrow infrastructure;
- Provide access to Trees4Future metadata through a SPARQL endpoint on the Trees4Future triple store;
- Transfer the parts of the Trees4Future ontology required by the SemaGrow infrastructure to effectively query the Trees4Future SPARQL endpoint.

2) Extend the SemaGrow Trees4Future-AgMIP demonstrator with query functions on Trees4Future data content (using the same end-user interface):
- Design additional / extended SPARQL queries that support (1) the querying of datasets based on criteria related to the data and (2) the selection of a subset of data based on criteria related to the data;

- Extend the Trees4Future ontology in the SemaGrow infrastructure with the semantics required to effectively query the Trees4Future data;
- Develop a data conversion component translating the result set (RDF data) into a usable (spatial) format, e.g. WCS, WFS, NetCDF or ESRI shape.

3) Add AgMIP data and search and discovery support for AgMIP:
- Set up a triple store for AgMIP (meta)data and triplify the AgMIP datasets;
- Extend the SemaGrow ontologies with the semantics required to effectively query the AgMIP (meta)data.

The development timeline for this demonstrator, its activity planning and the deliverable schedule, as well as the plan for evaluation are further elaborated in paragraph

Evaluation objectives for stakeholders

The stakeholders for the Trees4Future-AgMIP demonstrator are on the one hand the potential user community of the demonstrator (user perspective) and on the other hand the parties interested in exploiting the SemaGrow infrastructure in a broader domain for similar big data problems (project perspective).

The user community for the demonstrator consists of the following groups:
- Forestry researchers and students requiring big data (sub)sets for their modelling and analysis work (Trees4Future community).
- Forestry practitioners and consultants searching for data to support their advice and analysis work (Trees4Future community).
- Climate change adaptation researchers in the area of forestry and agriculture requiring big data (sub)sets for their modelling and analysis work (Trees4Future and AgMIP community).
- Policy makers in the area of agriculture and forestry searching for data related to their policy domain (Trees4Future and AgMIP community).

These users commonly have problems discovering the data required for their work in the wealth of available data in the domains of forestry, agriculture and climate change. They do not know all available datasets and in some cases cannot judge the relevance of datasets.
They therefore often depend on experts, and the process of deriving the required data can be time-consuming and error-prone.

The expectations for this demonstrator from the perspective of the user community are:
- A more effective way of searching datasets:
  o Finding the best data for the job without missing relevant datasets.
  o Finding data with the best possible (spatial and temporal) resolution.
  o Being able to assess the relevance of discovered datasets.
- Discovery of data and datasets should be possible with acceptable performance.
- The search mechanism should also be able to provide datasets or subsets in a usable format (which is in general not the RDF/XML format).

From the perspective of the project it is essential to evaluate the capacity of the SemaGrow infrastructure to handle heterogeneous big data sets. The user community and the use

case elaborated can serve as a model for the big data problems that exist in many knowledge-intensive domains dealing with heterogeneous and multi-dimensional big data sets.

The expectations from the project are:
- To test the capabilities of the SemaGrow infrastructure to deliver the Trees4Future and AgMIP functionalities: thematic, spatial and temporal queries over (meta)datasets, returning references to datasets, the datasets themselves and sub-selections of datasets.
- To explore and test ways to increase the effectiveness of data discovery through semantic technologies.
- To test and compare the performance of comparable search queries over the classical Trees4Future system and the SemaGrow implementation.

The expectations described here will be translated into a set of evaluation criteria that can be assessed in the demonstrator pilot trials. The evaluation procedures will be elaborated in Section

AGRIS

AGRIS is one of the biggest and most important information systems in the agricultural domain. It is a database composed of more than 7.7 million bibliographic references in agriculture. The AGRIS Web portal ( receives an average of visits/months and is accessed worldwide (from more than 190 countries and territories, according to Google Analytics statistics). AGRIS is indexed by Google and its content appears at the top of Google results. Moreover, AGRIS is already largely oriented to semantic technologies, as it uses RDF data and accesses various SPARQL endpoints. For all these reasons AGRIS is a natural choice for testing SemaGrow technologies.

Rationale for selecting AGRIS as a demonstrator

AGRIS may be considered a mash-up application based on semantic technologies: when returning documents relevant to a user's query, it also enhances them with links to a variety of related resources. This enhancement is based on the content of the resulting documents and not on the user's query.
By exploiting the technologies developed within SemaGrow, we expect to further enhance the ability of AGRIS to provide users with relevant resources in a reactive and robust manner, while also allowing diverse, heterogeneous, and large-scale data sources to be easily incorporated by the AGRIS administrators.

As a first step, the SemaGrow demonstrator will combine bibliographic results from the AGRIS database with relevant information already available on the Web. The latter will be retrieved from the FAO Web crawl database, which holds metadata about Web resources that are relevant to the agricultural domain and that may complement the information already available in AGRIS. This database is populated by a dedicated system (developed and maintained by FAO outside SemaGrow) that crawls the public Web and annotates Web pages with AGROVOC concepts.

From the point of view of AGRIS users, SemaGrow technology may be the key to improving the informativeness of the service, as more data sources are combined with the bibliographic references they retrieve, including:

- Relevant pages crawled from the public Web and annotated with AGROVOC concepts, selected for their semantic similarity to the AGROVOC annotations of the bibliographic entries.
- Relevant meteorological, soil, and experimental results datasets from the Trees4Future/AgMIP collections, selected for their semantic similarity to the AGROVOC annotations of the bibliographic entries under the SemaGrow-produced vocabulary alignment between AGROVOC and the DLO vocabulary.

From the point of view of the AGRIS administrators, SemaGrow technology may contribute to consolidating AGRIS in its role as a major information service in the area of agriculture. From the project point of view, AGRIS is also a good test bed because it is already largely based on semantic technologies. It is therefore interesting to understand the impact of adopting SemaGrow technologies for systems that are already in this area, i.e. without drastically changing the underlying architecture or the skills of the system administrators.

Current architecture of AGRIS

The logical view of the AGRIS high-level architecture may be described as consisting of four main components, plus the CIARD RING (Figure 2-5). At the centre is the AGRIS Web application, a Java application deployed in a Tomcat Web server. It comprises the Web interface and the algorithms that allow users to look for agricultural information in AGRIS.

Figure 2-5: AGRIS current architecture

Figure 2-6: AGRIS data flow

The Apache Solr server allows the AGRIS Web application to quickly retrieve results and display them to the users; it also helps to retrieve statistical information and to perform some analysis on top of AGRIS data. A filesystem XML database is used as a bibliographic repository to store metadata coming from data providers. In the existing AGRIS architecture, the Apache Solr index is built on top of this database. Then, there is a triplestore (currently AllegroGraph) used to store the so-called AGRIS RDF, namely the RDF-ization of the filesystem XML database, enhanced with additional AGROVOC URIs computed by automatic procedures such as the AgroTagger. Finally, there is the CIARD RING. This component is external to AGRIS, in that it does not strictly belong to AGRIS, but it is necessary to retrieve information about AGRIS data providers.

Technologies. The AGRIS application is entirely based on Java. Some APIs used by the application:
- Apache Struts 2.0 for the Web interface and the exchange of parameters between the user and the application itself;
- Apache Solr for the indexing of resources;
- Sesame 2 to query the triplestore.
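To make the role of the Solr index more concrete, the sketch below builds a query URL for Solr's standard select handler, asking both for matching records and for per-value counts on one field (one possible way to obtain simple statistics). It is written in Python for brevity, although the AGRIS application itself is Java-based. The host, the core name `agris` and the field name `subject` are hypothetical; the deliverable does not describe the AGRIS index schema.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical location of the Solr core; not taken from the deliverable.
SOLR_SELECT = "http://localhost:8983/solr/agris/select"


def build_solr_query(text, rows=10, facet_field=None):
    """Return a Solr select URL for a free-text search.

    If facet_field is given, Solr is also asked for per-value counts
    on that field (faceting).
    """
    params = [("q", text), ("rows", rows), ("wt", "json")]
    if facet_field:
        params += [("facet", "true"), ("facet.field", facet_field)]
    return SOLR_SELECT + "?" + urlencode(params)


# Example: five results plus facet counts on the (assumed) subject field.
url = build_solr_query("maize drought", rows=5, facet_field="subject")
query = parse_qs(urlsplit(url).query)
```

Fetching the resulting URL against a running Solr instance would return a JSON response; that step is left out so the sketch stays runnable without a server.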

Data workflow. AGRIS receives data from a variety of applications and institutions, each managing its data independently. Therefore, the very first phase of the data workflow in AGRIS is to map the received format into the AGRIS internal model. In this way, AGRIS data providers may continue to use their own data model and still contribute to AGRIS. Currently, the AGRIS internal data model is the AGRIS AP for the filesystem XML database, and the AGRIS RDF for the triplestore. In the future, the AGRIS RDF should be the only AGRIS format, possibly extended with more properties. Table 2-1 lists the AGRIS RDF properties currently in use. After the conversion to the AGRIS internal model, data is indexed by Solr and made accessible for search through the AGRIS web site.

Data coverage. The AGRIS core database consists of bibliographic metadata in the agricultural domain. However, other types of data (always related to agriculture and food), such as maps and statistics, are interlinked to AGRIS. AGRIS now accesses the following external datasets: Europeana, species distribution data from GBIF, Nature, DBpedia, germplasm data from Bioversity International, FAO Country Profiles, IFPRI, World Bank.

Users involved in AGRIS. The following user groups (in terms of profile) are involved in AGRIS: software developers, agricultural researchers, students, librarians, information management specialists, agricultural journal editors, related data providers, and other interested people.

Table 2-1: List of properties used in AGRIS.
bibo:article
dct:creator -> foaf:organization -> foaf:name
bibo:abstract
dct:creator -> foaf:person -> foaf:name
bibo:doi
dct:datesubmitted
bibo:isbn
dct:description
bibo:presentedat -> bibo:conference -> dct:title
dct:extent
bibo:uri
dct:identifier
dct:alternative
dct:rights
dct:type
dct:publisher -> foaf:organization -> foaf:name
dct:issued
dct:subject
dct:source
dct:language
dct:ispartof
dct:title
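The mapping phase described under "Data workflow" can be pictured as a field-by-field translation from a provider's own record layout into the property names of Table 2-1. The following is a minimal sketch under stated assumptions: the provider field names (`Title`, `Author`, `Year`, `Keywords`, `Lang`) are hypothetical, nested properties such as dct:creator -> foaf:person -> foaf:name are flattened to a single key, and the mapping is illustrative rather than the actual AGRIS converter.

```python
# Hypothetical provider field -> AGRIS RDF property (names from Table 2-1).
FIELD_MAP = {
    "Title": "dct:title",
    "Author": "dct:creator",
    "Year": "dct:issued",
    "Keywords": "dct:subject",
    "Lang": "dct:language",
}


def to_internal_model(provider_record):
    """Translate one provider record into the internal property names.

    Fields without a counterpart are dropped. Every value is stored as a
    list so that repeatable properties (e.g. several subjects) are handled
    uniformly.
    """
    internal = {}
    for field, value in provider_record.items():
        prop = FIELD_MAP.get(field)
        if prop is None:
            continue  # no counterpart in the internal model
        values = value if isinstance(value, list) else [value]
        internal.setdefault(prop, []).extend(values)
    return internal


record = to_internal_model({
    "Title": "Drought tolerance in maize",
    "Author": ["A. Author", "B. Author"],
    "Year": "2013",
    "Keywords": ["maize", "drought"],
    "Pages": "12",  # not mapped, therefore ignored
})
```

Keeping the mapping in a plain table means each data provider would need only its own `FIELD_MAP`, which matches the idea that providers keep their own data model and still contribute to AGRIS.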

2.2.3 AGRIS demonstrator

The AGRIS demonstrator for SemaGrow requires changes mostly at the level of data flow. The current architecture will not undergo substantial changes, while a new source of data will be added, coming from a massive harvesting of the Web and subsequent processing to find meaningful combinations among the AGRIS core database, the resulting LOD database, and other interesting databases provided by SemaGrow. The core idea is to harvest the Web, starting from pre-selected sources of information in the agricultural domain; discovered resources will then be annotated with AGROVOC concepts and stored in a big triplestore (the crawler database). This triplestore can be used to define combinations with the AGRIS core database (in order to create a widget for the AGRIS Web portal) and with other databases. In detail, the following technical components will have to be added to the current AGRIS in order to make a demonstrator for SemaGrow:

1. A customized Apache Nutch Web crawler to harvest data from the Web
   o The Web crawler and the tagging component may be provided by FAO, by tuning existing tools. A Web crawler is needed to gather data from the Web, while an automatic tagging tool is needed to apply a first phase of filtering for relevant content, relying on the AGROVOC thesaurus.
2. A triplestore suitable for storing the big data collected in the previous phase, i.e. the AGRIS core database (~200 million triples) and the crawler database (expected to quickly reach the order of magnitude of gigatriples)
   o The needed triplestore will be provided by SemaGrow partners. The discussion on where to host this database is still open.
   o Once the triplestore is provided, FAO can provide the triples to fill it, coming from the AGRIS core database and the crawler database.
3.
A processing phase will then take place, in order to work on the collected dataset and discover meaningful combinations between the AGRIS core database and the crawler database. We also plan to connect AGRIS data with the data made available by DLO (see the Trees4Future-AgMIP demonstrator), so this processing phase will also include that data.
   o A domain expert in agriculture is needed to define possible combinations of databases in natural language. This domain expert has not yet been identified, but could be provided by SemaGrow partners.
   o Processing tools will be provided by SemaGrow partners, with the coordination and direct contribution of UAH. The type of processing envisioned would include the selection of resources that meet requirements such as: an AGRIS record and a crawler record having at least 4 (or 5) AGROVOC URIs in common.
   o Support on mapping data, as needed, will also be provided by SemaGrow partners, with a specific role for UNITO.
4. A triplestore of relevant selected resources will have to be set up
   o Technical support will be provided by SemaGrow partners.

5. A new widget, based on the information included in the triplestore of point 4, will be added to the AGRIS Web portal
   o FAO will provide the technical support to write and deploy the widget in AGRIS.

In order to interlink DLO data to the AGRIS database, a mapping between AGRIS and DLO will be needed. Two options are available:
- DLO data could be indexed with AGROVOC URIs. In this way, the AGRIS engine can automatically display DLO data when an AGRIS record comes with specific AGROVOC URIs.
- DLO data could be made accessible through a REST Web service and queried using scientific names.
These two options are valid for any dataset that has to be interlinked to AGRIS.

Skills required for building an AGRIS demonstrator. Given the current status of AGRIS, no major changes in skills would be needed on the side of the system administrators. No changes at all are required on the side of the application end users.

Foreseen evaluation

Two aspects of the demonstrator will have to be evaluated: on the one hand, the efficiency of the data enhancement phase; on the other hand, the improved experience of the end user, through a more user-oriented evaluation.
- For the former type of evaluation, support will be provided by the technical partners in the project, who have to ensure response times on the order of one second.
- For the latter, a survey (via AIMS) could be used once a widget is available in the AGRIS mashup page. This will affect the following evaluation stakeholders: researchers, domain specialists, librarians. Note that, from a technical point of view, this step requires that the discovered combinations of point 4 are stored and made queryable via a SPARQL endpoint or a RESTful Web service.
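The selection rule mentioned in step 3 (an AGRIS record and a crawler record sharing at least 4 or 5 AGROVOC URIs) can be sketched as a set intersection. The record identifiers, the dictionary layout and the placeholder concept URIs below are assumptions made for illustration only; the actual processing tools are to be provided by the SemaGrow partners.

```python
from itertools import product


def find_combinations(agris, crawled, min_shared=4):
    """Pair AGRIS and crawler records sharing >= min_shared AGROVOC URIs.

    Both arguments map a record identifier to the set of AGROVOC concept
    URIs annotating that record. Returns a list of
    (agris_id, crawler_id, shared_uris) tuples.
    """
    pairs = []
    for (a_id, a_uris), (c_id, c_uris) in product(agris.items(), crawled.items()):
        shared = a_uris & c_uris
        if len(shared) >= min_shared:
            pairs.append((a_id, c_id, shared))
    return pairs


# Placeholder URIs c_1..c_6 stand in for real AGROVOC concept URIs.
C = {n: f"http://example.org/agrovoc/c_{n}" for n in range(1, 7)}
agris = {"agris:1": {C[1], C[2], C[3], C[4], C[5]}}
crawled = {
    "web:a": {C[1], C[2], C[3], C[4]},  # 4 shared concepts: kept
    "web:b": {C[1], C[2], C[6]},        # 2 shared concepts: dropped
}
matches = find_combinations(agris, crawled, min_shared=4)
```

The pairwise loop is quadratic in the number of records, so it only illustrates the rule. At the scale expected for the crawler database, the comparison would presumably be pushed into the triplestore or supported by an index from concept to records.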
2.3 Agricultural Discovery Space (ADS)

For the Reactive Resource Discovery use case, two demonstrators were originally selected: the AgLR toolkit and the Agricultural Discovery Space (ADS). Based on the recommendations of the reviewers, we will focus on ADS, with the support of the semantic search tools of the SemaGrow infrastructure. ADS will be realised to cover the research needs of educators and trainers in the areas of Food Safety and Agricultural Research Information, exploring specific ways to meet their requirements for finding material for their activities. Any agricultural data discovery space, whether it is part of a web portal or of a tool such as the AgLR toolkit, is based on the Agricultural Data Platform that AK has developed. In order to improve the discovery experience of the user, we need to improve the Search API that the Agricultural Data Platform provides to the developers of the discovery applications. Therefore, the use case of Reactive Resource Discovery will mainly focus on how the data platform can be improved through the results

of the SemaGrow project. The improvements will be reflected both at the layer of the APIs that the data platform exposes and at the layer of the front-end discovery applications. The ADS case will focus on how the agricultural data platform could be enhanced to allow multiple and diverse data sources with specialised educational and research content to be searched, accessed and interlinked with the content aggregated by the platform. Specifically, the existing agricultural data platform aggregates metadata describing mainly educational and bibliographic resources. However, the existing platform is neither scalable nor efficient when handling many different types of resources described by heterogeneous data.

Rationale for selecting ADS as a demonstrator

The expectation is that ADS may demonstrate the efficiency of SemaGrow technologies, in particular with regard to the reactive discovery of resources described in different contexts. In the ADS case, multiple heterogeneous and diverse data sources are considered from the perspective of Food Safety and Agricultural Research Information, where users need reactive resource discovery in order to find, reuse and exploit data resources. In order to provide meaningful and efficient agricultural data discovery services to the end users, AK plans to improve its data platform in the following directions:
- Support and link heterogeneous data sources. Currently only specific data types are supported, namely bibliographic and educational, and any new data type requires setting up a new customized instance of the data platform. The customization consists of creating a new data model for the database, writing new transformers for the new data type, and revising existing processing components. In addition, the final index that is created can be connected to the other existing data types only through aligned or common classifications.
This means that federated querying is not possible for all the data types supported by the data platform. The process of supporting a new data type is costly and time consuming.
- Support reactive response discovery. Currently, querying two or more different data types is implemented as a parallel call to two or more APIs. This considerably degrades the user experience with regard to data discoverability. More specifically, users cannot perform complex queries and need more clicks to discover the content they are seeking.
- High efficiency. Currently AK needs to install a new data platform instance on new cloud infrastructure (at least 4 VMs) every time a new data type has to be supported. Moreover, in front-end applications with high visibility there are cases in which resources are consumed calling APIs that are not available or that cannot provide the requested information due to low content coverage.

Figure 2-7: Agricultural Data Platform

The Agricultural Data Platform can be connected with a global Open Agricultural Data Registry provided by directories such as the CIARD RING of FAO, where all the data sources are described and published in a machine-readable format. Such global directories can work as information backbones for the ecosystem of stakeholders, such as data scientists, developers, and SMEs, that would like to use the available open agricultural data to develop new meaningful services for the end users. One of the main business objectives of AK is to build a data shop for open agricultural data based on the agricultural data platform. Since heterogeneity is a typical challenge in such a data shop, the SemaGrow software stack will be a basic enabler for it.

Figure 2-8: Metadata Aggregation Workflow

2.3.2 Current architecture and status of the Agricultural Data Discovery Space

The Agricultural Data Platform is an open system that can aggregate data from various data sources, then store, enrich, transform and index them so that they can be consumed by developers and applications. The backend of the system is an aggregator with a number of steps supporting the acquisition and maintenance of the metadata records from different content providers. The steps of the aggregation workflow are presented in Figure 2-8. More specifically, the workflow for metadata acquisition includes:
- The ingestion step: the first step consists of ingesting all the metadata records from the remote site of a content provider. Standard harvesting protocols such as OAI-PMH are used in most cases.
- The filtering step: filtering consists of discarding incoming records considered inappropriate, either because the object a record describes is inappropriate (e.g., in a collection of educational resources, discarding metadata describing resources on topics not related to Organic Agriculture and Agroecology) or because the record is syntactically incorrect. The latter can be seen as a light form of validation that focuses on detecting errors that could compromise the correct functioning of the aggregation service.
- The identification and deduplication step: during this step, a software component compares new metadata records to the existing ones to see if the objects they describe are already referenced in the catalogue.
- Transform into internal format: this step transforms the XML versions of the metadata records into JSON files that follow the principles of an abstract data model. It requires transformers capable of converting the various formats and application profiles of the metadata records collected at step 1 into the internal format.
- Link checking: this step checks whether the URL for accessing the learning object is broken. For all learning objects whose location in the metadata record has been recognised as broken, the index is updated accordingly in an automatic way.
- Post processing: in some cases the metadata records need to be normalized in order to avoid problems in the front-end applications. One example is the normalization of the language attribute values of titles in English, which may be provided as either "en" or "eng". In this case the post-processing step normalizes all the values so that they use the correct ISO code for the language.
- Enrichment: this step can be used to enrich the metadata elements of some collections.

- Store and publish records: the final step of the metadata aggregation workflow is the storage in a repository of all the new metadata records that have successfully passed the deduplication and URL checking steps. They are stored on the file system, where they are organized by sets. This consolidated metadata store is exposed through a web server so that records can easily be accessed online. A typical URL is of the form e.g. /LOM/GREENOER/12345.xml. This step also includes publishing the metadata through standard protocols and APIs, one supported by the repository and the other by a stand-alone web application.

Figure 2-9: Agricultural Data Platform supported by SemaGrow software stack

In order to provide a friendly way to access the aggregated and processed metadata, a RESTful API offers several search options over the indexed metadata records (JSON files) that follow the internal format. Specifically, it allows users (or applications) to make the following types of queries:
1) Simple text-based search,
2) Searching within specific fields (metadata),
3) Fetching specific resources given an identifier,
4) A combination of text-based search with faceted search,

5) Filtering resources according to dates mentioned in specific fields, but not with date ranges.

One major issue with the described data platform is that high-quality mappings and transformations need to be defined and implemented by experts in order to integrate new types of resources. Such a procedure is costly in terms of human resources and of the time required until a new data source is available through the index.

Foreseen architecture & implementation

Figure 2-9 depicts the foreseen general architecture. In the new architecture, the new discovery services will be set up on top of the existing data platform, enhanced by providing access to more metadata through a SemaGrow-powered search API. The green parts in the diagram correspond to SemaGrow revisions in the Data Platform for Agricultural Data Discovery. More specifically, by taking advantage of the project's technology we plan to enhance the data platform along the following dimensions:
- Provide users with the ability to access and reuse more resources of several types
- Provide the capability to cover more information needs of users through APIs with higher expressivity
- Have a provenance mechanism for filtering the origin of resources in the response to a query
- Enhance the current data platform to be more robust and automatic:
   o Minimize the effort required for vocabulary and metadata alignment
   o Interlink and find interesting relationships among ingested and non-ingested resources of several types

Foreseen evaluation

The changes in the ADS demonstrator will affect the following evaluation stakeholders:
- Developers: either individuals or people working for an SME, who want to develop data products for the agricultural and food sector using the data APIs that the SemaGrow-powered data platform will provide.
They should evaluate the data platform and the Search end point (SPARQL) powered by SemaGrow to identify the main problems that they face when using a Search API to build a discovery application.
- Data scientists who want to use the data APIs of AK's data platform to develop and test new data processing algorithms and components using data exposed by the Data Platform.
- Trainers who seek training courses and educational resources and who want to create a training pathway related to food safety. They will be affected by the new SemaGrow-powered discovery applications, which will focus on finding information related to agricultural research and food safety, and whose performance will change in terms of time, accuracy of the search results, user experience, and flexibility of the different queries to external sources.

- Domain specialists (e.g. agronomists, food safety experts) from organizations and institutes who seek scientific information related to agricultural research and food safety topics. They will be affected in the same way as the trainers.
- Organizations and institutes that want to set up a discovery service that can use heterogeneous data sources. They will be affected in the same way as the trainers.

2.4 Overview of the evaluation stakeholders

In this section, we discuss the evaluation of the SemaGrow demonstrators from the point of view of the people involved in the evaluation, the evaluation stakeholders. A stakeholder-focused approach is the main part of the SemaGrow evaluation methodology, involving the relevant community of users (educators, researchers and information officers). The involvement of stakeholders can also add to the overall recognition of, and participation in, SemaGrow activities, such as the technical testing and the feedback on user satisfaction. Furthermore, such community involvement during the pilot deployment and evaluation phase may also improve the users' understanding of the impact that using large volumes of data may have on the sector. It is useful to distinguish between direct and indirect stakeholders:
- Direct stakeholders in the design, pilot deployment and evaluation of SemaGrow Web Applications are the envisaged users and their organizations
- Indirect stakeholders are other parties who take an interest in the development and provision of the SemaGrow infrastructure (i.e. national and European decision makers in research policies, research associations and others), or who may be indirectly affected by its future use (i.e. existing organizations that offer individual services also covered by SemaGrow).
Primarily relevant for the SemaGrow evaluation process are the direct stakeholders, i.e.
the envisaged user communities, whose engagement and feedback are decisive for the successful conduct and results of the evaluation activities. SemaGrow aims at providing advanced data services for agricultural data infrastructures. Among these communities we distinguish different evaluation stakeholders, which are described in the next paragraphs.

Researchers
These are scientists who are involved in developing agricultural models and in running models for large-scale research studies and policy assessments (agronomists, biologists, environmentalists, climate change experts etc.). They are interested in web applications that require input from large to very large and heterogeneous datasets, covering various types of content (agricultural data, soil, measures, environmental information, climate change details, geographical information, economic and statistical data). This type of user includes data scientists (mathematicians, statisticians) who are interested in using the agricultural models to obtain complex results by combining data from different types of data sources, e.g., the prediction of production under specific environmental conditions (climate and soil data) and the correlation of production with the prices of agricultural goods.

Developers
Developers, either individuals or people working for an SME, who want to develop data products for the agricultural and food sector using the data APIs that the SemaGrow-powered data platform will provide.

Domain specialists
Agricultural specialists, consultants and domain experts (PhD students and professionals with agricultural knowledge) who are interested in using agricultural models and tools to analyse their research data or related data from other external sources. These specialists are looking for data from various types of sources, such as bibliographic / academic data, educational material, genetic information, geographical details, economic / statistical data etc. This type of stakeholder includes pupils and students who are interested in accessing not only educational material but also the whole set of educational activities (including the supporting material: handbooks, manuals, research papers, videos, lectures) of an educational / research pathway, e.g. the analytic method for nutrient analysis of grapes. End users of agricultural models may be agricultural specialists with business-oriented needs, who seek to improve the innovative perspective of their business idea (innovative product or method).

Educators/Trainers
This category of users includes agriculture-related consultants and educators (i.e. trainers and extension workers). They work in an array of domains of research and education, including agriculture and food safety. They are interested in identifying and accessing, as well as sharing, their own current research and educational material (scientific papers, research data, training material). These user groups create new educational / training pathways with their own material or using data from external resources.

Data owners / curators and annotators of data
This group of users works on the organisation and integration of data / content around various agricultural research and education topics (i.e.
land use, soils, water and other natural resources) and broader thematic fields (i.e. organic agriculture, sustainable agriculture, food security). Some examples include curators and annotators ("inter-linkers") of bibliographic resources, agricultural learning collections, genetic resources and geographical data.

Librarians and other institutional information managers
This group of users includes librarians and other managers of institutional or subject/domain-based content databases / repositories of many universities and other research centres, publishers, public administrations and NGOs. This user group manages the content provided by researchers and educators and has a certain level of IT capacity, both in terms of technical infrastructure and

skills. Some of them are managing advanced digital libraries or data centres, while many more have only limited capacity and small content collections, but aim to connect to, and collaborate with, content / data sharing initiatives.

3. SemaGrow evaluation approach

3.1 Layered evaluation approach

The SemaGrow evaluation approach will be based on a layered evaluation framework. Evaluating such complex systems can be a challenging and difficult task. Recent studies have suggested the adoption of layered approaches in order to identify the components of a system that may affect its overall performance (Pu et al., 2012). Layered evaluation (or decomposition) frameworks have attracted research attention for more than a decade, with several frameworks, methods and instruments being proposed and tested in the relevant literature (Paramythis et al., 2010; Manouselis and Verbert, 2013; Manouselis et al., 2014). They decompose a system into its constituent subsystems or layers, which can be evaluated one by one, and then apply particular evaluation methods that can assess the performance of each targeted layer. Pu et al. (2012) have suggested that layered evaluation can be a powerful technique for identifying the areas of a system that require further improvement. A series of layered evaluation frameworks has been proposed in the literature on system evaluation, advocating that each component of a system should be evaluated separately in order to collect valuable feedback on the pros and cons of each part of the system. The idea can be traced back to the early 90s, when Totterdell & Boyle (1990) proposed that (i) the accuracy of the user model and (ii) the effectiveness of the changes (adaptations) made by adaptive systems should be evaluated separately. Ten years later, Karagiannidis & Sampson (2000) proposed the term layered evaluation, and suggested that the evaluation should address the main components of each system separately.
A similar layered framework was proposed by Weibelzahl (2001), with the decomposition of the adaptation into three layers:
- Evaluation of input data
- Evaluation of the inference mechanism
- Evaluation of the adaptation decisions

On the other hand, Paramythis et al. (2010) further elaborated their decomposition of the layered evaluation framework by proposing five layers (or modules):
- Interaction monitoring
- Interpretation and interface
- Modelling
- Adaptation decision making
- Applying adaptations

Additional approaches that adopted to some extent a layered or component-based approach were also proposed by Herder (2003), Magoulas et al. (2003), and Tobar (2003). Reviewing the state of the art in related work, Paramythis et al. (2010) grouped together the main approaches and suggested the following main layers of adaptation:
- Collection of input data
- Interpretation of the collected data
- Modelling the current state of the world

- Deciding upon adaptation
- Applying (or instantiating) adaptation

They argued that these adaptation layers serve as the core components upon which evaluation can take place, aiming to isolate and evaluate separately as many of them as possible, given the particularities of a given system.

3.2 Towards a layered decomposition for SemaGrow Use Cases

To illustrate how a layered decomposition can serve as a starting point for the development of a more concrete and practical evaluation framework, as proposed by Manouselis et al. (2014a and 2014b), we elaborate on the mapping of an adapted version of the layers presented by Karagiannidis & Sampson (2000) to the components of the SemaGrow use cases, in order to provide some generic principles and guidelines that the three SemaGrow use cases and the respective demonstrators could explore. More specifically, we focus on each layer and break down the interaction components into distinguishable elements. Then, using an existing analysis of each SemaGrow use case along various dimensions (as described in the deliverable D2.2.2 Envisaged Applications & Use Cases), we further analyse the interaction components into more fine-grained sub-components. This analysis can go down to the level of granularity that the evaluation framework designers believe will provide meaningful results to the researchers. For each SemaGrow use case, the dimensions that have to be evaluated need to be clarified, using an approach similar to the one proposed for recommender systems (Manouselis & Costopoulou, 2007; Manouselis et al., 2014a). The high-level analysis for all SemaGrow demonstrators is presented in Table 3-1, while a more detailed analysis per demonstrator is presented in Tables 3-2, 3-3 and 3-4. Based on this analysis, different evaluation methods and criteria are set for each evaluation layer of each demonstrator. This analysis is presented later, after an overview of the available appropriate evaluation methods.
Figure 3-1 presents the mapping of the SemaGrow Architecture with the evaluation layers.

Table 3-1: Mapping the evaluation layers with the SemaGrow Architecture [26]
Evaluation layers (Karagiannidis & Sampson, 2000): Interaction Assessment; Adaptation Assessment
Interaction Components (Pu et al., 2012): Resource Presentation (Client); Resource Discovery (SemaGrow Stack)
SemaGrow Dimension (Objectives): Client front-end interaction; Client back-end interaction; Indexing algorithms for efficient storage and retrieval; Query decomposition and rewriting; Schema alignment methods

Figure 3-1: The SemaGrow Architecture and the evaluation layers (diagram labels: Interaction layer: Client; Adaptation layer: SemaGrow Stack and off-stack SemaGrow components, comprising the SemaGrow SPARQL endpoint, Resource Indexing, Query Decomposition, Alignment, Query Transformation, and the Query Manager and Execution Engine, over Data Source #1 to Data Source #n)

Table 3-2: Mapping the evaluation layers with Trees4Future-AgMIP
Evaluation layers: Client Interaction Assessment; SemaGrow Stack Assessment
Interaction Components: Client Usability/Performance; Search; Ability to select data sub-sets

Table 3-3: Mapping the evaluation layers with AGRIS
Evaluation layers: Agricultural data crawled (SemaGrow Stack); Agricultural data combinations / front-end application
Interaction Components: Client Usability/Performance; Content integration

Table 3-4: Mapping the evaluation layers with ADS
Evaluation layers: Agricultural Data Discovery front-end applications; Agricultural data aggregation and processing; Agricultural data retrieval and publishing (SemaGrow Stack)
Interaction Components: Client Usability/Performance; Content integration; Search effectiveness; Usage of end points (Activation)

4. Evaluation Methods and Tools for SemaGrow Demonstrators

In this section, we describe the methods, tools and evaluation metrics that will be used in the context of the SemaGrow pilot trials. We also explain the proposed experiments for evaluation.

4.1 Review of evaluation methods and tools

Evaluation methods can be either quantitative or qualitative in nature. Table 4-1 provides a summary of various testing and evaluation methods that allows for comparison (Holzinger, 2005; Matera et al., 2006; Rohrer, 2008; USINACTS, 1999). The evaluation methods to adopt were selected by taking into account our requirements on evaluation, which cover the following aspects:
- Use of qualitative and quantitative methods in order to ensure an appropriate number of users (quantitative) and depth of involvement (qualitative);
- Consideration of the difference between opinions and actual behaviour (i.e. what users say about the three tested service demonstrators vs. what they actually do with them); and
- Consideration of different contexts of actual use, i.e. evaluation with selected users in the lab environment versus open, online use of tools and services.

The assessment of which methods should be selected took account of the use cases described in deliverable D2.1.2 Envisaged Applications & Use Cases. More use cases will be described in the next version of the present deliverable in order to define the pilot specifications for each demonstrator. Thus, different dimensions have been considered in the selection of the set of evaluation methods. In the table below the selected methods are highlighted (in grey) and briefly described. Evaluation methods used in IT projects include experiments, interviews, surveys, observations, focus groups etc. Table 4-1 provides a summary of the properties of each method reviewed in the USINACTS guideline (USINACTS, 1999), in order to compare them and choose the most appropriate for the SemaGrow requirements in every evaluation phase.
The SemaGrow controlled pilot trials will be based on structured interviews, usability evaluation, surveys (on-site and on-line questionnaires) and input logging, which support the aims of the evaluation methodology.

Table 4-1: Evaluation methods (source: USINACTS, 1999)
Method | Lifecycle Stage | Users | Main Advantage | Main Disadvantage
Experiments | Components design (hardware or software). Establishing generic principles for system design. | Usually few, but depends on complexity | It allows testing design hypotheses or alternatives in an optimal way. | Complex techniques involved, which requires expert knowledge for maximum benefit. Usually made in the usability laboratory, and not in the real use environment.
Interviews | User requirements. Task analysis | 5 | Flexible, in-depth attitude and experience probing. | Time consuming. Hard to analyze and compare.
Observation | Task analysis. Usability testing | Several (>3) | It is made in real use environment. | Very costly. Difficult to analyse, and to know the reasons for behaviour.
Usability testing | Early design, "inner cycle" of iterative design | None (it is made by experts) | Finds out individual usability problems. Can address expert user issues. | Does not involve real users, so does not find "surprises" relating to their needs.
Focus groups | User group feedback | < 10 / group | Spontaneous reactions and group dynamics. Allows to find out opinions or factors to be incorporated in other methods (i.e., surveys) | Hard to analyse. Low validity.
Input logging (Web analytics) | Final testing, follow-up studies | At least 20 | Finds highly used (or unused) features. Can run continuously. | Analysis programs needed for huge mass of data. Violation of users privacy must be prevented.
Surveys (User Feedback) | Follow-up studies. Also for user requirements. | Hundreds | Tracks changes in user requirements. Analysis of user's opinion for the working system in its real environment. | Special organization needed to handle replies.

4.2 Evaluation methods and tools selected for SemaGrow demonstrators

Overview of the SemaGrow evaluation

The evaluation process will address the assessment of the three service demonstrators on top of the semantic store infrastructure. It will take place through different phases and involve different stakeholder / user groups. The evaluation will take place in five (5) different phases including a number of evaluation activities:

- Deployment I - First functional version of the integrated SemaGrow components
- Controlled pilot trials - Cycle I
- Deployment II - Refinement and alignment of the second integrated version
- Controlled pilot trials - Cycle II

The following figure (Figure 4-1) provides a first overview of important elements of the SemaGrow evaluation experiments. We would like to highlight that FAO, ALTERRA and AK have strong experience in their fields, and that their knowledge and previous experience in the evaluation of ICT systems, tools and services will contribute greatly to the professional execution of the testing and evaluation activities.

The following tables present a mapping of the evaluation methods and metrics to be used for each evaluation layer of each specific demonstrator, based on the initial analysis presented in Section 3.

Table 4-2: Mapping Trees4Future-AgMIP with evaluation methods and metrics
Evaluation layers | Interaction Components | Evaluation Metric | Evaluation Method
Client Interaction Assessment | Client Usability / Performance | Usability and User satisfaction | Usability testing and User satisfaction
SemaGrow Stack Assessment | Search | Correctness | Experts testing, Controlled Physical Trial, Online Controlled Trial
SemaGrow Stack Assessment | Search | Completeness | Experts testing
SemaGrow Stack Assessment | Search | Ranking Accuracy | Experts testing, Controlled Physical Trial, Online Controlled Trial
SemaGrow Stack Assessment | Ability to select data subsets | Correctness | Experts testing
SemaGrow Stack Assessment | Ability to select data subsets | Completeness | Experts testing

Figure 4-1: Overview of tasks for pilot evaluations, partners responsible, and evaluation activities

Controlled pilot trials

The controlled pilot trials will take place at partners' sites, where selected groups of stakeholders will be invited to test and evaluate the SemaGrow service demonstrators and provide their feedback. This task will be organized and run with selected users that belong to the different communities that are being considered. The group of users (between 10 and 20) will give feedback on how they can overcome their data problems by using the SemaGrow-enhanced demonstrators. As discussed earlier, two rounds of controlled pilot trials have been planned:

- Controlled pilot - Cycle I: This pilot will take place immediately after the first deployment of the SemaGrow integrated tools, giving input for the second SemaGrow integration.
- Controlled pilot - Cycle II: After the second integrated version of the SemaGrow tools, the second phase of controlled pilots will take place in order to ensure a realistic vision of how the SemaGrow results may be deployed in real-life environments. This phase will follow an iterative approach.

Table 4-3: Mapping AGRIS with evaluation methods and metrics
Evaluation layers | Interaction Components | Evaluation Metric | Evaluation Method
Agricultural data crawled (SemaGrow Stack) | Client Usability / Performance | Usability, Speed | Usability testing, Observation, Experiments
Agricultural data combinations / front-end application | Content integration | Correctness, Completeness | Surveys, Interviews, Website feedback

Results will be collected from each pilot trial and analysed in an integrated report that will provide recommendations for the further improvement of the SemaGrow components and ideas for the possible deployment of the demonstrators under real-life conditions. Due to the diversity of the demonstrators and the respective evaluation stakeholders, different types of pilot trials will be used depending on each specific case.

Experts testing

A number of developments in the demonstrators can only be tested with a small number of experts who will offer their expertise and examine the demonstrator extensively. One such case is the back-end of the ADS demonstrator, which deals with agricultural data aggregation and processing. Only experts can evaluate it, and their numbers and availability are very limited. In this case a physical pilot trial with multiple evaluators cannot take place. This approach will be used in other demonstrators as well, according to the needs of each specific case.

Physical trials

The controlled physical pilot trials will take place at partners' sites, where selected groups of stakeholders will be invited to test and evaluate the SemaGrow service demonstrators and provide their feedback. This task will be organized and run with selected users that belong to the different communities that are being considered. The group of users (between 10 and 20) will give feedback on how they can overcome their data problems by using the SemaGrow-enhanced demonstrators.
Table 4-4: Mapping ADS with evaluation methods and metrics
Evaluation layers | Interaction Components | Evaluation Metric | Evaluation Method
Agricultural Data Discovery front-end applications | Client Usability / Performance | Search depth; Number of searches transformed to content access; Relevance; Precision/Recall; Speed; Ranking Accuracy | Domain experts testing, Controlled Physical Trial, Online Controlled Trial
Agricultural data aggregation and processing | Content integration | Time/effort needed to support a new data type; Data types supported by data platform | Data experts testing
Agricultural data retrieval and publishing (SemaGrow Stack) | Search effectiveness | Precision/Recall; Speed | Developers testing, Controlled Physical Trial,
Agricultural data retrieval and publishing (SemaGrow Stack) | Usage of end points (Activation) | Number of active registered developers | Developers testing, Controlled Physical Trial,

Interviews

In addition to the controlled physical pilot trials engaging groups of users, we will also conduct one-to-one meetings that will include a presentation of the demonstrator, a hands-on session that will allow us to observe the user, and an interview to collect more qualitative results. Engaging a single user each time will allow us to gather higher-quality feedback and to address issues related to the availability and flexibility of the users.

Online trials (experiments)

Online pilots will give the designer of each SemaGrow use case the opportunity to measure the change in users' behaviour when they interact with more than one demonstrator, or when they set queries that require more time than is allocated in a controlled pilot (in the context of a half-day event). Additionally, an online experiment provides evidence that the candidate approaches are reasonable, which gradually reduces the risk of causing significant user dissatisfaction. The online evaluations will be conducted using the same methods and tools that will be developed for the controlled pilots. They are not obligatory, but offer an alternative way of evaluating a demonstrator that may be more suitable depending on the circumstances (e.g. to allow more flexibility for remote users to participate).

Hackathons

Hackathons are piloting events in which groups external to the consortium will use the SemaGrow Stack components and datasets in competitions (and benchmarking) for developing real-world applications. Such events provide the opportunity to verify that the SemaGrow results address not only the needs of the participating user partners, but also the needs of the Semantic Web community in general. They also provide performance evaluation measurements in real-world applications.
The first hackathon already took place during the first year to test the datasets, and the 2nd will take place on 4-7 July. The 3rd will take place during the final phase of the controlled pilot trials.

4.3 Potential implementation of controlled pilot trials

This section presents a possible implementation of each cycle of the controlled pilot trials for the demonstrators. It consists of three stages: (1) Introduction to Open Data in Agriculture, (2) Hands-on workshop and (3) Open event - Hackathon (Figure 4-2). Stage 1 and Stage 2 last one day each. Stage 3 is optional and can last around 2 days. Such a structured approach is not obligatory for the pilot trials, but it provides a way to combine Hackathons and pilot trials with events that will help attract more stakeholders by offering them information that is of great value to them.

Figure 4-2: Piloting trials stages

4.3.1 Introduction to Open Data in Agriculture

Stage 1, Introduction to Open Data in Agriculture, provides a full introduction to the selected use cases for the specific controlled pilots. A series of presentations will be given to the participants in order to empower the stakeholder group and expose them to the real business opportunities and challenges in the agricultural value chain.

Figure 4-3: Piloting trials stage 1

This stage will present the importance of open data in agriculture and the opportunities that it offers. During this one-day event, organizers will identify the business ideas of the participants that are related to the service demonstrator that will be presented and tested during the Stage 2 hands-on workshop. The intro day is a full-day event that relies on instructors with strong domain knowledge and strong business experience. It blends theoretical sessions with a hands-on session in which participants define the data products they are interested in building. Participants will be educators, trainers, researchers, data owners or information officers, depending on the type of pilot (Heterogeneous Data Collections & Streams, Reactive Data Analysis and Reactive Resource Discovery). In more detail, a typical agenda for the workshop will include:

- Introduction to a data-powered agricultural business ecosystem (typical data collections, examples of service/application providers and representative end users)
- Overview of open data types, sources and sets for agriculture
- Case study: research data and examples of data challenges

- Case study: an agricultural data processing platform and hands-on with the selected use cases

Hands-On workshop

Stage 2 is the actual pilot trial event. It aims to present a tutorial on how to use the service demonstrator under test, and then to give each invited group of stakeholders (researchers, educators, data owners and information officers) the opportunity to take part in an experimental session and provide input on their satisfaction with the service demonstrator, which is powered by semantic technologies. The participants' feedback will be collected using specific evaluation tools selected each time by the organizers (e.g. online questionnaire, interview).

Figure 4-4: Piloting trials Stage 2

The objective of Stage 2 is to familiarize all the stakeholders (10-20 participants) with the service demonstrator that is going to be evaluated each time. The second stage is a one-day event, including theory and hands-on sessions:

- Theory session delivered by data experts from the implementing partners (FAO, ALTERRA, AK)
- Hands-on workshop with technology experts, in order to familiarize participants with the use case being tested each time (e.g. how to use the SEEMLESS integrated database)

In more detail, a typical agenda for the workshop will include:

- Introduction to the service demonstrator that is powered by the semantic infrastructure
- Hands-on session with the selected service demonstrator
- Feedback on user satisfaction

Hackathon event

The implementation of this event is not required for the evaluation of all demonstrators, but we include it here in order to support the teams who are interested in it. This stage consists of a Hackathon weekend focused on agricultural data-powered ideas / start-ups. It aims to attract top talent from the data-powered domain as well as participants from the previous two stages, and to connect them with data scientists and developers who are interested in applying SemaGrow semantic technologies in the design of their ideas.
A typical Hackathon is organised during a weekend and focuses on the provision of working prototypes for one or more relevant technological problems. The pre-defined challenges (related to the objective of each selected use case) should be well introduced to the participants, including the actual business background and the user need that each challenge tries to cover. Following that, participating teams undertake the task of providing working solutions for each challenge within the tight timeframe provided.

Figure 4-5: Piloting trials stage 3

As a key success factor for the organisation of this stage, organisers should invest in good dissemination in order to attract the most appropriate candidates and ensure a rich pipeline of talent for the rest of the process. The dissemination effort is most effective when it addresses potential participants in a targeted way (universities, start-up associations etc.). The challenges, which are related to the selected use cases, should be relevant and inspiring, and should have a clear potential impact on the agricultural research community and on business start-ups with innovative ideas. The typical technology start-up audience is strongly attracted to meaningful challenges. The participants in these types of events can be either existing start-ups or teams of programmers / technologists that have not officially formed a company and are interested in testing and exploring the SemaGrow use cases. Organizers may also bring additional individual participants into the teams. Other types of audience are also possible, such as data scientists and domain experts with a special interest in analysing data from agricultural and related research fields.

One or more meet-ups before or during the first day of the hackathon event will contribute to the success of the Hackathon. Usually, meet-ups are a series of contextualised events, each hosted at a specific agricultural company, that visit typical examples from the agricultural industry related to the selected use case that will be tested during each pilot event.
The key objective of the meet-ups is to expose start-ups to real instances of each presented use case and to real-world stakeholders. The key success factors are the careful selection of the hosting companies and/or the invitation of appropriate keynote speakers.

5. Demonstrators development and evaluation plan

5.1 Evaluation timeframes and deliverables

The evaluation plan includes the schedule of all the corresponding SemaGrow activities for the proper evaluation of the SemaGrow integrated components. Each evaluation aspect should be considered independently, taking into consideration: a) the expected deployment time of the SemaGrow integrated components, and b) the nature of each evaluation step, including details on the methods that will be used. The evaluation plan is divided into five different phases that are presented in the table below (months refer to project month). Each phase is linked to a number of tasks that should be undertaken with certain methods and tools. Based on the recommendations of the reviewers from the 1st review meeting, the second cycle of pilot trials will follow an iterative approach with smaller cycles, depending on the demonstrator.

The results of testing and controlled pilots will be combined in the integrated evaluation report, which aims to support the further development, improvement or refinement of the integrated services and tools. Table 5-2 presents the evaluation-related products that need to be delivered during the project and their respective deadlines (months refer to project month).

Table 5-1: Summary of planning, implementation and evaluation of SemaGrow demonstrators
Phases | Tasks | Tools | Start Month | End Month
Piloting Plan | First version of the evaluation plan, documenting the plan for the development of the three (3) demonstrators and the methodology and materials for the pilot trials. | - | M7 | M12
Piloting Plan | Second version of the evaluation plan, documenting the plan for the development of the three (3) demonstrators and the methodology and materials for the pilot trials. | - | M14 | M19
Pilot Deployment | First functional version of the integrated SemaGrow components | Test cases | M19 | M21
Controlled Pilots - Cycle I | First controlled pilot will provide input for the second integrated SemaGrow platform | Interviews, Questionnaires, Input logging | M22 | M24
Pilot Deployment II | Second functional version of the integrated SemaGrow components | Test cases | M22 | M27
Controlled Pilots - Cycle II | Second cycle of pilot trials. Iterative approach | Interviews, Questionnaires, Input logging | M28 | M30

Table 5-2: List of SemaGrow deliverables related to demonstrators evaluation

5.2 Trees4Future-AgMIP demonstrator

Development plan

The following stepwise approach will be followed for the implementation of the Trees4Future-AgMIP demonstrator.

Phase 1: Transform the Trees4Future interface into a Trees4Future-AgMIP demonstrator working on top of the SemaGrow infrastructure, serving Trees4Future data and metadata:
a. Adaptation of the Trees4Future server component to query the SPARQL endpoint of the SemaGrow infrastructure;
b. Provision of access to Trees4Future metadata through a SPARQL endpoint on the Trees4Future triple store;
c. Transfer of the parts of the Trees4Future ontology required by the SemaGrow infrastructure to effectively query the Trees4Future SPARQL endpoint.

Phase 2: Extend the SemaGrow Trees4Future-AgMIP demonstrator with query functions on Trees4Future data content (using the same end-user interface):
a. Design and implementation of additional / extended SPARQL queries that support (1) the querying of datasets based on criteria related to the data and (2) the selection of a subset of data based on criteria related to the data;
b. Extension of the Trees4Future ontology in the SemaGrow infrastructure with the semantics required to effectively query the Trees4Future data;
c. Development of a data conversion component translating the result set (RDF data) into a usable (spatial) format, e.g. WCS, WFS (or NetCDF, ESRI shape if required).

Phase 3: Add AgMIP data and search and discovery support for AgMIP:
a. Connection to the triple store for AgMIP (meta)data as implemented in WP2;
b. Set-up of an ontology to support effective data selection from AgMIP data sources, including parts of AGROVOC and the standardized variable list offered through the ICASA version 2.0 data standards;
c. Extension of the SemaGrow ontologies with the semantics required to effectively query the AgMIP (meta)data.
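As a rough illustration of step 1a (a server component querying a SPARQL endpoint), the sketch below builds a query request following the SPARQL 1.1 Protocol "query via GET" convention. The endpoint URL, the Dublin Core `dct:title` pattern and the function name `build_sparql_request` are assumptions made for illustration only; they are not taken from the SemaGrow or Trees4Future implementations, and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL, used only for illustration.
ENDPOINT = "http://example.org/semagrow/sparql"

# Illustrative metadata query: dataset titles expressed with Dublin Core terms.
QUERY = """
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?title
WHERE { ?dataset dct:title ?title }
LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query request via HTTP GET."""
    # The protocol passes the query text in the URL-encoded "query" parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url[:60] + "...")
```

Passing the resulting object to `urllib.request.urlopen` would execute the query against a live endpoint; that step is left out here because no real endpoint address is given in this plan.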

The deployment of the first version and the execution of the associated pilot trial for the demonstrator will include the full implementation work of phase 1 as well as some of the work performed for phase 2. The phase 2 functions included in the first pilot trial will be limited to those required to show basic (big) data querying. The second deployment and pilot trial will be performed on the full demonstrator, including the implementation work from phases 2 and 3. The time planning of the implementation of the Trees4Future demonstrator and its pilot trials is given in Table 5-3.

Evaluation plan

The evaluation plan for the Trees4Future-AgMIP demonstrator is fully aligned with the implementation plan described in the previous paragraph. Thus, the first controlled pilot trial will focus on the Trees4Future community and end users. It will evaluate similar functionalities in both the Trees4Future application and the SemaGrow demonstrator, as well as some new, data-oriented queries demonstrating SemaGrow big data query capabilities. The second controlled pilot trial will also cover the AgMIP user community and will evaluate the full set of offered functionalities from their perspective.
Table 5-3: Implementation planning for the Trees4Future-AgMIP demonstrator
Phase / Task | Delivery Month | Deployment | Partners
1a - Adaptation of the Trees4Future server component | M22 | 1st | DLO
1b - Provide access to Trees4Future metadata through SPARQL endpoint | M22 | 1st | DLO, NCSR-D
1c - Transfer parts of the Trees4Future ontology required by the SemaGrow infrastructure | M22 | 1st | NCSR-D, DLO
2a - Design of additional / extended SPARQL queries on datasets | M24 | 1st, 2nd | DLO, UAH
2b - Extending current Trees4Future ontologies | M27 | 1st, 2nd | DLO
2c - Development of a data conversion component | M27 | 2nd | UAH, DLO
3a - Connection to triple store for AgMIP (meta)data | M27 | 2nd | NCSR-D, DLO
3b - Set up of an ontology to support effective data selection from AgMIP data sources | M27 | 2nd | DLO, UNITOV
3c - Extension of current ontologies with semantics for AgMIP | M27 | 2nd | DLO, UNITOV

1st Controlled Trial

The first controlled pilot trial focusses on the Trees4Future user community and the evaluation criteria that are most relevant for that community. Specification of these criteria concentrates on (1) evaluating specific Trees4Future queries against metadata versus their SemaGrow demonstrator analogues and (2) evaluating some basic data-oriented queries against current (manual or semi-automated) procedures. The 1st controlled pilot trial will be performed as an off-line experiment with a limited group of (3-5) Trees4Future users. It will consist of:

- Evaluating a set of pre-defined queries on the Trees4Future metadata against both the Trees4Future application and the SemaGrow demonstrator. This will be a list of 5-10 predefined queries covering thematic, spatial, temporal and combined queries.
- Evaluating a limited and pre-defined set of queries on the data content of Trees4Future. Since the query results will not yet be converted to a user format (this functionality is planned for the 2nd pilot deployment), only a limited evaluation will be performed, focussing on performance and correctness of the results.

Table 5-4: SemaGrow Trees4Future-AgMIP 1st Pilot Trial
Evaluation Component | Objects of evaluation | Evaluation methods
Metadata queries | 5-10 predefined queries on Trees4Future metadata | Correctness / Completeness: Objective / quantitative assessment of returned datasets against the pre-assessed expected output datasets
Metadata queries | 5-10 predefined queries on Trees4Future metadata | Performance: Objective / quantitative comparison of SemaGrow end user query performance compared with pre-assessed performance of Trees4Future application
Metadata queries | 5-10 predefined queries on Trees4Future metadata | User experience: User questionnaire, assessing general and query specific opinions regarding demonstrator functionality and behaviour
Data queries | 3-5 predefined queries on Trees4Future data | Correctness / Completeness: Objective / quantitative assessment of returned data against the pre-calculated expected output data
Data queries | 3-5 predefined queries on Trees4Future data | User experience: User questionnaire, assessing general and query specific opinions regarding demonstrator functionality and behaviour
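The correctness / completeness assessment listed above could be scripted along the following lines. This is a minimal sketch under stated assumptions: the plan does not define the two metrics formally, so here correctness is taken as the share of returned items that were expected (precision) and completeness as the share of expected items that were returned (recall). The function name and the dataset identifiers are hypothetical.

```python
def correctness_and_completeness(returned, expected):
    """Compare the returned result set with the pre-assessed expected set.

    Assumed definitions (not fixed by the piloting plan):
      correctness  = |returned AND expected| / |returned|   (precision)
      completeness = |returned AND expected| / |expected|   (recall)
    An empty denominator is scored as 1.0, i.e. nothing wrong or nothing missed.
    """
    returned, expected = set(returned), set(expected)
    hits = returned & expected
    correctness = len(hits) / len(returned) if returned else 1.0
    completeness = len(hits) / len(expected) if expected else 1.0
    return correctness, completeness


# Hypothetical dataset identifiers for one predefined metadata query.
expected_datasets = ["ds1", "ds2", "ds3", "ds4", "ds5"]
returned_datasets = ["ds1", "ds2", "ds3", "ds9"]

print(correctness_and_completeness(returned_datasets, expected_datasets))
```

Scoring each predefined query separately and reporting the two values side by side keeps a wrong result (low correctness) distinguishable from a missing one (low completeness).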

2nd Controlled Trial

The second controlled pilot trial will include the AgMIP user community and its specific evaluation criteria, and will focus on the functions and additional (AgMIP) data added to the Trees4Future-AgMIP demonstrator for the 2nd deployment. Specification of the criteria concentrates on evaluating specific AgMIP queries on metadata and data-oriented queries against current (manual or semi-automated) procedures. The 2nd controlled pilot trial will be performed as an off-line experiment with a selected group of (5-10) users from the AgMIP and the Trees4Future communities.

Table 5-5: SemaGrow Trees4Future-AgMIP 2nd Pilot Trial
Evaluation Component | Objects of evaluation | Evaluation methods
Metadata queries | predefined queries on both AgMIP and Trees4Future metadata | Correctness / Completeness: Objective / quantitative assessment of returned datasets against the pre-assessed expected output datasets
Metadata queries | predefined queries on both AgMIP and Trees4Future metadata | Performance: Quantitative and qualitative assessment of SemaGrow end user query performance. Quantitative assessment by comparison of required time against time required for (semi)manual data processing. User questionnaire, assessing performance experience by AgMIP users
Metadata queries | predefined queries on both AgMIP and Trees4Future metadata | User experience: User questionnaire, assessing general and query specific opinions regarding demonstrator functionality and behaviour
Data queries | 5-10 predefined queries on AgMIP and Trees4Future data | Correctness / Completeness: Objective / quantitative assessment of returned data against the pre-calculated expected output data. Evaluation of correctness of format(s) of returned datasets by opening / processing with one or more selected tools.
Data queries | 5-10 predefined queries on AgMIP and Trees4Future data | User experience: User questionnaire, assessing general and query specific opinions regarding demonstrator functionality and behaviour
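The time comparison listed under Performance in the table above could be sketched as follows. The helper names, the use of the median over several repetitions, the stand-in query and the baseline figure are all illustrative assumptions rather than part of the plan; in a real trial `run_query` would submit one of the predefined queries to the demonstrator.

```python
import statistics
import time


def time_query(run_query, repetitions=5):
    """Return the median wall-clock time in seconds of calling run_query()."""
    durations = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to a single slow run.
    return statistics.median(durations)


def speedup(baseline_seconds, measured_seconds):
    """Ratio of the (semi)manual baseline time to the measured query time."""
    if measured_seconds <= 0:
        raise ValueError("measured time must be positive")
    return baseline_seconds / measured_seconds


# Stand-in for a real demonstrator query; it only sleeps for 10 ms.
measured = time_query(lambda: time.sleep(0.01), repetitions=3)

# Hypothetical baseline: ten minutes of (semi)manual data processing.
print(f"median query time: {measured:.3f} s, speedup: {speedup(600.0, measured):.0f}x")
```

The baseline would come from the pre-assessed time of the current procedure, so the ratio is only as reliable as that estimate.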

The offline experiment will consist of:

- Evaluating a set of pre-defined queries on AgMIP and Trees4Future metadata. A list of predefined queries covering thematic, spatial, temporal and combined queries will be evaluated. The Trees4Future-related queries will be evaluated against both the Trees4Future application and the SemaGrow demonstrator. AgMIP queries will be qualitatively evaluated through a user questionnaire.
- Evaluating a limited pre-defined set of queries on the data content of both AgMIP and Trees4Future. The evaluation will again focus on performance and correctness of the results, but will also specifically evaluate the usability of the returned dataset by importing and testing the delivered format(s) against selected tools.

5.3 AGRIS

Development plan

The implementation of the AGRIS demonstrator will require the following phases:

Phase 1: Development of the Web crawler environment, which includes the automatic tagging tool (AgroTagger):
1.1. Customization and deployment of a Web Crawler (e.g. Apache Nutch);
1.2. Adaptation of the AgroTagger to work with the Web Crawler output;

Phase 2: Development of a LOD environment, which includes the AGRIS core database and the output of the process described in Phase 1:
2.1. Execution of the Web Crawler + AgroTagger to generate a big set of triples (the crawler database);
2.2. Storage of the AGRIS core database and the crawler database in a triplestore provided by SemaGrow (the triplestore and its physical location have still to be defined);

Phase 3: Discovery of meaningful combinations between the AGRIS core database and the crawler database (and other SemaGrow databases, like DLO):
3.1. A domain expert will define possible combinations of databases in natural language. This domain expert has not yet been identified;
3.2. SemaGrow partners, with the coordination and direct contribution of UAH, will provide processing tools, which include the translation of the queries identified in point 3.1 to SPARQL or other machine languages and the generation of results;
3.3. Resulting triples will be stored in a triplestore;

Phase 4: Extend the AGRIS Web portal with the output of Phase 3, in order to allow users to find relevant data and take better decisions related to the agricultural domain and food security:

4.1. A new widget, based on the output of Phase 3, will be added to the AGRIS Web portal, so that users will find meaningful resources related to the information provided by the AGRIS Web portal.

The deployment of the first version of the demonstrator and the execution of the associated pilot trial will include the full implementation work of Phase 1 and Phase 2. Phase 2 requires additional work from SemaGrow partners to share a physical or virtual server on which to set up a triplestore and store the output of the crawler, as well as a copy of the AGRIS core database. The second deployment and pilot trial will be performed on the full demonstrator, including the implementation work from Phases 3 and 4. To complete Phase 3, it is still necessary to identify a domain expert who can define useful combinations between databases; moreover, technical work to compute the combinations is needed. The time planning of the implementation of the AGRIS demonstrator and its pilot trials is given in Table 5-6.

Table 5-6: Implementation planning for the AGRIS demonstrator
Phase / Task | Delivery Month | Deployment | Partners
1.1 Customization and deployment of a Web Crawler | M21 | 1st | FAO
1.2 Adaptation of the AgroTagger to work with the Web Crawler output | M21 | 1st | FAO
2.1 Execution of the Web Crawler + AgroTagger to generate a big set of triples (the crawler database) | M22 | 1st | FAO
2.2 Storage of the AGRIS core database and the crawler database in a triplestore provided by SemaGrow (the triplestore and its physical location have still to be defined) | M24 | 1st, 2nd | UAH, NCSR-D, IPB (not yet completely defined)
3.1 Domain expert to define combinations | M25 | 1st, 2nd | UAH, NCSR-D, AK (not yet completely defined)
3.2 Technical processing to discover combinations | M27 | 2nd | UAH, NCSR-D, FAO
3.3 Generate triples | M27 | 2nd | UAH, NCSR-D, IPB (not yet completely defined)
4.1 Widget in the AGRIS portal | M30 | 2nd | FAO

52 5.3.2 Evaluation plan Two aspects of the demonstrator will have to be evaluated. On the one hand, the efficiency of the data enhancement phase. On the other hand, a more user-oriented evaluation will also have to be performed, in order to assess the improved experience of the end user st Controlled Trial The first controlled pilot trial focusses on the AGRIS data technology and, in particular, on performances and usability of the Sparql endpoint that results as output of the crawler. Specification of these criteria concentrates on evaluating the speed and performances of the triplestore instance and the usability of the data contained in such a triplestore. Evaluation methods for this trial are: usability testing, observation, and experiments. This means: - Run Sparql queries against the Sparql endpoint to evaluate performances; - Run combinations of Sparql queries to combine this triplestore with the AGRIS core database and comment on performances; - Observation of the behaviour of the infrastructure in the short term period (for instance, logging eventual downtime of the system) nd Controlled Trial The front-end AGRIS demonstrator will be evaluated by end users, such as in-house agricultural information officers and external users of AGRIS. This will affect the following evaluation stakeholders: researchers, domain specialists, librarians. What needs to be evaluated in this phase is the final widget that will be available in the AGRIS portal: for each AGRIS record, a widget will show related information discovered by meaningful combinations between the AGRIS core database, the crawler database, and other SemaGrow databases like DLO. We plan on using both interviews and surveys administered on a face-to-face base, and from distance (surveys reachable from AGRIS website). 
Surveys and interviews will focus on user satisfaction with respect to the amount of data accessed by the demonstrators, their relevance to the user's information needs, and their relevance to each AGRIS record to which they are interlinked.

5.4 Agricultural Discovery Space

The main goal of the evaluation approach for the Agricultural Data Discovery use case will be to evaluate both the data platform layer and the front-end discovery applications. We will follow the lean methodology principles to validate the new version of the data platform and the discovery applications. The validation focuses on both the problem and the solution. The main steps of this methodology are depicted in the following figure.

Figure 5-1: Steps of the overall validation approach

More specifically, an iterative process will be followed that includes the following steps:
- problem understanding, which can be conducted through interviews with real users;
- solution definition, to be validated with real users, e.g. a new SemaGrow-powered component in a discovery app;
- qualitative validation of the new solution through interviews;
- quantitative validation using metrics based on logs and analytics.

For the quantitative evaluation, the following tools will be used:
- Actionable metrics rather than simple metrics. These are metrics that tie specific and repeatable actions to observed results, for instance the number of queries in a discovery application that led to views of specific resources.
- Funnel reports, e.g. to check how many visits to the APIs or the discovery app are converted into usage of the Search API.
- Cohort analysis, to study the long-term effects of the improvements. In the case of API usage, the evolution of the metrics can be studied across the different hackathon events that will be held in the context of SemaGrow. An example of a cohort analysis diagram is presented in Figure 5-2.

Development plan

In the case of the ADS demonstrator, there will be two separate development phases.

Phase 1: Develop the enhanced data platform, which will include the following components:
- an analytics tool in the API page, to see how interested developers are in the API and which of them are using it. It is important to understand at which point they drop off and to ask them why this happens, e.g. incomplete documentation, a low number of parameters in the API, or the architecture of the implemented API;
- a component that wraps the existing data platform Search API so that it can be included in the SemaGrow end point. This component will be developed by the NCSR-D team;
- a component in the Search API that identifies developers, e.g. by using an API key, so that developers can be tracked;
- a component in the Search API that tracks queries and combines them with timestamps and users.
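The funnel report and cohort analysis described above can be computed directly from the tracked events. The following is a minimal sketch under stated assumptions: the event fields (`user`, `step`, `timestamp`), the three funnel step names, and the cohort labels are all hypothetical and are not taken from the ADS platform. The funnel here only checks that a user reached all earlier steps, not the order in time.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical funnel: API page visit -> API key request -> Search API call.
FUNNEL_STEPS = ["visit_api_page", "request_api_key", "call_search_api"]


def funnel_report(events, steps=FUNNEL_STEPS):
    """Count distinct users who reached each step and every step before it."""
    reached = defaultdict(set)
    for event in events:
        reached[event["user"]].add(event["step"])
    report = []
    for i, step in enumerate(steps):
        required = set(steps[: i + 1])
        users = sum(1 for seen in reached.values() if required <= seen)
        report.append((step, users))
    return report


def cohort_activity(events, cohort_of, step="call_search_api"):
    """For each cohort (e.g. a hackathon), count active users per calendar month."""
    table = defaultdict(lambda: defaultdict(set))
    for event in events:
        user = event["user"]
        if event["step"] != step or user not in cohort_of:
            continue
        month = event["timestamp"].strftime("%Y-%m")
        table[cohort_of[user]][month].add(user)
    return {
        cohort: {month: len(users) for month, users in sorted(months.items())}
        for cohort, months in table.items()
    }
```

In practice the `user` field would be the API key issued by the identification component, and `cohort_of` would map each key to the hackathon at which it was issued, so that retention can be compared between events.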

Figure 5-2: Example of cohort analysis for a specific period

Phase 2: Develop the front-end discovery applications, which will include the following components:
- A component in the front-end discovery application that consumes the new SemaGrow-powered Search end point.
- Components that implement the new functionalities powered by the SemaGrow end point. Such functionalities will be the display of related resources on the view-item page and the federated search over [...]
- A component that identifies users in the discovery application, so that user activity can be tracked in the analysis of metrics, e.g. whether users who performed a more complex query stayed longer in the discovery application.
- A component that implements A/B testing for the discovery interface. This script will randomly select either the SemaGrow-powered Search API or the GLN API. The search terms, API used, time, and user session should be stored in a NoSQL database, e.g. MongoDB. These logs will be used to estimate the metrics, and will also be combined with simple metrics from Google Analytics, e.g. search depth.

It should be pointed out that the new components of the discovery applications will be implemented after the problems have been validated with real users. The problem validation is planned to take place during June and July. The new version of the discovery applications will be iteratively developed and tested until the end of [...]

Evaluation plan

Approach to evaluate the data platform's APIs

As regards the data platform, the Search end point (SPARQL) powered by SemaGrow will be evaluated by developers and data scientists during events such as hackathons. Such evaluation events will be used to:
- Identify the main problems that developers face when they use a Search API to build a discovery application.
- Evaluate the different versions of the SemaGrow-powered Search end point that will be integrated in AK's Data Platform.
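The A/B testing component listed under Phase 2 could be sketched roughly as follows. This is an illustrative assumption, not the project's implementation: the function names, the record field names, and the back-end labels are hypothetical, and an in-memory list stands in for the NoSQL store so that the example stays self-contained.

```python
import random
from datetime import datetime, timezone

# Labels for the two search APIs being compared (hypothetical identifiers).
BACKENDS = ("semagrow", "gln")


def choose_backend(rng=random):
    """Pick one of the two search back ends uniformly at random."""
    return rng.choice(BACKENDS)


def log_search(sink, session_id, search_terms, backend, now=None):
    """Append one record holding the fields the plan lists.

    `sink` is anything with an append() method; in a deployment it would
    be replaced by a write to the chosen NoSQL database.
    """
    record = {
        "session": session_id,
        "terms": search_terms,
        "api": backend,
        "time": (now or datetime.now(timezone.utc)).isoformat(),
    }
    sink.append(record)
    return record


def searches_per_backend(records):
    """Count logged searches per back end, as a basis for later metrics."""
    counts = {backend: 0 for backend in BACKENDS}
    for record in records:
        counts[record["api"]] += 1
    return counts
```

Selecting the back end per request matches the random selection described in the plan; an alternative design would fix the choice per user session so that a user sees consistent results, at the cost of needing more sessions for a balanced comparison.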
The main tool for identifying the problems and for the evaluation will be interviews with real users. For the quantitative validation, the log files and analytics will be collected and analysed in [...]


More information

EUDAT-B2FIND A FAIR and Interdisciplinary Discovery Portal for Research Data

EUDAT-B2FIND A FAIR and Interdisciplinary Discovery Portal for Research Data EUDAT-B2FIND A FAIR and Interdisciplinary Discovery Portal for Research Data Heinrich Widmann, DKRZ Claudia Martens, DKRZ Open Science Days, Berlin, 17 October 2017 www.eudat.eu EUDAT receives funding

More information

Exploitation towards Thematic Communities, Training Framework and stakeholders involvement

Exploitation towards Thematic Communities, Training Framework and stakeholders involvement Exploitation towards Thematic Communities, Training Framework and stakeholders involvement Giorgio Saio GISIG eenvplus Workshop INSPIRE Conference, Florence (IT), 24 June 2013 Exploitation prospects eenvplus

More information

HUMBOLDT Application Scenario: Protected Areas

HUMBOLDT Application Scenario: Protected Areas CC by Erlend Schei Copyright by Kecko Copyright by Michael Bezzina CC by Gunnar Ries Copyright by Michael Bezzina Copyright by Michael Bezzina Copyright by Michael Bezzina CC by fs999 CC by Jordan Nielsen

More information

Public Project Website

Public Project Website Public Project Website Co-funded by the Horizon 2020 Framework Programme of the European Union DELIVERABLE NUMBER D4.3 DELIVERABLE TITLE RESPONSIBLE AUTHOR Public Project Website Panagiotis Zervas, Agroknow

More information

The RMap Project: Linking the Products of Research and Scholarly Communication Tim DiLauro

The RMap Project: Linking the Products of Research and Scholarly Communication Tim DiLauro The RMap Project: Linking the Products of Research and Scholarly Communication 2015 04 22 Tim DiLauro Motivation Compound objects fast becoming the norm for outputs of scholarly communication.

More information

B2FIND and Metadata Quality

B2FIND and Metadata Quality B2FIND and Metadata Quality 3 rd EUDAT Conference 25 September 2014 Heinrich Widmann and B2FIND team 1 Outline B2FIND the EUDAT Metadata Service Semantic Mapping of Metadata Quality of Metadata Summary

More information

Elevating Natural History Museums Cultural Collections to the Linked Data Cloud

Elevating Natural History Museums Cultural Collections to the Linked Data Cloud Elevating Natural History Museums Cultural Collections to the Linked Data Cloud Giannis Skevakis, Konstantinos Makris, Polyxeni Arapi, and Stavros Christodoulakis Laboratory of Distributed Multimedia Information

More information

Semantic Web Fundamentals

Semantic Web Fundamentals Semantic Web Fundamentals Web Technologies (706.704) 3SSt VU WS 2018/19 with acknowledgements to P. Höfler, V. Pammer, W. Kienreich ISDS, TU Graz January 7 th 2019 Overview What is Semantic Web? Technology

More information

Theme Identification in RDF Graphs

Theme Identification in RDF Graphs Theme Identification in RDF Graphs Hanane Ouksili PRiSM, Univ. Versailles St Quentin, UMR CNRS 8144, Versailles France hanane.ouksili@prism.uvsq.fr Abstract. An increasing number of RDF datasets is published

More information

Towards Open Innovation with Open Data Service Platform

Towards Open Innovation with Open Data Service Platform Towards Open Innovation with Open Data Service Platform Marut Buranarach Data Science and Analytics Research Group National Electronics and Computer Technology Center (NECTEC), Thailand The 44 th Congress

More information

Migrating Bibliographic Datasets to the Semantic Web: the AGRIS case*

Migrating Bibliographic Datasets to the Semantic Web: the AGRIS case* Migrating Bibliographic Datasets to the Semantic Web: the AGRIS case* Editor(s): Jens Lehmann, University of Leipzig, Germany; Oscar Corcho, Universidad Politécnica de Madrid, Spain Solicited review(s):

More information

A5.2-D3 [3.5] Workflow Design and Construction Service Component Specification. Eva Klien (FHG), Christine Giger (ETHZ), Dániel Kristóf (FOMI)

A5.2-D3 [3.5] Workflow Design and Construction Service Component Specification. Eva Klien (FHG), Christine Giger (ETHZ), Dániel Kristóf (FOMI) Title: A5.2-D3 [3.0] A Lightweight Introduction to the HUMBOLDT Framework V3.0 Author(s)/Organisation(s): Daniel Fitzner (FhG), Thorsten Reitz (FhG) Working Group: Architecture Team / WP5 References: A5.2-D3

More information

Developing data catalogue extensions for metadata harvesting in GIS

Developing data catalogue extensions for metadata harvesting in GIS University of Bergen Department of Informatics Developing data catalogue extensions for metadata harvesting in GIS Author: André Mossige Long master thesis June 2018 Acknowledgements I would like to thank

More information

PRELIDA. D2.3 Deployment of the online infrastructure

PRELIDA. D2.3 Deployment of the online infrastructure Project no. 600663 PRELIDA Preserving Linked Data ICT-2011.4.3: Digital Preservation D2.3 Deployment of the online infrastructure Start Date of Project: 01 January 2013 Duration: 24 Months UNIVERSITAET

More information

Executive Summary for deliverable D6.1: Definition of the PFS services (requirements, initial design)

Executive Summary for deliverable D6.1: Definition of the PFS services (requirements, initial design) Electronic Health Records for Clinical Research Executive Summary for deliverable D6.1: Definition of the PFS services (requirements, initial design) Project acronym: EHR4CR Project full title: Electronic

More information

SLIPO. Scalable Linking and Integration of Big POI data. Giorgos Giannopoulos IMIS/Athena RC

SLIPO. Scalable Linking and Integration of Big POI data. Giorgos Giannopoulos IMIS/Athena RC SLIPO Scalable Linking and Integration of Big POI data I n f o r m a ti o n a n d N e t w o r ki n g D a y s o n H o ri z o n 2 0 2 0 B i g Da ta Public-Priva te Partnership To p i c : I C T 14 B i g D

More information

3) CHARLIE HULL. Implementing open source search for a major specialist recruiting firm

3) CHARLIE HULL. Implementing open source search for a major specialist recruiting firm Advice: The time spent on pre-launch analysis is worth the effort to avoid starting from scratch and further alienating already frustrated users by implementing a search which appears to have no connection

More information