Multilingual Ontologies for Networked Knowledge

D5.1.4 Use Case demonstrator platform V4 (Localised translation-candidate ranking, crosslingual knowledge presentation)

Project ref. no: FP7-ICT-4-248458
Project acronym: Monnet
Start date of project (dur.): 01 March 2010 (36 Months)
Document due Date: 30 June 2012 (M27)
Responsible for deliverable: Sean O Riain, Thomas, Susan Marie, Xander Uiterlinden
Reply to: sean.oriain@deri.org, susan.marie.thomas@sap.com, x.uiterlinden@beinformed.com
Document status: Final

D5.1.1b Use Case demonstrator platform V2 Page 1 of 12
Project reference no.: FP7 ICT 4 248458
Project working name: Monnet
Project full name: Multilingual Ontologies for Networked Knowledge
Document name: Monnet D5.1.4 final
Security (distribution level): PU
Contractual delivery date: 30 June 2012 (M27)
Access: Restricted
Deliverable number: Not a required deliverable
Deliverable name: Use Case demonstrator platform V2 (Localised translation-candidate ranking, cross lingual knowledge presentation)
Type: Demo
Version: Final
WP / Task responsible: WP5 (NUIG) Sean O Riain
Contributors: SAP, Thomas, Susan Marie; Be Informed, Xander Uiterlinden
EC Project Officer: Antonio Puente Rodero

Distribution List: Consortium Partners
Review list: Consortium Partners
Approved by: Project Administration, Consortium Supervisory Board
Document Location: https://dev.deri.ie/confluence/display/monnet/deliverables
Table of Contents
1 INTRODUCTION
2 FINANCIAL, CROSS LINGUAL KNOWLEDGE BASE ACCESS DEMONSTRATOR
2.1 ACCESS
2.2 WALK THROUGH
2.3 COMPONENT ARCHITECTURE
3 PUBLIC SERVICES TRANSLATION DEMONSTRATOR
3.1 ACCESS
3.2 WALK THROUGH
3.3 COMPONENT ARCHITECTURE
4 ARCHITECTURE
4.1 MONNET SOURCES
1 INTRODUCTION

This document is not a required deliverable but is intended to provide insight into the Monnet financial and public sector use case demonstrators.

The financial use case demonstrates cross lingual, user based querying and comparison of financial data taken from financial reports in multiple European jurisdictions, for example the comparison of financial instruments in Spanish and Belgian annual business filings. The cross lingual querying capability is enabled by work performed by a working group of XBRL Europe, the XBRL Europe Business Registers Working Group (xebr).

The public sector use case demonstrates the Monnet translation services through their integration into the Be Informed Studio. The integration allows data modellers and translators who work with multilingual public sector ontologies to produce high quality translations efficiently.

2 FINANCIAL, CROSS LINGUAL KNOWLEDGE BASE ACCESS DEMONSTRATOR

2.1 Access

The financial domain use case demonstrator from SAP will be made available at the Monnet demo web site for viewing and interaction.

2.2 Walk Through

The xebr group has created an XBRL taxonomy that includes concepts commonly found in financial statements in Europe. This is called the Core Reference Taxonomy, or the Core Taxonomy for short. The group has also created mappings from existing XBRL taxonomies to the Core Taxonomy. As of April 2012, mappings have been created for the following countries: Belgium, Italy, France, Spain, Germany, and Denmark. There is also a mapping from the IFRS taxonomy. Each mapped concept is either the same as (exact match), broader than, or narrower than the core concept. The taxonomy and the mappings are represented in an Excel file. Figure 1 shows extracts from the Excel sheet representing the xebr taxonomy hierarchy, plus an extract from the sheet of mappings from Belgian taxonomy concepts to xebr concepts.
[Figure 1: Extract of xebr taxonomy and mappings to it. Panels: xebr Taxonomy; Map from Belgian to xebr. Annotation: uses xebr taxonomy and mappings to xebr taxonomy from country taxonomies.]

In the demonstrator a query builder asks the user to choose the reports, the dates, and the core concepts of interest. The choices are made in that order, with each choice being used to fill the selection options for the next one. Each selection is made using a twin column selector, as shown in Figure 2. After making the choices, the user clicks submit, and a query is issued to retrieve the data and display it in a table. A sample query and result are shown in Figure 2.

The mapping information created by the xebr group is used to retrieve values for those concepts in the reports that have been mapped to the xebr concepts the user selected. Thus, the xebr concepts become a means to compare data from all the selected reports, even though the reports come from different countries or jurisdictions and were created in the language of each country.

[Figure 2: Screenshot of initial demonstrator using xebr taxonomy and mappings]
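The mapping-based comparison described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the demonstrator's code: the concept identifiers, report names, and values are invented placeholders, and plain in-memory dictionaries stand in for the knowledge base.

```python
from dataclasses import dataclass

# Match types named in the walk-through: exact, broader, narrower.
MATCH_TYPES = ("exact", "broader", "narrower")


@dataclass(frozen=True)
class Mapping:
    local_concept: str  # concept in a national taxonomy (hypothetical id)
    core_concept: str   # concept in the xebr Core Taxonomy (hypothetical id)
    match_type: str     # one of MATCH_TYPES

    def __post_init__(self):
        if self.match_type not in MATCH_TYPES:
            raise ValueError(f"unknown match type: {self.match_type}")


def compare_reports(core_concepts, mappings, reports):
    """Return {core_concept: {report_name: (value, match_type)}}.

    `reports` maps a report name to a dict of local concept -> reported value.
    If several local concepts of one report map to the same core concept,
    the last mapping wins in this simplified sketch.
    """
    selected = set(core_concepts)
    table = {core: {} for core in core_concepts}
    for m in mappings:
        if m.core_concept not in selected:
            continue
        for report_name, facts in reports.items():
            if m.local_concept in facts:
                table[m.core_concept][report_name] = (
                    facts[m.local_concept],
                    m.match_type,
                )
    return table


# Invented sample data, for illustration only.
mappings = [
    Mapping("be:Kapitaal", "xebr:Capital", "exact"),
    Mapping("es:CapitalSocial", "xebr:Capital", "narrower"),
]
reports = {
    "BE-2010": {"be:Kapitaal": 1000},
    "ES-2010": {"es:CapitalSocial": 750},
}
result = compare_reports(["xebr:Capital"], mappings, reports)
```

Keeping the match type next to each value mirrors the result table of the demonstrator, where the match type is shown together with the original language labels.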
To view the original language labels for the concepts, the user can click on a line in the result table. This displays a list of labels for the selected concept, along with the match type and the language. This is also shown in Figure 2, at the bottom right.

Future demonstrators will be developed to incorporate: integration with an enhanced knowledge base populated through automated information extraction; and the translation and ontology matching components developed within the Monnet project. The goal is to build an integrated demonstrator that addresses the common requirements identified in WP1 for the xebr, CoREP and FINREP use cases.

2.3 Component Architecture

The demonstrator takes advantage of the XBRL2RDF converter developed by Monnet. This is used offline to convert taxonomies and instance documents from XBRL to RDF. These are then loaded into an RDF store, in our case Virtuoso. The mappings between local taxonomies and xebr were also converted to RDF and uploaded to the RDF store. Each taxonomy and each XBRL instance document is uploaded as a separate RDF graph, so that it is easy to add them to and remove them from the RDF store. Having separate graphs also makes it possible to limit a data query to the graphs of interest, as opposed to querying the entire store. The mappings are also in a separate graph.

The demonstrator is built using the Vaadin Framework, which enables development of web applications in pure Java. The framework is based on data containers, which are bound to UI elements. In the demonstrator the data containers are filled via SPARQL queries issued to a Virtuoso triple store, which contains the xebr taxonomy (version 7), other taxonomies, instances of the other taxonomies, and the mappings from those taxonomies to the xebr taxonomy. Figure 3 illustrates the web application architecture.
[Figure 3: Architecture of demonstrator. Layers: Web application (UI & Logic, Data Containers); SPARQL queries; RDF Store.]

As outlined, the mapping information, uploaded to the knowledge base (RDF store) as a separate graph, is used to retrieve values for those concepts in the reports that have been mapped to the xebr concepts selected by the user. The xebr concepts then become a means to compare data from all the selected reports, even though the reports come from different jurisdictions and are written in different languages.
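Restricting a query to the graphs of interest can be illustrated with a small Python sketch that assembles a SPARQL query using one FROM clause per selected graph. The graph IRIs and the `ex:` vocabulary below are hypothetical placeholders; they are not the vocabulary produced by the XBRL2RDF converter, and the demonstrator itself issues its queries from Java.

```python
def build_query(graph_iris, core_concept_iris):
    """Build a SPARQL SELECT limited to the given graphs.

    Each FROM clause adds one graph to the dataset being queried, so
    graphs of reports the user did not select are never touched.
    IRIs are inserted as given; this sketch does no escaping or validation.
    """
    if not graph_iris or not core_concept_iris:
        raise ValueError("at least one graph and one concept are required")
    from_clauses = "\n".join(f"FROM <{g}>" for g in graph_iris)
    values = " ".join(f"<{c}>" for c in core_concept_iris)
    return (
        "PREFIX ex: <http://example.org/vocab#>\n"  # hypothetical vocabulary
        "SELECT ?report ?localConcept ?matchType ?value\n"
        f"{from_clauses}\n"
        "WHERE {\n"
        f"  VALUES ?coreConcept {{ {values} }}\n"
        "  ?mapping ex:coreConcept ?coreConcept ;\n"
        "           ex:localConcept ?localConcept ;\n"
        "           ex:matchType ?matchType .\n"
        "  ?fact ex:concept ?localConcept ;\n"
        "        ex:report ?report ;\n"
        "        ex:value ?value .\n"
        "}"
    )


# Hypothetical graph names: one graph per report plus the mappings graph.
query = build_query(
    [
        "http://example.org/graph/report-be-2010",
        "http://example.org/graph/report-es-2010",
        "http://example.org/graph/mappings",
    ],
    ["http://example.org/xebr#Capital"],
)
```

Because the mappings sit in their own graph, that graph has to be listed alongside the report graphs whenever the query joins reported values to core concepts.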
3 PUBLIC SERVICES TRANSLATION DEMONSTRATOR

3.1 Access

The public service use case demonstrator from Be Informed will be made available at the Monnet demo web site for viewing and interaction.

3.2 Walk Through

Figure 4 below shows the Be Informed studio with a fragment of the housing benefit ontology loaded in the viewing area. This is the environment where analysts and modellers create and maintain ontologies. Every ontology fragment has a source language, in which it is created and in which all its labels are formulated.

[Figure 4: Be Informed studio]

To make multilingual ontologies possible, translators can create language profiles. These profiles contain translations of all labels into a given target language. The Language Profile Editor (Figure 5) has been extended to offer users Monnet translation suggestions.
[Figure 5: Language Profile Editor]

The blue circle shows the number of available translation suggestions, as specified in Section 3.2.2 of D1.2.2. The accompanying circle shows the status of the translation through its colour coding: green indicates a translation that has been accepted, in this session or an earlier one; orange indicates a translation that is available but not yet accepted; and white indicates the absence of any translation. Progress on profile completeness is indicated by the percentage completion bar. The colours in the completion bar match those of the circle indicators.

Clicking the blue suggestion button, or typing Control Space in the text field, triggers the suggestion dialog for that translation. The dialog retrieves suggestions, potentially from multiple sources. The main source of suggestions is Monnet's translation service. As Monnet leverages ontology structure in translation (see Section 3.1 of D1.2.2), the model is sent along with the translation request in the form of an OWL ontology, as specified by the Interface Specification. Since the Monnet translation service is an online service, a progress dialog is presented to the user while translation suggestions are being retrieved from the different translation services.

Translation suggestions are presented in a popup, along with the source of each suggestion and the confidence score assigned to it by the suggestion provider, as described in Section 3.2.5 of D1.2.2. See the backlog for more information on further development of provenance related functionality. See Section 3.2.4 of D1.2.2 for more requirements on provenance data in the context of providing ontology translations.
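The retrieval of suggestions from several sources, each carrying its source and confidence score, can be sketched in Python as below. The provider functions, their names, and the sample translations are invented for illustration. The sketch also simplifies the request to a single label, whereas the real request to Monnet carries the model as an OWL ontology, and skipping a failing provider is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Suggestion:
    text: str          # proposed translation
    source: str        # provider that produced it
    confidence: float  # score assigned by that provider


# A provider takes a source-language label and returns suggestions.
Provider = Callable[[str], List[Suggestion]]


def collect_suggestions(label: str, providers: Dict[str, Provider]) -> List[Suggestion]:
    """Query every provider and return all suggestions, best score first."""
    collected: List[Suggestion] = []
    for name, provider in providers.items():
        try:
            collected.extend(provider(label))
        except Exception:
            # Assumption: an unreachable online service should not
            # prevent suggestions from the other providers being shown.
            continue
    return sorted(collected, key=lambda s: s.confidence, reverse=True)


# Hypothetical providers standing in for real translation services.
def local_provider(label: str) -> List[Suggestion]:
    return [Suggestion("huurtoeslag", "local", 0.95)]


def online_provider(label: str) -> List[Suggestion]:
    return [
        Suggestion("huursubsidie", "online", 0.60),
        Suggestion("woontoeslag", "online", 0.40),
    ]


def broken_provider(label: str) -> List[Suggestion]:
    raise ConnectionError("service unavailable")


suggestions = collect_suggestions(
    "housing benefit",
    {"local": local_provider, "online": online_provider, "broken": broken_provider},
)
```

Sorting by confidence across providers is a design choice of this sketch; scores from different providers are not necessarily comparable, which is one reason the source is kept with each suggestion.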
[Figure 6: Suggestion Selection]

On loading the suggestions from Monnet for a specific ontology, the modeller can choose to accept suggestions with high confidence automatically. The suggestions that qualify for auto acceptance appear automatically in the target language column of the form, which further improves the productivity of the translator. The editor preferences form (Figure 7) allows this behaviour to be specified through two parameters: the minimum confidence score a suggestion must have to qualify for auto acceptance; and the minimum difference between the confidence scores of the first and the additional suggestions for the first suggestion to be eligible for auto acceptance.

As an optimization, the user can also choose whether translations should be retrieved from the translation service for labels that already have an accepted translation. For evaluation purposes, the user can choose a folder where statistics about the translation session should be stored.

[Figure 7: Editor Preferences Form]
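The two auto acceptance parameters can be read as the following rule, sketched in Python. The function name, the parameter names, and the choice of inclusive comparisons are assumptions made for illustration; the editor's actual implementation is not described in this document.

```python
def auto_accept(confidences, min_confidence, min_gap):
    """Return the index of the suggestion to accept automatically, or None.

    confidences    -- confidence scores of the suggestions for one label
    min_confidence -- minimum score the best suggestion must reach
    min_gap        -- minimum lead of the best suggestion over the runner-up
    """
    if not confidences:
        return None
    # Rank suggestion indices by score, highest first.
    ranked = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    best = ranked[0]
    if confidences[best] < min_confidence:
        return None
    # Assumption: with a single suggestion only the score threshold applies.
    if len(ranked) > 1 and confidences[best] - confidences[ranked[1]] < min_gap:
        return None
    return best


clear_winner = auto_accept([0.9, 0.5], min_confidence=0.8, min_gap=0.2)
close_call = auto_accept([0.9, 0.85], min_confidence=0.8, min_gap=0.2)
too_weak = auto_accept([0.7], min_confidence=0.8, min_gap=0.2)
```

In this reading the gap parameter guards against accepting a suggestion when a second candidate is almost as good, which is the case where a translator's judgement is most useful.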
3.3 Component Architecture

[Figure: component architecture. Recoverable labels: Language Profile Editor (retrieves suggestions; logs suggestion adoption); Model Translator Interface, translate(model); Usage Monitoring Framework; Usage Log; implementations: Local Translations Adapter, Naive Bing Translation Adapter, Monnet Webservice Translation Adapter; Be Informed Model to OWL Converter; online access to the Monnet Webservice.]

4 ARCHITECTURE

Monnet components are defined in the component catalogue at https://dev.deri.ie/confluence/display/monnet/component+catalogue and services, including resource services, at https://dev.deri.ie/confluence/display/monnet/service+catalogue

The translation candidate generator forms part of the Monnet Ontology Localisation Service outlined in Figure 6. Resources required for translation, such as online translators (e.g. Bing, FreeTranslation) and domain resources (term bases, corpus), are registered in the component catalogue. Resources may also be provided as services.
[Figure 6: Monnet Components Model]

4.1 Monnet Sources

Monnet subversion repositories for the various WPs are defined under https://dev.deri.ie/confluence/display/monnet/mailing+lists
For architecture, component models and OSGi activity refer to:
https://dev.deri.ie/confluence/display/monnet/wp+5
https://dev.deri.ie/confluence/display/monnet/architecture
https://dev.deri.ie/confluence/display/monnet/architecture+overview