Platform UI Specification (26)


December 20, 2017
Deliverable Code: D6.6
Version: 1.0 Final
Dissemination level: Public

This report presents the OpenMinTeD platform user interface design and implementation issues and decisions. It offers a brief overview of the current state of the art in other related systems, a brief description of the data model and the actors involved, and a detailed description of the User Interface functionalities as designed for the first release of the OpenMinTeD platform.

H2020-EINFRA / H2020-EINFRA
Topic: EINFRA Managing, preserving and computing with big research data
Research & Innovation action
Grant Agreement

Document Description
D6.6 WP6 Platform Design and Implementation
WP participating organizations: ARC, University of Manchester, UKP-TUDA, INRA, OU, CNIO, USFD, GRNET
Contractual Delivery Date: 07/2017
Actual Delivery Date: 01/2018
Nature: Report
Public Deliverable
Version: 1.0 Final

Preparation slip
From: Konstantinos Koumantaros (GRNET), Katerina Gkirtzou (ARC), Stefania Martziou (ARC); 20/12/2017
Edited by: Stefania Martziou, Antonis Lempesis (ARC); 21/12/2017
Reviewed by: Marta Villegas (BSC), Andrea Zielinski (GESIS); 08/01/ /01/2018
Approved by: Androniki Pavlidou (ARC); 04/02/2018
For delivery: Mike Hatzopoulos (ARC); 04/02/2018

Document change record
V0.1, Draft version: Initial version sent for comments. Authors: Konstantinos Koumantaros (GRNET), Katerina Gkirtzou (ARC), Stefania Martziou (ARC)
V1.0, First version: Incorporating reviewers' comments. Author: Stefania Martziou (ARC)

Public Page 1 of

Table of Contents
1. INTRODUCTION
2. DESIGN METHODOLOGY
  2.1 USER REQUIREMENTS
  2.2 STUDY OF THE STATE-OF-THE-ART
  2.3 LOW FIDELITY PROTOTYPES
  2.4 HIGH FIDELITY PROTOTYPES
3. STATE OF THE ART
  3.1 META-SHARE
  3.2 CLARIN
  3.3 OPENAIRE
  3.4 GALAXY
  3.5 ARGO
  3.6 WEBANNO
4. OVERVIEW OF THE DATA MODEL
5. ACTORS
  5.1 END USERS
  5.2 ADMINISTRATORS AND MODERATORS
6. FUNCTIONALITIES
  6.1 AUTHENTICATING WITH OPENMINTED AAI
  6.2 BROWSE AND SEARCH RESOURCES IN THE OMTD REGISTRY
  6.3 RESOURCE REGISTRATION
  6.4 APPLICATION EXECUTION
  6.5 USER SPACE
  6.6 WORKFLOW EDITOR
  6.7 ANNOTATION EDITOR
7. IMPLEMENTATION TECHNOLOGIES
REFERENCES

Table of Figures
Figure 1 METASHARE Homepage
Figure 2 METASHARE - Results page
Figure 3 METASHARE - Media type
Figure 4 METASHARE Landing page
Figure 5 CLARIN - Homepage
Figure 6 CLARIN - Results page
Figure 7 CLARIN - Media type
Figure 8 CLARIN - Landing page
Figure 9 OpenAIRE Publications
Figure 10 OpenAIRE - Results page
Figure 11 OpenAIRE - Publication
Figure 12 OpenAIRE - Data providers
Figure 13 OpenAIRE - General search
Figure 14 GALAXY Analysis Workspace
Figure 15 GALAXY Workflow Editor
Figure 16 GALAXY Public Repository
Figure 17 GALAXY Published Workflow
Figure 18 ARGO Homepage
Figure 19 ARGO - Workflow Editor
Figure 20 ARGO Complete Workflow
Figure 21 ARGO Setting Component Parameter
Figure 22 ARGO Selecting Workflow
Figure 23 ARGO - Processing a Workflow
Figure 24 ARGO - Results
Figure 25 WebAnno - Homepage
Figure 26 WebAnno - Annotation
Figure 27 WebAnno - Curation
Figure 28 OpenMinTeD Schema overview
Figure 29 OpenMinTeD Corpus Schema
Figure 30 Prototype for Login Page
Figure 31 Prototype for selecting institute
Figure 32 Prototype for Attribute Release Consent Form
Figure 33 Prototype for browse resources
Figure 34 Prototype for browse resources - filtered by resource type = application
Figure 35 Prototype for search - searching for an extractor in the OpenMinTeD applications
Figure 36 Prototype for a component's landing page
Figure 37 Prototype for an application's landing page
Figure 38 Prototype for a corpus' landing page
Figure 39 Prototype for resource registration using a form - component example
Figure 40 Prototype for resource registration using an XML - component example
Figure 41 Prototype for registering UIMA / GATE components
Figure 42 Prototype for registering dockerized components
Figure 43 Prototype for registering web services
Figure 44 Prototype for uploading a corpus
Figure 45 Prototype for uploading a corpus - upload zip and fill metadata
Figure 46 Prototype for building a corpus
Figure 47 Prototype for building a corpus - search for publications
Figure 48 Prototype for building a corpus - filter publications' selection
Figure 49 Prototype for building a corpus - edit auto-generated metadata for the corpus
Figure 50 Prototype for building a corpus - monitor building process
Figure 51 Prototype for building a corpus - process ended successfully
Figure 52 Prototype for registering an existing application
Figure 53 Prototype for building a new workflow (application)
Figure 54 Prototype for executing an application
Figure 55 Prototype for executing an application - browse for input corpus
Figure 56 Prototype for executing an application - input corpus selected
Figure 57 Prototype for executing an application - browse for application
Figure 58 Prototype for executing an application - application selected
Figure 59 Prototype for executing an application - process running
Figure 60 Prototype for executing an application - process completed
Figure 61 Prototype for user's applications page
Figure 62 Prototype for user's corpora page
Figure 63 Prototype for user's corpora page - make corpus public functionality
Figure 64 Prototype for user's corpora page - corpus made public successfully
Figure 65 Prototype for user's corpora - deleting a private corpus functionality
Figure 66 Prototype for user's components page
Figure 67 High-fidelity prototype of the Workflow Editor when creating a new workflow
Figure 68 High-fidelity prototype of the Workflow Editor when editing a workflow
Figure 69 High-fidelity prototype of the Workflow Editor when saving a workflow
Figure 70 High-fidelity prototype of Editing Annotation Project's details and Deleting functionalities
Figure 71 High-fidelity prototype of Manual Annotation functionality
Figure 72 High-fidelity prototype of Manual Correction functionality
Figure 73 High-fidelity prototype of Manual Annotation functionality
Figure 74 High-fidelity prototype of Annotation Project Monitoring functionality

Table of Tables
Table 1 User authentication functionality
Table 1 Browse resources functionality
Table 2 Search for resources
Table 3 Resource's landing page functionality
Table 4 Resource registration functionality via registration form
Table 5 Resource registration functionality via XML description
Table 6 Component registration functionality via maven coordinates
Table 7 Corpus upload functionality
Table 8 Corpus builder functionality
Table 9 Build workflow functionality
Table 10 Executing an application functionality
Table 11 Manage user applications functionality
Table 12 Manage user corpora functionality
Table 13 Manage user components functionality
Table 14 Manage user operations functionality
Table 15 Create Workflow functionality
Table 16 Browse for components in workflow editor functionality
Table 17 Search for components in workflow editor functionality
Table 18 Edit workflow functionality
Table 19 Set parameters of a component in workflow functionality
Table 20 Save a workflow functionality
Table 21 Create Annotation Project functionality
Table 22 Edit Annotation Editor's Project details functionality
Table 23 Delete Annotation Project functionality
Table 24 Save a curated corpus to OMTD Registry functionality
Table 25 Manual Annotation functionality
Table 26 Manual Correction functionality
Table 27 Manual Curation functionality
Table 28 Monitoring functionality

Disclaimer
This document contains a description of the OpenMinTeD project findings, work and products. Certain parts of it might be under partner Intellectual Property Right (IPR) rules so, prior to using its content, please contact the consortium head for approval. In case you believe that this document harms in any way IPR held by you as a person or as a representative of an entity, please notify us immediately.

The authors of this document have taken every available measure in order for its content to be accurate, consistent and lawful. However, neither the project consortium as a whole nor the individual partners that implicitly or explicitly participated in the creation and publication of this document hold any sort of responsibility for consequences that might occur as a result of using its content.

This publication has been produced with the assistance of the European Union. The content of this publication is the sole responsibility of the OpenMinTeD consortium and can in no way be taken to reflect the views of the European Union.

The European Union is established in accordance with the Treaty on European Union (Maastricht). There are currently 28 Member States of the Union. It is based on the European Communities and the member states' cooperation in the fields of Common Foreign and Security Policy and Justice and Home Affairs. The five main institutions of the European Union are the European Parliament, the Council of Ministers, the European Commission, the Court of Justice and the Court of Auditors.

OpenMinTeD is a project funded by the European Union (Grant Agreement No ).

Acronyms
OMTD: OpenMinTeD
NLP: Natural Language Processing
LR: Language Resources
UI: User Interface
AAI: Authentication Authorization Infrastructure

Publishable Summary
The goal of the Platform UI Specification report is to present the main functionalities that the user interface must provide in order to fulfill the OpenMinTeD functional specifications, as these have been reported in the D4.3 OpenMinTeD Functional Specifications deliverable. Deliverable 4.3 describes the functionalities that will be present in the first release of the system. Functionalities that will be included in the next releases will be described in the next editions of this deliverable.

1. Introduction
The OMTD portal will be the entry point for all services in the OpenMinTeD platform. It will serve as a gateway to a large collection of resources of many types, from tagsets to ontologies and from publications to corpora, as well as to all user-level services and value-added functionalities that support the end users (section 5) in text mining the language data of interest. Based on the OMTD-Share model (section 4), the portal will support search and faceted browsing services for the underlying content. For each entity of the model, the OpenMinTeD portal will provide detailed information and the possible actions within the platform, e.g., calling the Workflow Editor in order to use a component within a mining process.

This document focuses on the functionality and design of the User Interface, based on the current status of the D4.3 OpenMinTeD Functional Specifications. Given the complexity of both the analyzed use cases and the underlying data model, the user interface design is a complex process that will continue to be revised throughout the whole project in iterative cycles of design, implementation and evaluation. The design will be evaluated in collaboration with the end users, which will lead to an interface that is heavily influenced by the users' input, friendlier, and better suited to the users' needs. The objective is to create an intuitive and effective interface for the OpenMinTeD portal so that the end users will be able to explore and retrieve all the necessary resources, as well as use all the added-value functionalities, such as the annotation editor or the workflow editor.

The rest of the document is structured as follows:
- Section 2 offers an overview of the methodology used for the design of the OpenMinTeD user interface.
- Section 3 offers a brief overview of systems with functionality related to the OpenMinTeD project, in order to examine the design and implementation options that could be re-used within the OpenMinTeD portal.
- Section 4 offers a brief overview of the OMTD-Share model.
- Section 5 presents the OpenMinTeD Actors.
- Section 6 focuses on the functionalities of the portal, accompanied by screenshots taken at the time of writing.
- Section 7 presents the implementation technologies used for the OpenMinTeD portal so far.

2. Design Methodology
The OpenMinTeD platform aims to offer end users unified access to a wide range of language data, such as corpora, publications, lexicons and ontologies that derive from several data providers, as well as to software components, in order to perform simple and more complex text and data mining tasks, such as entity recognition and text classification. Furthermore, the OpenMinTeD platform offers added-value services, including an annotation editor with support for crowdsourcing and a workflow editor, to further assist researchers using text-mining tools.

This work will proceed in 3 main releases, each one building upon the previous with updates, in order to improve the offered services and usability, as well as to add new functionalities. The current document will be updated accordingly as the design of each release is concluded. At this final release, we present the OpenMinTeD platform as a whole, starting with the OpenMinTeD registry at the center and its interactions with all the other services (e.g. workflow executor, workflow editor, annotation viewer, etc.) to provide all the functionalities required by its users.

The user interface of the OpenMinTeD platform has been implemented using an iterative process. In the first step of this procedure we work with the user requirements; we then study similar existing systems, produce low and high fidelity prototypes, and finally implement the chosen features. This cycle has been repeated throughout the whole duration of the project, more than once for each release of the platform. In this version of the deliverable, because most of the user requirements have already been fulfilled, we do not present the low and high fidelity prototypes; instead we present screenshots of the latest implementation.

2.1 User requirements
The first step in the design of the OpenMinTeD user interface is to examine the user requirements that have not been implemented yet. Each requirement is evaluated in terms of feasibility (whether the underlying services can support the feature), and its priority is determined. After the evaluation, a list of requirements/features is selected and the development focuses on them.

2.2 Study of the state-of-the-art
The second step in the design of the OpenMinTeD platform is the study of existing systems with similar functionalities. This revealed effective approaches that have been adopted by some systems, as well as weaknesses that OpenMinTeD should overcome in its proposed design (section 3).

2.3 Low fidelity prototypes
The third step is the design of low fidelity prototypes for the OMTD platform release due in M16. The designs have been drawn up in collaboration with the end users, both those from the application domains and those from text mining, and incorporate their comments (for more details see Section 5.1.2). Note that the term low fidelity is used in design to denote a rough sketch or a quick mock-up (here produced with the Pencil tool), which has little detail and is quick to produce. These low fidelity designs help the project team to collaborate more efficiently and effectively since they are more abstract, using rectangles and labels to represent content. Dummy, sample or symbolic content is used to represent data when real content is not available.

2.4 High fidelity prototypes
The final step is the design of high fidelity prototypes for the OMTD platform. While the low fidelity prototypes have served as the basis for several discussions with the users on the way to a more detailed high fidelity design, the high-fidelity designs are necessary for the implementation. High-fidelity designs incorporate a level of detail that more closely matches the design of the actual system, including visual elements and functionality.

3. State of the art
OpenMinTeD does not aim to replicate or replace the functionality of other systems in the same domain; it aspires to complement them, use them and interoperate with them. Even so, some overlap with existing projects is unavoidable. For this reason, we have studied projects similar in scope and catalogued their features and the technologies they use. This helped identify use cases or potential user requirements that the OMTD portal should support.

3.1 META-SHARE
The community of META-NET (Network of Excellence forging the Multilingual Europe Technology Alliance) aims to bring together information society stakeholders, such as researchers, commercial technology providers, private and corporate language technology users and language professionals, and envisions creating a single digital market and information space for the Language Technology domain. To this end, META-NET has created META-SHARE, an open, distributed, secure, and interoperable web infrastructure for sharing and accessing Language Resources (LR).

3.1.1 Features
Functionality
The main language resources available in the META-SHARE infrastructure, in order of priority, are:
- Language data, such as written and spoken corpora
- Language-related data, including and/or associated to other media and modalities
- Language processing and annotation tools and technologies
- Services through the use of language processing tools and technologies
- Evaluation tools, metrics and protocols, services addressing assessment and evaluation, and
- Service workflows by combining and orchestrating interoperable services

For each language resource META-SHARE offers quality documentation and related metadata over the whole network and ensures that each LR, as well as its respective metadata, is properly managed, preserved and maintained. In order to maximize the interoperability and reuse of LRs, META-SHARE promotes the use of widely accepted standards for language resource building, trying to overcome format, terminological and semantic differences. These interoperability guidelines, along with the appropriate metadata, guarantee legally sound governance, legal compliance and secure access to licensable resources.

META-SHARE offers the user the possibility to search and browse the repositories for LRs using simple keywords and/or a faceted search mechanism, to view detailed information for each LR and to download it, when legal guarantees allow it. Furthermore, META-SHARE also offers general statistics about the use of the META-SHARE node and access to the community forum. Finally, extra functionality such as uploading/downloading an LR is available for registered users only.

3.1.2 User Interface
The user interface of META-SHARE is simple and clear, offering access to the META-SHARE infrastructure. It aggregates information via data harvesting and synchronization and forms a central inventory, which includes metadata-based descriptions of all language resources available in the distributed network. On the META-SHARE homepage (see Figure 1) the user can access the catalogue of LRs via both search and browse functionalities, as well as the community forum, extensive documentation, information about the project and general statistics on LR usage. Furthermore, the user can use the registration, authentication and authorization service, when needed.

Figure 1 METASHARE Homepage

The search functionality is a simple keyword-based search, which is performed over a subset of the metadata used to describe the LRs. The results page (see Figure 2) lists all the LRs that match the query (if the user doesn't type any word, the result consists of the entire META-SHARE catalogue). For each LR, selected metadata is shown: resource name, resource short name (if available), resource type, media type (see Figure 3) and language (if available), as well as the number of downloads and the number of views.


On the left pane of the results page, there is a list of facets (or filters). The user can filter the results by any combination of the provided fields. Filters can be combined with search terms entered in the search box to further refine the search query. Next to each filter value, the interface reports the number of LRs that would remain if that filter were selected. The facet functionality is also available in the browsing service of the META-SHARE catalogue.

Figure 2 METASHARE - Results page
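The combination of keyword search, facet filters and per-facet counts described above can be illustrated with a minimal sketch. It is not META-SHARE's implementation: the catalogue records, the field names and the `search` and `facet_counts` helpers are all hypothetical and serve only to show the mechanism.

```python
from collections import Counter

# Hypothetical catalogue records; the field names are illustrative assumptions.
CATALOGUE = [
    {"name": "Corpus A", "resource_type": "corpus", "media_type": "text", "language": "English"},
    {"name": "Corpus B", "resource_type": "corpus", "media_type": "audio", "language": "Greek"},
    {"name": "Tagger C", "resource_type": "tool", "media_type": "text", "language": "English"},
    {"name": "Lexicon D", "resource_type": "lexicon", "media_type": "text", "language": "Greek"},
]

# The keyword is matched against a subset of the metadata only.
SEARCHABLE_FIELDS = ("name", "resource_type")
FACET_FIELDS = ("resource_type", "media_type", "language")


def search(records, keyword="", filters=None):
    """Return records matching the keyword AND every selected facet filter.

    An empty keyword matches everything, so an empty query returns the
    whole catalogue, as the text describes.
    """
    filters = filters or {}
    keyword = keyword.strip().lower()
    hits = []
    for record in records:
        matches_keyword = not keyword or any(
            keyword in record[field].lower() for field in SEARCHABLE_FIELDS
        )
        matches_filters = all(record.get(f) == v for f, v in filters.items())
        if matches_keyword and matches_filters:
            hits.append(record)
    return hits


def facet_counts(records):
    """Count, per facet field, how many of the given records carry each value."""
    return {field: Counter(r[field] for r in records) for field in FACET_FIELDS}
```

Calling `facet_counts(search(CATALOGUE, "corpus"))` gives the numbers a results page could display beside each filter value for the current query.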

Figure 3 METASHARE - Media type

Users can click on the name of an LR on the results page obtained by any type of search to open the page with the details for that LR (see Figure 4). In the top pane META-SHARE provides the vital information about the resource, such as its name and its description. If a URL for the resource is available, it is also shown here. The bottom left pane provides the legal and contact information. The bottom middle pane provides the media information; when multiple instances of these types exist, they are presented in sub-tabs. The bottom right pane provides metadata creation information.


Figure 4 METASHARE Landing page

3.1.3 Technologies used
The META-SHARE infrastructure is implemented using the Django Python Web framework over an Apache Tomcat Server or a Lighttpd web server. Django has an incorporated template engine that processes HTML, CSS and JavaScript in order to form the interface. The back-end of the META-SHARE infrastructure is built mainly over a PostgreSQL database, but support for MySQL and SQLite databases is also offered. Over the database, an Apache Solr system was added to provide distributed indexing, replication and load-balanced querying. Finally, the XSD schema was implemented using the Altova XMLSpy editor.

3.1.4 Conclusions
META-SHARE provides the Language Technology community with an interoperable space for language resources. It offers a central inventory, which includes extensive metadata descriptions of all language resources available in the distributed network, with an easy-to-use interface. This facilitates research and creates a favorable operational environment for knowledge discovery and reuse.

3.2 CLARIN
CLARIN (Common Language Resources and Technology Infrastructure) aims at providing easy and sustainable access to digital language data in written, spoken or multimodal form, as well as advanced tools to discover, explore, annotate and analyze such data sets, for scholars in the social sciences and humanities. CLARIN consists of networked distributed repositories, service centers and knowledge centers with single sign-on access for all members of the academic community in all participating countries. Tools and data from different centers are interoperable, so that data collections can be combined and tools can be chained to perform complex operations to support researchers in their work.

3.2.1 Features
Functionality
The CLARIN infrastructure stores a variety of resources, which in order of priority are:
- Primary and processed language data in various forms (written, spoken or multimodal)
- Various types of structured language data (e.g. word lists, dictionaries, etc.) which can be used for enhanced organization, processing and study of primary language data
- Tools and applications for language processing (e.g. multilingual text alignment, morphological annotations, etc.) and
- Visualization tools for processing results, multimedia collections, etc.

The infrastructure is organized as a distributed repository network with a central aggregator and local repositories. The local repositories store the catalog and the metadata descriptions of the resources and the resources themselves, while the central aggregator operates as the network coordinator and collects the metadata descriptions of all resources from the local repositories to form the central catalog.

CLARIN offers the user the possibility to search and browse the repositories for language and technology resources using simple keywords and/or a faceted search mechanism. For each resource, detailed information is provided, as well as access for download when legal guarantees are met. Furthermore, registered users are provided with extra functionality such as uploading/downloading a resource or using a language tool/application via CLARIN. Finally, general statistics on the usage of CLARIN's resources are also provided.

3.2.2 User Interface
CLARIN's interface offers easy and well-defined access to its infrastructure. CLARIN's homepage (see Figure 5) gives the user access to the central catalogue of resources in two ways, browsing and searching, as well as to information about the project, manual pages and general statistics on resource usage. Furthermore, the user can use the registration, authentication and authorization service when needed to obtain access to extra functionalities.
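The distributed organization described in the features above, in which local repositories keep the resources and a central aggregator collects only their metadata, can be sketched as follows. This is an illustrative model under stated assumptions, not CLARIN's harvesting protocol: the `LocalRepository` and `CentralAggregator` classes and their methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class LocalRepository:
    """Hypothetical local repository holding resources and their metadata."""
    name: str
    records: dict = field(default_factory=dict)  # resource id -> metadata dict

    def list_metadata(self):
        # Expose metadata descriptions only; the resources themselves stay local.
        return [dict(meta, id=rid) for rid, meta in self.records.items()]


@dataclass
class CentralAggregator:
    """Hypothetical network coordinator that builds the central catalogue."""
    catalogue: dict = field(default_factory=dict)

    def harvest(self, repositories):
        # Keying entries by (repository, id) makes repeated harvests idempotent
        # and avoids clashes between ids used by different repositories.
        for repo in repositories:
            for meta in repo.list_metadata():
                key = (repo.name, meta["id"])
                self.catalogue[key] = dict(meta, source=repo.name)
        return len(self.catalogue)
```

Recording the source repository with each entry is one way for a central results page to point the user back to the repository that actually stores the resource.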


Figure 5 CLARIN - Homepage

CLARIN's search functionality is based on simple keywords, which are matched over a subset of the metadata used to describe the resources. The results page (see Figure 6) lists all the resources that match the query (if the user doesn't type any word, the result consists of the entire CLARIN catalogue), providing basic metadata such as the resource name, its type, its media type (see Figure 7) and language (if available), as well as the number of downloads and the number of views. Apart from the plain keyword search, CLARIN also offers faceted search capabilities (see the left pane of the results page). The user can select among the provided fields and filter the results accordingly. The faceted search can be combined with keyword search, allowing further refinement of the search results. Finally, faceted capabilities are offered for the browsing service of the catalogue as well.


Figure 6 CLARIN - Results page

Figure 7 CLARIN - Media type

When a user clicks on the name of a resource on the results page, she/he obtains access to its detailed metadata (see Figure 8). In the top pane CLARIN provides vital information about the resource, such as its name, its description and access to the resource itself. If the resource is a tool or an application that also runs in the CLARIN infrastructure, then the button Upload and Process is also shown. At the bottom, the left pane shows legal and contact information, the middle pane provides media information and the right pane shows metadata creation information.

Figure 8 CLARIN - Landing page

3.2.3 Technologies used
CLARIN's metadata schema was designed using the Altova XMLSpy editor. The main infrastructure is implemented using the Django Python Web framework, which by default has a template engine to generate and process HTML files with CSS and JavaScript, over an Apache Tomcat Server. The backend storage of the infrastructure is built over a PostgreSQL database with the assistance of Apache Solr for indexing and query optimization. CLARIN is a distributed service and runs over a cloud service. For example, CLARIN-EL is built over Okeanos, the GRNET cloud service, using the offered services of Pithos for storage and Kamaki for management of the cloud deployment. Moreover, in order to deploy the web services available in CLARIN's repository, the Hadoop framework has been used for the distributed processing of very large data sets, Ansible scripts for the software deployment and the Celery framework for asynchronous task queue/job queue management.

3.2.4 Conclusions
CLARIN provides the scholar with a plain and intuitive interface for exploring resources from the language technology domain. It relieves him or her of the burden of searching multiple repositories, as CLARIN offers a central catalogue for searching where information from all local repositories has been harvested. Furthermore, incompatibility issues between resources from different local repositories are reduced thanks to extensive interoperability guidelines and metadata. Finally, CLARIN also offers the possibility to run language technology tools and applications in its infrastructure, further facilitating the research process.

3.3 OpenAIRE
OpenAIRE (Open Access Infrastructure for Research in Europe) brings together professionals from research libraries, open scholarship organizations, national e-infrastructures, data experts, IT and legal researchers, and aims to promote open scholarship and improve the discoverability and reusability of research publications and data. OpenAIRE consists of an e-infrastructure of repository networks that offers support for accessing and managing aggregated research publications, while linking them to the accompanying research and project information, related datasets and authors. Moreover, OpenAIRE also plans to provide workflows and services on top of its repository content, which will enable an interoperable network of repositories (via the adoption of common guidelines) and easy upload into an all-purpose repository.

3.3.1 Features
Functionality
OpenAIRE establishes and operates an electronic infrastructure for handling peer-reviewed articles as well as other important forms of publications (pre-prints or conference publications) and enriching them with information such as funding agent, authors and organizations. OpenAIRE's portal is the gateway to all user-level services offered by the e-infrastructure, including access via search and browse functionality to scientific publications and other value-added functionality, such as post-authoring tools and monitoring tools based on the analysis of document and usage statistics.

OpenAIRE offers its users three different search services, all of which use a simple keyword search mechanism or a faceted browse functionality. The first service allows the user to explore the OpenAIRE repository for publications, as well as research data, projects, people, organizations and data providers. The second service is dedicated to the exploration of the aggregated information available for the data providers, while the third service allows the exploration of information in numerous satellite source elements, such as newsletters and news feeds.

3.3.2 User Interface
Each of the three search services offered by OpenAIRE has its own interface, designed to maximize access to the relevant content. The first service (see Figure 9) offers the user access to the main content of OpenAIRE, i.e. publications, research data, projects, people, organizations and data providers, via browsing or simple keyword search functionality. Each type of content has its own tab where the relevant filters for the faceted browsing are available. A combination of multiple filters is allowed.

Figure 9 OpenAIRE Publications

OpenAIRE's search functionality is based on simple keywords, which are matched against a subset of the metadata used to describe the resources. The result page (see Figure 10 OpenAIRE - Results page) lists all the resources that match the query, organized in a tabbed format similar to the facet browse functionality. For each result, basic metadata is provided. For example, for a publication OpenAIRE shows its title, the authors, the year of publication, the first lines of its abstract (if available) and whether the publication is open access.

Figure 10 OpenAIRE - Results page

When a user clicks on the title of a publication in the results page (the same is true for all content types), he or she gets access to its detailed metadata (see Figure 11 OpenAIRE - Publication). At the top left of the page OpenAIRE provides vital information about the resource, which in the case of a publication is its title, the publisher's name, the language of the text, the type of publication, the related subjects and the abstract (if available). Moreover, OpenAIRE offers extra functionality to manually link a resource to related information, such as a project or research data in the case of a publication, but this service is only available to registered users. The top right corner of the page provides extra information and functionality. For a publication, for example, there are links to the repository storing the publication, information about the funding project (when known) and functionality to export the citation in various formats. At the bottom, in tabbed format, OpenAIRE provides information about the related entities. For a publication these are its references, possible related research data and similar publications.

Figure 11 OpenAIRE - Publication

The second service that OpenAIRE offers is a dedicated search/browse for the compatible data providers (see Figure 12 OpenAIRE - Data providers). OpenAIRE provides plain keyword search, enhanced with filters based on the type of data provider and/or its compatibility level. When no filter is used and the keyword search box is empty, OpenAIRE gives access to the full catalogue of data providers. For each entry in the catalogue, the following information is provided: its name, its type, the country of operations, the institution and the compatibility level. Furthermore, as the results are shown in a table, they can be sorted by any of these fields.

Figure 12 OpenAIRE - Data providers

Finally, OpenAIRE's third service is a generic one that gives access to satellite sources of information, such as newsletters, articles and knowledge bases, to name a few (see Figure 13 OpenAIRE - General search). OpenAIRE provides keyword search along with filtering and ordering functionality.

Figure 13 OpenAIRE - General search

Technologies used

The interface of OpenAIRE is built using PHP and JavaScript. The web pages are designed with the front-end CSS framework Bootstrap. Moreover, to provide the extra aggregated information per resource, such as the chart of publication years for a research project, OpenAIRE uses the Piwik framework for performing the data analytics calculations in real time and the Highcharts framework for the visualization.

Conclusions

OpenAIRE provides the user with an intuitive interface for exploring publications and research data from multiple data providers. Because it serves as a central access point, it relieves the user from the burden of searching multiple repositories one by one. Finally, as aggregated content (publications, research data, etc.) is enriched with relevant information via linkage, it is easy to find similar information and to gain a general view of how research is progressing.

3.4 GALAXY

Galaxy is a web-based platform for intensive data analysis in the life science domain. The platform has been designed with a twofold purpose: (a) to give all scientists access to complex computational analysis, including those with limited or no programming knowledge and/or systems administration expertise, and (b) to make data analysis accessible, reproducible and transparent. These features further promote accountability and collaboration within the scientific community. The Galaxy platform is available as a free web service, known as the Public Galaxy Server. The Galaxy Server offers a plethora of services: (a) integrated tools for analysis, (b) extensive tutorial demonstrations, (c) terabytes of reference data, (d) permanent storage, (e) resources for complex computational tasks, (f) persistent workspaces and (g) publication services. As the Public Galaxy Server is a centralized solution, it cannot cover the different analysis needs of the entire community. To overcome this limitation, the Galaxy Software Framework is also available for local installation, offering extensive customization to the users' needs, and as a cloud version supporting the most common cloud services, such as Amazon Web Services or Google Cloud Platform.

Features - Functionality

The Galaxy framework offers three primary features: an analysis workspace, a workflow editor and a communication platform. The analysis workspace offers a uniform environment to browse for tools and execute them. As the Galaxy platform is open source software, it allows developers to integrate their own tools by writing a configuration file that describes how to run the tool, including a detailed specification of input and output parameters. This specification allows the Galaxy framework to work with the tool abstractly.
When a user performs an analysis using Galaxy, the framework keeps track of all the details of the execution automatically and transparently. More specifically, it automatically generates metadata for each step of the analysis, such as descriptive information about the input and output datasets (e.g. the number of sequences in a dataset or the version of a genomic assembly), the tools used and the parameter values selected, so that the exact same analysis can be repeated (reproducibility). Furthermore, the Galaxy platform lets the user add annotations, i.e. descriptions or notes, to the analysis. These annotations are critical for the reproducibility of experiments, as they enable the user to explain why a particular step is required, i.e. to capture the intuition and/or the intention of the analysis.

Another main functionality of the Galaxy platform is the workflow editor, which allows users to construct complex multi-step analyses. The workflow editor provides a simple graphical interface where users connect tools to construct complex workflows. For each link between tools, the workflow editor verifies that the tools are compatible.

Finally, the last main feature of the Galaxy framework is the communication platform, which allows the sharing and publication of documents alongside the analysis and promotes transparency. Galaxy offers three different communication services: the first is a sharing model for Galaxy items (datasets, histories and workflows) together with the public repositories of already published items; the second is a web-based framework for displaying shared or published Galaxy items; and the last is called Pages, which are custom web-based documents that enable users to communicate their experiment at every level of detail and in such a way that readers can view, reproduce and extend the experiment without leaving Galaxy or the web browser.

Figure 14 GALAXY Analysis Workspace

User Interface

Galaxy offers a plain and simple interface for accessing the full functionality of the platform. The Galaxy analysis workspace (see Figure 14 GALAXY Analysis Workspace) is where complex analyses take place. The workspace has four distinct areas: the navigation bar at the top of the page, the tool panel in the left column, the detail panel in the middle and the history panel in the right column. The navigation bar offers access to the other major Galaxy services, including the workflow editor and the communication platform. The tool panel lists all the analysis tools and data sources that a user can use. The detail panel displays the interactive interfaces for the tools selected by the user. Finally, the history panel shows the input data and output results of each step of the analysis performed by the user, as well as any automatically generated metadata and user-generated annotations. Every action by the user generates a new history item, which can then be used in subsequent analyses, downloaded or visualized. Galaxy's history panel facilitates reproducibility by showing the provenance of data and by enabling users to extract a workflow from a history, rerun analysis steps, visualize output datasets, tag datasets for searching and grouping, and annotate steps with information about their purpose or importance.

Figure 15 GALAXY Workflow Editor

The second main functionality of Galaxy is the workflow editor (see Figure 15 GALAXY Workflow Editor) for creating or modifying complex workflows. The editor has four distinct areas: the navigation bar at the top of the page, the tool panel in the left column, the editor panel in the middle and the details panel in the right column. A user adds a tool from the tool panel to the editor panel and configures each step of the workflow using the details panel for setting parameters or getting tips. The details panel also allows the user to annotate the workflow and each workflow step. Workflows run in Galaxy's analysis workspace; as with all tools executed in Galaxy, history items and provenance information are generated automatically for each tool executed via a workflow.
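The compatibility check that the workflow editor performs for each link between tools can be sketched as follows. This is only an illustration and not Galaxy's code: the `Tool` type, the port names and the rule that a link is valid when the produced format equals the accepted format are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    """A hypothetical tool description: named ports and their data formats."""
    name: str
    inputs: dict   # input port name -> accepted format
    outputs: dict  # output port name -> produced format


def check_link(src, out_port, dst, in_port):
    """Return True when the format produced by `src` is accepted by `dst`."""
    produced = src.outputs.get(out_port)
    accepted = dst.inputs.get(in_port)
    # A link to or from a port that does not exist is never valid.
    return produced is not None and produced == accepted


def invalid_links(links):
    """Return the (src, out_port, dst, in_port) tuples that fail the check."""
    return [link for link in links if not check_link(*link)]
```

An editor built on such a check could refuse to draw a connection whenever `check_link` returns `False`, which gives the user immediate feedback while composing the workflow.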

Figure 16 GALAXY Public Repository

The last main functionality is the communication platform offered by Galaxy. Galaxy offers a sharing model with public repositories for sharing and publishing data, histories, workflows, visualizations and documents (also known as Pages). Publishing an item via Galaxy generates a link to the item and lists it in Galaxy's public repository. The repositories can be searched, sorted and filtered by name, annotation, owner and community tags (see Figure 16 GALAXY Public Repository). Each shared or published item is displayed in a webpage with its automatically generated metadata, such as the execution details, as well as the user's annotations and additional links (see Figure 17 GALAXY Published Workflow). The item's webpage provides a link so that anyone viewing the item can import it into their analysis workspace and start using it. The page also highlights information about the item and additional links: its author, links to related items, the item's community tags (the most popular tags that users have applied to the item) and the user's item tags. Tags link back to the public repository and show items that share the same tag.

Figure 17 GALAXY Published Workflow

Technologies Used

The Galaxy platform is implemented primarily in the Python programming language (version 2.7). The framework is distributed as a standalone package that includes an embedded web server and SQL database, but it can be configured to use a different external web server or database. Finally, because the Galaxy platform is designed to perform complex analyses, it can run tasks on clusters, using the Portable Batch System (PBS) or Oracle Grid Engine (OGE) for job scheduling.

Conclusions

Galaxy offers a web-based platform designed for the execution and reproducibility of genomic analysis. The platform offers a workflow editor for composing complex workflows and a workflow execution system that is completely available through a web browser, requiring neither software installation nor local processing power. Its interface is plain and intuitive, designed specifically for a non-technical audience.

3.5 ARGO

Argo is a multi-user web-based workbench for collaborative development and evaluation of text-processing workflows for automatic annotation. Moreover, Argo supports manual annotation by users through the incorporated annotation editor. Argo includes a growing library of elementary processing components or analytics, developed mostly at the National Centre for Text Mining (NaCTeM), which range from simple data (de)serialization and natural language processing (NLP) tasks to semantic annotation ones (named entity and relationship recognition).

Features - Functionality

The principal features of Argo include the easy combination of elementary text-processing components to form meaningful and comprehensive processing workflows and the ability to manually intervene in the otherwise automatic process of annotation by correcting or creating new annotations. The system comes with a predefined set of processing components that follow the UIMA (Unstructured Information Management Architecture) specification and workflows for various tasks, from sentence splitting and tokenization, to named-entity recognition, to database storage. Argo also allows users to deposit their own components as long as they are UIMA compliant. UIMA is an OASIS standard that ensures interoperability of individual processing components by defining common data structures and interfaces.

Another key feature is the incorporated annotation editor. The annotation editor allows for adding new span-of-text annotations, removing or modifying previously identified annotations and adding metadata. The annotated spans of text can embed or even overlap with other spans. In both cases the editor marks and displays the annotations in a lucid and visually unambiguous manner. Finally, due to the distributed nature of the system, Argo supports user collaboration, i.e. users can share their workflows, data and results with other users of the framework.

Figure 18 ARGO Homepage

Figure 19 ARGO - Workflow Editor

User Interface

Argo's main page offers access to the view of pre-existing workflows, the workflow editor, the process view and the view of raw and processed documents (see Figure 18 ARGO Homepage). The first main functionality of Argo is the ability to create new workflows via a workflow editor, which is opened by selecting the Create button in the toolbar of the main page. The workflow editor (see Figure 19 ARGO - Workflow Editor) lists all the available components in the left-hand-side panel, while the main middle panel is a canvas for composing the workflow. To make components easier to find and use, Argo categorizes them into Readers, Analytics and Consumers, and when the user clicks on a component, its details are shown in the right-hand-side panel. Argo's workflow editor allows the user to draw a block diagram representing a workflow by selecting among the available components from Argo's library and connecting them in a pipeline by dragging a connector from the outgoing port of one component onto another component (see Figure 20 ARGO Complete Workflow). Some of the components of a workflow may require parameter values to be set. This is done by selecting the spanner button in the upper right corner of the component, which opens a pop-up window with access to, and information on, all the necessary parameters (see Figure 21 ARGO Setting Component Parameter). To complete the workflow, the user can also add a name and a description of the workflow for reference in the workflow panel. Note that if the workflow editor finds an error while the workflow is being composed, such as a missing parameter value, it informs the user with messages in the bottom area of the screen.

Figure 20 ARGO Complete Workflow
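The kind of check that produces these editor messages can be sketched as follows. This is a minimal illustration, not Argo's code: the `Component` type and the message texts are hypothetical, and the rule that a pipeline starts with a Reader and ends with a Consumer is an assumption drawn from the category names. Only the missing-parameter case is mentioned in the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A hypothetical workflow component with its configuration."""
    name: str
    category: str                 # "Reader", "Analytic" or "Consumer"
    required: tuple = ()          # names of parameters that must be set
    params: dict = field(default_factory=dict)


def validate(pipeline):
    """Return a list of human-readable messages; an empty list means no errors."""
    messages = []
    # Assumed ordering rule: data enters through a Reader, leaves through a Consumer.
    if not pipeline or pipeline[0].category != "Reader":
        messages.append("workflow must start with a Reader")
    if not pipeline or pipeline[-1].category != "Consumer":
        messages.append("workflow must end with a Consumer")
    for component in pipeline:
        for param in component.required:
            if component.params.get(param) in (None, ""):
                messages.append(
                    f"{component.name}: missing value for parameter '{param}'"
                )
    return messages
```

Returning all messages at once, rather than stopping at the first problem, matches the behaviour of a message area that lists every open issue in the workflow.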

Figure 21 ARGO Setting Component Parameter

The second main functionality of Argo is the ability to run existing workflows. From the main page of Argo, the user can view all available workflows and execute one on Argo's facilities by selecting it and pressing the Run button in the toolbar (see Figure 22 ARGO Selecting Workflow). Argo switches automatically to the process view, where the user can track the progress of the workflow's execution (see Figure 23 ARGO - Processing a Workflow). For workflows that include user-interactive components, progress pauses at some point and the right-hand-side panel displays appropriate prompts, such as buttons or hyperlinks that the user needs to click or follow before the execution can continue. A completed execution is indicated by a status of Finished, after which the user can check the results by switching to the documents view and navigating to the folder that was set as the output folder. The user can perform several actions on one or more documents, such as downloading a document and inspecting the results (see Figure 24 ARGO - Results). Note that the folder contains both the original raw texts and the annotated ones.

Figure 22 ARGO Selecting Workflow

Figure 23 ARGO - Processing a Workflow

Figure 24 ARGO - Results

3.5.3 Technologies Used

The user interface of Argo has been built using the Google Web Toolkit (GWT), a development toolkit for building web-based applications, and Scalable Vector Graphics (SVG), an open standard for describing vector graphics, which is widely supported by web browsers and is heavily used in the annotation editor. Due to the distributed nature of the system, client-server communication is accomplished through the well-established web service protocols SOAP and REST. The inclusion of web-service protocols gives software developers who wish to build their own systems the opportunity to connect directly to the Argo server. Furthermore, any custom-built client immediately benefits from changes made by workflow designers: the client is decoupled from the workflow and its future modifications, which significantly accelerates the collaborative development cycle.

Conclusions

Argo offers a workflow execution system that is completely available through a web browser, requiring neither software installation nor local processing power. Argo has a simple interface for composing linear workflows and also offers user-interactive processing services, such as the annotation editor, to facilitate workflow execution. Its interface is plain and intuitive, designed specifically for a non-technical audience.

3.6 WebAnno

WebAnno is a multi-user, general-purpose, web-based system for distributed annotation of text. It supports a wide range of annotations, both linguistic, with various predefined layers of morphological, syntactic and semantic annotations, and non-linguistic, via the definition of custom annotation layers. WebAnno supports multiple roles, such as annotator, curator and project manager, in order to support the full life cycle of an annotation/curation project and to facilitate the parallelization of the multi-user annotation process. The WebAnno platform is available as a downloadable service, as a standalone server version, and as a web service via the CLARIN-D Center at the University of Tübingen as part of the CLARIN-D infrastructure.

Features - Functionality

The WebAnno framework offers functionalities to support the full life cycle of an annotation project, from the creation of a project and the definition of its settings/parameters, to the annotation itself and/or its curation. The first functionality is the project management and monitoring service, which is only available to users with the roles of project manager, project creator and/or administrator. This service allows the user to create a new project or to manage, edit and monitor existing ones. Creating a project requires defining the type of the project (manual or automated annotation, curation or correction) and the respective settings.

For example, when creating a project for manual annotation, the user must define the documents to be annotated and the details of the annotation, such as the annotation layers and the tagset to be used, or define custom ones. He or she can also describe guidelines for the annotation itself and assign the annotation process to other users who will work in parallel. Apart from creating a project, WebAnno also offers the ability to observe the progress and the document status of the projects.

The second functionality WebAnno offers is the annotation service, which is available to users with the roles of annotator, project manager and administrator. The user can select a document for annotation from the available annotation projects and annotate it according to the project's settings and guidelines.

Another functionality is the curation service, which is only available to users with the roles of project manager, curator and/or administrator. The curation service automatically accepts the annotations from multiple users when they agree with each other, so the user only needs to step in when there is a conflict. Moreover, the user can add extra annotations or delete existing ones.

Another functionality that WebAnno offers is the correction service, which is only available to users with the roles of annotator, project manager and/or administrator. The correction service allows the user to transfer available annotation suggestions to the corrected document, as well as to add extra annotations that did not exist in the suggestions. The service is mainly intended for correcting documents annotated by an automatic annotation process.

The final functionality that WebAnno offers is the automatic annotation service, which is only available to users with the roles of annotator, project manager and/or administrator. This functionality makes it possible to annotate documents using an already trained model.

User Interface

The homepage of WebAnno (see Figure 25 WebAnno - Homepage) offers access to all the available services, which are, in order of appearance: Annotation for manually annotating documents, Curation for manually curating already annotated projects, Correction for correcting automatically annotated projects, Automation for automatically annotating documents, Projects for creating and managing existing projects, Monitoring for monitoring the progress of a project and Manage users for managing the registered users. Which services are available depends on the roles a user has.
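The role-dependent homepage can be summarized with a small lookup sketch. The role sets are taken from the service descriptions above; the data structure and function name are hypothetical and are not part of WebAnno. The user management service is left out because the text does not state which roles may use it.

```python
# Service name -> roles that may use it, as listed in the descriptions above.
# Dictionary order follows the order of appearance on the homepage.
SERVICE_ROLES = {
    "Annotation": {"annotator", "project manager", "administrator"},
    "Curation": {"curator", "project manager", "administrator"},
    "Correction": {"annotator", "project manager", "administrator"},
    "Automation": {"annotator", "project manager", "administrator"},
    "Projects": {"project manager", "project creator", "administrator"},
    "Monitoring": {"project manager", "project creator", "administrator"},
}


def available_services(user_roles):
    """Return the services a user may open, given the roles he or she holds."""
    roles = set(user_roles)
    return [service for service, allowed in SERVICE_ROLES.items()
            if roles & allowed]
```

For instance, a user who is only a curator would see the Curation service alone, while an administrator would see every entry in the table.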

Figure 25 WebAnno - Homepage

One of the most important functionalities of WebAnno is its Annotation service (see Figure 26 WebAnno - Annotation) for manually annotating the documents of an annotation project. The annotation page has two distinct areas: the toolbar at the top of the page and the annotation area, which takes up the rest of the page. The toolbar gives the user access to his or her assigned annotation projects and their documents, controls for navigating the document being processed, and access to the guidelines of the project. To annotate, the user selects the area of interest and a pop-up window appears with the available annotations. There the user can select the appropriate level of annotation and the corresponding annotations. Note that the annotations the user added in previous steps appear with the document. Finally, when the user considers the annotation of the document complete, he or she can declare it by marking the document as Done (see the right side of the toolbar) and move to the next document/project for annotation.

Figure 26 WebAnno - Annotation

The other important functionality of WebAnno is its Curation service (see Figure 27 WebAnno - Curation) for curating documents already annotated by multiple users. The curation page has two distinct areas: the toolbar at the top of the page and the curation area, which takes up the rest of the page. The toolbar gives the user access to his or her assigned curation projects and their documents, controls for navigating the document being processed, and access to the guidelines of the project. In the curation area, the left side, named Sentences, displays the sentences of the chosen document. The ones shaded in red have an annotation conflict. To see the annotations in a sentence, the user only needs to click on it. The annotation panel is then displayed to the right of the Sentences panel. In more detail, the frame called Annotation shows the result of the curation process so far. Below it the annotated sentences are shown in separate frames, titled with the names of the annotators. By clicking on an annotation in one of the annotators' frames, the user accepts the annotation and the service merges it into the Annotation view. The sentence in the Annotation frame can be treated like a sentence in the Annotation service. This means that by selecting a word with a click the curator is able to produce new annotations, while by clicking on an annotation the curator is able to change its classification or delete it.

Note that the different states of annotation agreement are marked by different colors. If the annotations are in agreement, they are marked grey in the lower frames and light green in the Annotation frame. If the annotations are disparate, the markings are dark blue in the lower frames. By default, only annotations in agreement are transferred automatically into the curated file. If the user chooses one annotation as correct by clicking on it, the chosen annotation turns green in the frame of the corresponding annotator. The annotations that were not chosen for the curated file are marked dark blue. The annotations that were wrongly classified are marked in red. Finally, when the user considers the curation of the document complete, he or she can declare it by marking the document as Done (see the right side of the toolbar) and move to the next document/project for curation.

Figure 27 WebAnno - Curation
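The default behaviour described above, where only annotations in agreement are transferred automatically and the rest are left to the curator, can be sketched as follows. This is an illustration and not WebAnno's implementation: the data layout (one dictionary of span to label per annotator) is hypothetical, and treating a span that one annotator left unannotated as a disagreement is an assumption.

```python
def curate(annotations_by_user):
    """Split annotations into automatically merged ones and conflicts.

    `annotations_by_user` maps an annotator name to a dictionary of
    (start, end) span -> label. Returns a pair (merged, conflicts):
    `merged` maps span -> agreed label, `conflicts` maps span -> the
    label chosen by each annotator (None when the span was not annotated).
    """
    spans = set()
    for annotations in annotations_by_user.values():
        spans.update(annotations)

    merged, conflicts = {}, {}
    for span in sorted(spans):
        labels = {user: annotations.get(span)
                  for user, annotations in annotations_by_user.items()}
        distinct = set(labels.values())
        if len(distinct) == 1:
            # Every annotator chose the same label: transfer it automatically.
            merged[span] = distinct.pop()
        else:
            # Disagreement: leave the decision to the curator.
            conflicts[span] = labels
    return merged, conflicts
```

The `conflicts` result corresponds to the sentences a curation interface would highlight for manual resolution.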

3.6.3 Technologies Used

The WebAnno platform is implemented primarily in the Java programming language (version 8) and its full functionality is supported only in the Chrome and Safari browsers. The framework is distributed as a standalone package, which includes a web server and SQL database, and as a server package, for which the administrator must install Apache Tomcat and a MySQL server.

Conclusions

WebAnno offers a web-based platform designed for annotating and curating documents. The platform offers a variety of services, such as manual or automated annotation, curation, correction and project management and monitoring, to support the full life cycle of the annotation process. Finally, its interface is plain and intuitive, designed specifically for a non-technical audience.

4. Overview of the data model

Before describing the functionality of the OpenMinTeD user interface, an overview of the OMTD-Share data model is presented. The first version of the model was released in M14 in "D5.2 Interoperability Standards and Specifications Report" and an overview of the model is presented in Figure 28 OpenMinTeD Schema overview. Concepts like Corpus, Lexical/Conceptual Resource, Component and Model are core entities of the OMTD-Share model (shown in blue in Figure 28), while concepts like Organization, Person and Project are satellite entities (shown in orange). The core entities are the central concepts of the ontology; they are described in detail in order to facilitate their search and retrieval. Apart from the core entities, the OMTD-Share model also describes satellite entities, which are linked to core entities through relations that are represented in the model as basic elements (relations are shown as black arrows in Figure 28). The interconnection between the core entities and these satellite entities depicts in depth the full lifecycle of resources from production to use: reference documents related to the resource (papers, reports, manuals etc.), persons/organizations involved in its creation and use (creators, distributors etc.), related projects and activities (funding projects, activities of usage etc.), accompanying licenses, etc. It should be noted, however, that the satellite entities are described only when the case arises, i.e. when they are linked to a specific resource.

Figure 28 OpenMinTeD Schema overview

Figure 29 OpenMinTeD Corpus Schema offers a partial view of the model related to the Corpus concept. The metadata of a Corpus contains detailed information related to its unique identification, such as its names, description and identifiers, which are available in the element ms:identificationinfo. Moreover, provenance information related to its creation, such as who the creator is, whether it is part of a funded project and its creation date, is available in the element ms:resourcecreationinfo. Finally, the status of a corpus, i.e. whether it is raw or annotated, along with all the necessary information, such as the annotations, the related raw corpus, etc., is available in the element ms:corpussubtypespecificinfo.
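To make the grouping of the Corpus metadata more concrete, the three elements named above can be pictured as nested records. The sketch below is only an illustration under stated assumptions: all class and field names are hypothetical and do not reproduce the OMTD-Share schema, which defines many more elements.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Person:
    """Satellite entity: described only when linked to a resource."""
    name: str


@dataclass
class Project:
    """Satellite entity: a project that funded the resource."""
    name: str


@dataclass
class IdentificationInfo:
    """Loosely mirrors ms:identificationinfo (names, description, identifiers)."""
    names: list
    description: str = ""
    identifiers: list = field(default_factory=list)


@dataclass
class ResourceCreationInfo:
    """Loosely mirrors ms:resourcecreationinfo (provenance of the resource)."""
    creators: list = field(default_factory=list)
    funding_projects: list = field(default_factory=list)
    creation_date: Optional[str] = None


@dataclass
class Corpus:
    """Core entity with its identification, provenance and subtype details."""
    identification: IdentificationInfo
    creation: ResourceCreationInfo
    subtype: str                          # "raw" or "annotated"
    raw_corpus_id: Optional[str] = None   # set only for annotated corpora


def linked_satellites(corpus):
    """Return the satellite entities reachable from a corpus."""
    return list(corpus.creation.creators) + list(corpus.creation.funding_projects)
```

In this picture a satellite entity such as a `Person` only exists in the registry because some core entity refers to it, which reflects the remark that satellite entities are described only when they are linked to a specific resource.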

Figure 29 OpenMinTeD Corpus Schema

5. Actors

In the context of the OpenMinTeD project the following Actor roles have been identified in "D4.2 - Community Requirements Analysis Report" and "D4.3 - OpenMinTeD Functional Specifications":

END USER: Researchers and text mining experts interested in exploring the publication content offered via OMTD using services and tools. They can access the OMTD services through the OMTD web interface.

ADMINISTRATOR: A user who has full access to the OMTD system. She/he is responsible for monitoring and configuring the system, as well as managing the users, services and the registry.

5.1 End Users

End users can be categorized using two different criteria. The first categorization is based on whether or not users are registered in the OMTD platform. The second categorization concerns only the registered users and is based on their level of expertise.

Anonymous vs. Registered Users

Anonymous users are researchers and text-mining scientists interested in viewing and exploring the metadata content of the OMTD registry. For further access to OMTD services, e.g. the Workflow Execution Engine or the Annotation Editor, anonymous users must register with OMTD to get valid credentials.

Application domain vs. Text-Mining Users

Application domain users are researchers from different scientific communities who are generally familiar with using (installing, configuring, running) software applications, but whose technical skills in text and data mining processes are minimal. Moreover, they are interested in using end-to-end services and applications. Text-mining users, on the other hand, are developers of specific components. They are interested in adding their components to the OMTD registry, evaluating text-mining components/workflows or interacting with the Workflow Editor in order to design new text-mining applications.

5.2 Administrators and Moderators

For the time being, no specific functionalities for advanced administration have been defined.

6. Functionalities

In this section, we present the main functionalities that have been analyzed and will be implemented in the first OMTD platform release. This section will be updated throughout the project, as iterative cycles of design, implementation and evaluation take place, gradually incorporating the complexity of the analyzed use cases and the underlying data model.

6.1 Authenticating with OpenMinTeD AAI

While most of the OpenMinTeD platform's functionality is open to all users (e.g. searching for resources), there are many cases where users must be registered and authenticated. These cases fall into two categories: modifying the content of OpenMinTeD (e.g. by registering a new resource or editing an existing one) and using computational resources to execute a workflow or build a new corpus.

OpenMinTeD decided not to implement authentication mechanisms and services in the platform, but to rely instead on trusted third-party identity providers to authenticate the users. These identity providers fall into two broad categories: academic institutions, accessible through the eduGAIN platform, and (for users not belonging to academia) social networks, like ORCID, LinkedIn, Google, or Facebook. The identity of users coming from social networks is less trusted than that of users coming from academic institutions, so their initial privileges (e.g. the quota on hardware resources to execute workflows) are limited, but can be increased upon a successful request.

Relying only on third-party providers for authentication implies that users cannot register directly with the platform. This does not mean that OpenMinTeD keeps no database of users with their names and other information. Rather, users are indirectly registered with OpenMinTeD when they first authenticate themselves using an identity provider. At that point, with the user's consent, OpenMinTeD creates a new user account that is used throughout the platform. The user information originates from the identity provider, but the user has the right to edit it or to complete missing information. For a detailed description of the process, see Table 1 User authentication functionality.

Table 1 User authentication functionality

ID: User authentication
Description: End users sign in (authenticate) to the OpenMinTeD platform.
Required Databases: AAI user registry
Actors: End user
Preconditions: None
Main process:
1. The user presses the sign in menu.
2. The user selects the desired authentication method: eduGAIN or social networks.
3. The user is redirected to the identity provider (e.g. university login page or Facebook login page) and authenticates there using local credentials. Upon successful authentication, the user is redirected back to the OpenMinTeD platform.
4. (Optional) If the user is authenticated for the first time for OpenMinTeD, the account details are presented to the user for verification. The user is then asked to complete the missing parts of the account information and to consent to the creation of an OpenMinTeD account.
5. Finally, the user is signed in to the platform.
Exceptions: -
Related data model entities:
Processed Data: -
Generated Data: User account information

Figure 30 Prototype for Login Page
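The first-sign-in behaviour described in Table 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's implementation: the class names, the `sub`, `name` and `email` claim keys, and the quota values are all hypothetical. The specification only says that accounts are created on first authentication with the user's consent, that the user may edit or complete the details, and that social-network identities start with more limited privileges.

```python
from dataclasses import dataclass

# Hypothetical provider categories and quota values (not taken from the
# specification, which only says that social identities are more limited).
ACADEMIC = "academic"
SOCIAL = "social"
INITIAL_QUOTA = {ACADEMIC: 100, SOCIAL: 10}


@dataclass
class Account:
    user_id: str
    name: str
    email: str
    provider_type: str
    quota: int


class UserRegistry:
    """In-memory stand-in for the AAI user registry."""

    def __init__(self):
        self._accounts = {}

    def sign_in(self, claims, provider_type, consent=False, edits=None):
        """Return the account of an authenticated user.

        `claims` is the information released by the identity provider.
        On first sign-in an account is created, but only with consent;
        `edits` holds details the user corrected or completed.
        """
        user_id = claims["sub"]
        if user_id in self._accounts:
            return self._accounts[user_id]
        if not consent:
            raise PermissionError("account creation requires the user's consent")
        details = {**claims, **(edits or {})}
        account = Account(
            user_id=user_id,
            name=details.get("name", ""),
            email=details.get("email", ""),
            provider_type=provider_type,
            quota=INITIAL_QUOTA[provider_type],
        )
        self._accounts[user_id] = account
        return account
```

A returning user is recognised by the identifier released by the identity provider, so the consent step is only needed once.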

Figure 31 Prototype for selecting institute

Figure 32 Prototype for Attribute Release Consent Form

6.2 Browse and search resources in the OMTD registry

The OMTD portal allows end users to browse and explore the OMTD registry for resources of different types, from corpora and language descriptions to processing components and services.

6.2.1 Browse resources

The OMTD portal offers access to all available resources, accompanied by basic metadata. Entry points are either the Search menu, where the user begins browsing the resources based on their type (Applications, Corpora, or TDM Components), or an empty search from the home page. Afterwards, the user can further refine the list of displayed resources by using the filters on the left side of the page. Details about browsing are given in Table 2 Browse resources functionality.

Table 2 Browse resources functionality

ID: Browse resources
Description: End users view a list of components/applications/corpora along with a minimum set of metadata.
Required Databases: OMTD registry
Actors: End user
Preconditions: None
Main process:
1. End users press Enter in the search box with an empty keyword, or select one of the options Applications, Corpora, or Components from the Search entry in the top menu of the OMTD portal homepage.
2. End users view a list of components/applications/corpora along with basic information.
3. (Optional) End users click on a resource to see all its available metadata.
Exceptions: -
Related data model entities: Component, Corpus
Processed Data: Metadata information about the OMTD resources.
Generated Data: -

Figure 33 Prototype for browse resources

Figure 34 Prototype for browse resources - filtered by resource type = application

6.2.2 Search for resources

To further facilitate browsing and exploration, the OMTD platform offers a faceted search. It is based on simple keywords matched over a subset of the metadata elements used to describe the various resources. End users can select among the provided fields and filter the results accordingly. The faceted search can be combined with keyword search, allowing for an in-depth refinement. Details about the search functionality are presented in Table 3 Search for resources.

Table 3 Search for resources

ID: Search for resources
Description: The End user queries the OMTD portal for resources using keywords and any combination of the filter fields, such as resource type, distribution license, language and mime type.
Required Databases: OMTD registry
Actors: End user
Preconditions: None
Main process:
1. The End user enters a set of keywords in the search box and/or selects a combination of the filter fields.
2. The OMTD portal processes the query and returns a list of resources that fit the criteria.
3. The End user views the list of resources that fit the criteria.
4. (Optional) The End user clicks on a resource to see all its available metadata.
Exceptions: -
Related data model entities: Corpus, Component, Lexical/Conceptual Resource, Language Description
Processed Data: Metadata information about the OMTD resources.
Generated Data: -
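The combination of keyword matching and facet filters described in Table 3 can be illustrated with a small in-memory sketch. The field names (`name`, `description`, `resource_type`, `language`) and the matching rule (case-insensitive substring match, all keywords required) are assumptions made for this example; the specification does not state how the registry implements search.

```python
from collections import Counter

# Hypothetical subset of metadata elements that keywords are matched against.
SEARCHABLE_FIELDS = ("name", "description")


def search(resources, keywords=(), filters=None):
    """Return the resources matching every keyword and every selected facet.

    `filters` maps a facet name (e.g. "language") to the set of accepted
    values. An empty query returns all resources, which corresponds to
    browsing with an empty search box.
    """
    filters = filters or {}
    matches = []
    for resource in resources:
        text = " ".join(str(resource.get(f, "")) for f in SEARCHABLE_FIELDS).lower()
        if not all(keyword.lower() in text for keyword in keywords):
            continue
        if all(resource.get(facet) in allowed for facet, allowed in filters.items()):
            matches.append(resource)
    return matches


def facet_counts(resources, facet):
    """Count how many resources carry each value of a facet (for the filter panel)."""
    return Counter(r[facet] for r in resources if facet in r)
```

For example, `search(catalog, ["extractor"], {"resource_type": {"application"}})` narrows a keyword search to applications only, mirroring the query shown in Figure 35.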


Figure 35 Prototype for search - searching for an extractor in the OpenMinTeD applications

6.2.3 Resource landing page

Both the browsing and search functionalities offer a high-level overview of the available resources that match the selected criteria, along with basic information about each of them: the resource's name, its type, a brief description, and statistics such as the number of downloads and views per resource. The resource's landing page is identified by a unique URL based on the resource's id and presents more detailed metadata. Apart from the basic information about the resource, i.e., name and description, additional metadata are presented (e.g., creation, contact and version information, distribution info, documentation, evaluation). The landing page view depends on the type of resource: different information is presented for each type, based on the data model. Table 4 Resource's landing page functionality presents initial aspects of a generic resource's landing page.

The UI must take into consideration the following:
- Page real estate: how to divide the page and which visual effects to use to bring out the most important elements. Special attention must be given to mobile devices (responsive design).
- Semantic elements grouped together: what is the best way to achieve this, i.e., which types of UI components (e.g., tabs, sliders, popups) make the page easier to use.

Table 4 Resource's landing page functionality

ID: Resource landing page
Description: The End user selects a resource to view all its available metadata.
Required Databases: OMTD registry
Actors: End user
Preconditions: None
Main process:
1. Given a list of resources, the End user clicks on a resource's name to see all its available metadata.
2. The OMTD portal processes the request and returns the full metadata description for the selected resource.
Exceptions: -
Related data model entities: Corpus, Component, Lexical/Conceptual Resource, Language Description
Processed Data: Metadata information about the OMTD language resources.
Generated Data: -

Figure 36 Prototype for a component's landing page

Figure 37 Prototype for an application's landing page


Figure 38 Prototype for a corpus' landing page

6.3 Resource registration

The OMTD platform enriches its content by allowing the registration of new resources. Registration is available through the ADD menu, which is accessible only to registered users. Two registration methods are available for all types of resources: entering the metadata of the resource using a web registration form, or providing the metadata directly by uploading an XML file that follows the OMTD-SHARE schema. In both cases, the metadata are validated by the platform and the user is notified of any errors.

6.3.1 Resource registration using a form

The OMTD platform offers a registration form so that providers can easily share their resources with the OMTD platform. The form guides the provider through the required metadata description and validates it in real time. Details about registration via the registration form are presented in Table 5 Resource registration functionality via registration form.

Table 5 Resource registration functionality via registration form

ID: Resource registration via registration form
Description: Registered users register a resource via a form.
Required Databases: OMTD registry
Actors: Registered user
Preconditions: Registered user accesses the OMTD web portal using a common web browser.
Main process:
1. In the OMTD portal homepage, registered users select the ADD menu and the submenu of the resource they want to register.
2. Registered users select the option I want to register using the form!.
3. Registered users complete the registration form with the requested information.
4. Registered users submit the registration by clicking on Register.
Exceptions: An error message is shown if any required metadata are missing.
Related data model entities: Corpus, Component, Lexical/Conceptual Resource, Language Description
Processed Data: -
Generated Data: Metadata information about the newly registered resource.

Figure 39 Prototype for resource registration using a form - component example

6.3.2 Resource registration via XML description

An alternative way to register a resource in the OMTD platform is to upload an XML description of the resource that follows the OMTD-SHARE schema. Details about the resource registration functionality via XML description are given in Table 6 Resource registration functionality via XML description.

Table 6 Resource registration functionality via XML description

ID: Resource registration via XML description
Description: Registered users register a resource into the OMTD registry using an XML description following the OMTD-SHARE schema.
Required Databases: OMTD registry
Actors: Registered user
Preconditions: Registered users access the OMTD web portal using a common web browser. The XML description follows the OMTD-SHARE schema.
Main process:
1. In the OMTD portal homepage, Registered users select the ADD menu and the submenu of the resource they want to register.
2. Registered users click on the I want to register using an XML option.
3. Registered users upload an XML description:
   - by copying and pasting it into the text area,
   - by clicking on Browse, selecting the appropriate local XML file and clicking on Preview, or
   - by pasting a URL where the XML description lives and clicking on Preview.
4. Registered users can preview the XML description in the text area.
5. Registered users complete the registration by clicking on Register.
Exceptions: An error message is shown if the XML description does not follow the OMTD-SHARE schema.
Related data model entities: Corpus, Component, Lexical/Conceptual Resource, Language Description
Processed Data:
Generated Data: Metadata information about the newly registered resource.
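The kind of check behind the error message in Table 6 can be sketched with the Python standard library. This is only an approximation: it tests well-formedness and the presence of a few required elements, whereas the platform validates against the full OMTD-SHARE schema (an XSD validator would be needed for that, which the standard library does not provide). The element names in `REQUIRED_ELEMENTS` are placeholders invented for this example, not actual OMTD-SHARE element names, and XML namespaces are ignored.

```python
import xml.etree.ElementTree as ET

# Placeholder element names for illustration only; the real required
# elements are defined by the OMTD-SHARE schema.
REQUIRED_ELEMENTS = ("resourceName", "description", "version")


def check_description(xml_text):
    """Return a list of error messages; an empty list means the checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    errors = []
    for name in REQUIRED_ELEMENTS:
        # Look for the element anywhere below the root and require some text.
        element = root.find(f".//{name}")
        if element is None or not (element.text or "").strip():
            errors.append(f"missing required metadata: {name}")
    return errors
```

Returning all problems at once, rather than stopping at the first, lets the portal report every missing field in a single round trip.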


Figure 40 Prototype for resource registration using an XML - component example

6.3.3 Component registration

The Components in the OpenMinTeD platform are the building blocks of the user-generated workflows. OpenMinTeD supports three distribution methods for TDM components:
- Executable code, which may be written in any programming language and framework but must be wrapped in a Docker container following the OpenMinTeD guidelines,
- Web Services, which are executed outside the OpenMinTeD platform and must offer an API specified in the OpenMinTeD guidelines, and
- Components that belong to three of the main TDM frameworks: UIMA, GATE and ALVIS. OpenMinTeD can natively execute these components and expects their code to be published in a Maven repository, the most popular distribution medium for Java code.

As with all types of resources, users can register their components either using a web registration form or by directly providing their metadata (see previous sections for details). For the third type of components, since OpenMinTeD natively supports these Java-based components, a third option is available: instead of manually entering the metadata of the components, users can provide the Maven coordinates of the components.

UIMA/GATE components

The third way to register a component in the OMTD platform is to extract the metadata available in a published Maven artifact. The OMTD platform requires the artifactId, groupId and version of the Maven artifact and identifies the components that exist in the artifact. After the user selects the component to be registered, the platform extracts as much metadata as possible from the artifact (either from the pom.xml descriptor of the Maven artifact or from an already existing descriptor in the OMTD-SHARE schema) and pre-fills as much of the registration form as it can. The component provider only needs to fill in the information that was not extracted automatically. Details about the component registration functionality via Maven coordinates are presented in Table 7 Component registration functionality via maven coordinates.

Table 7 Component registration functionality via maven coordinates

ID: Component registration via maven coordinates
Description: The Registered user registers a component into the OMTD registry by resolving Maven coordinates.
Required Databases: OMTD registry, Maven repository
Actors: Registered user
Preconditions: Registered users access the OMTD web portal using a common web browser.
Main process:
1. In the OMTD portal homepage, Registered users select the menu Add -> Component.
2. Registered users fill in the Maven artifact id, group id and version.
3. Registered users click on Resolve maven coordinates.
4. The OMTD platform extracts metadata information from the Maven repository and pre-fills the registration form accordingly.
5. Registered users complete the rest of the form with the requested information and/or correct the information pre-filled from the Maven repository.
6. Registered users submit the registration by selecting the Register option.
Exceptions: An error message is shown if any required metadata are missing.
Related data model entities: Component
Processed Data: -
Generated Data: Metadata information about the newly registered component.

Figure 41 Prototype for registering UIMA / GATE components
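Resolving Maven coordinates, as in steps 2 to 5 of Table 7, relies on the standard Maven repository layout, in which the group id is turned into a directory path and the descriptor is stored as `artifactId-version.pom`. The sketch below shows that mapping and a simple pre-fill step. The `prefill_form` helper and its field names are hypothetical; how the platform actually maps extracted metadata onto its form is not described in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MavenCoordinates:
    group_id: str
    artifact_id: str
    version: str

    def pom_path(self):
        """Path of the POM descriptor relative to the repository root
        (standard Maven repository layout)."""
        group_path = self.group_id.replace(".", "/")
        return (
            f"{group_path}/{self.artifact_id}/{self.version}/"
            f"{self.artifact_id}-{self.version}.pom"
        )


def prefill_form(extracted, required_fields):
    """Pre-fill a registration form from extracted metadata.

    Returns the form and the list of fields the provider still has to
    complete by hand. Field names are illustrative assumptions.
    """
    form = {name: extracted.get(name, "") for name in required_fields}
    missing = [name for name, value in form.items() if not value]
    return form, missing
```

For instance, `MavenCoordinates("org.example", "demo", "1.2.0").pom_path()` yields `org/example/demo/1.2.0/demo-1.2.0.pom`, which can be appended to the base URL of whichever repository hosts the artifact.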

Dockerized components

Unlike the case of UIMA or GATE components, OpenMinTeD does not offer an automatic way to pre-fill the metadata for components wrapped in Docker images. For this reason, a dockerized component can only be registered either by filling in the metadata in a registration form or by providing an XML file in the OMTD-SHARE schema. For more details, see Table 5 Resource registration functionality via registration form and Table 6 Resource registration functionality via XML description.

Figure 42 Prototype for registering dockerized components

Web Services

For the same reasons as with the dockerized components, a web service can only be registered either by filling in the metadata in a registration form or by providing an XML file in the OMTD-SHARE schema. For more details, see Table 5 Resource registration functionality via registration form and Table 6 Resource registration functionality via XML description.

Figure 43 Prototype for registering web services

6.3.4 Corpus registration

The OpenMinTeD platform offers two ways to register a corpus: users can either upload an already existing corpus and provide its metadata, or they can instruct the platform to build a corpus with publications coming from OpenAIRE and/or CORE.

Uploading a Corpus

Registered users that already have a collection of publications can choose to upload it to the OpenMinTeD platform. To do so, they have to create a zip archive with the publications of the corpus and (optionally) their metadata, and then upload it to the platform. Along with the zip file, they must also provide the metadata of the corpus by filling in a web registration form. More details are presented in Table 8 Corpus upload functionality.

Table 8 Corpus upload functionality

ID: Corpus upload
Description: Registered users upload and register a corpus into the OMTD registry using the registration form.
Required Databases: OMTD registry
Actors: Registered user
Preconditions: Registered users access the OMTD web portal using a common web browser.
Main process:
1. In the OMTD portal homepage, registered users select the menu Share -> Corpus.
2. Registered users click on the I want to upload my corpus option.
3. Registered users upload a zipped file containing the corpus data.
4. Registered users complete the registration form with the requested information.
5. Registered users submit the registration by clicking on Register.
Exceptions: An error message is shown if any required metadata are missing.
Related data model entities: Corpus
Processed Data: -
Generated Data: Metadata information about the newly registered corpus.
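Preparing the zip archive required in step 3 of Table 8 could look like the sketch below. The internal layout of the archive (publications at the top level, optional metadata files in a subfolder) is an assumption made for illustration; this document does not specify the folder structure the platform expects, so the OpenMinTeD guidelines should be consulted for the actual layout.

```python
import zipfile
from pathlib import Path


def build_corpus_archive(source_dir, archive_path):
    """Zip every file under `source_dir`, keeping relative paths.

    Returns the list of archive member names. The archive should be
    written outside `source_dir` so that it does not try to include itself.
    """
    source = Path(source_dir)
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                name = path.relative_to(source).as_posix()
                archive.write(path, arcname=name)
                names.append(name)
    return names
```

Storing paths relative to the corpus folder keeps the archive independent of where the files lived on the user's machine.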

Figure 44 Prototype for uploading a corpus

Figure 45 Prototype for uploading a corpus - upload zip and fill metadata

Building a Corpus

The Corpus Builder allows registered users to create a new corpus by using keyword search to retrieve publications from the OMTD-compatible publication providers. The retrieved results can subsequently be refined via filters, e.g., license, publication year, subject, language and content source. Details about the corpus builder functionality are given in Table 9 Corpus builder functionality.

Table 9 Corpus builder functionality

ID: Corpus builder
Description: Registered users build a corpus of publications and register it into the OMTD registry using the corpus builder.
Required Databases: OMTD registry
Actors: Registered user
Preconditions: Registered users access the OMTD web portal using a common web browser.
Main process:
1. In the OMTD portal homepage, registered users select the menu ADD -> Corpora.
2. Registered users click on the I want to build a new corpus option.
3. (Optional) Registered users enter a set of keywords into the text search box at the top of the page as criteria for retrieving a set of publications.
4. The OMTD platform processes the query, communicates with its linked content providers to retrieve the matching publications and presents a high-level overview of the returned results per provider.
5. (Optional) Registered users can further refine the search by using a combination of keywords and filter fields, such as license, publication year, subject, language and content source.
6. When registered users are satisfied with the corpus, they confirm it by clicking on Create corpus.
7. The OMTD platform extracts metadata information from the content providers and pre-fills the registration form.
8. Registered users complete the registration form with the requested information.
9. Registered users submit the registration by clicking on Register.
Exceptions: An error message is shown if any required metadata are missing.
Related data model entities: Corpus
Processed Data: -
Generated Data: Metadata information about the newly registered corpus.

Figure 46 Prototype for building a corpus

Figure 47 Prototype for building a corpus - search for publications

Figure 48 Prototype for building a corpus - filter publications' selection

Figure 49 Prototype for building a corpus - edit auto-generated metadata for the corpus

Figure 50 Prototype for building a corpus - monitor building process

Figure 51 Prototype for building a corpus - process ended successfully

6.3.5 Application registration

The OpenMinTeD platform offers two ways to register an application. The first is mostly intended for users who already have an application and want to make it available through the platform. The second is intended for TDM experts who want to build a workflow using components that have already been registered in the platform. In both cases, users must supply the metadata that describe the application. To register an application, users must select the Add -> Application submenu from the main page of the portal.

Uploading an end-to-end application

An end-to-end application can be either a Web Service, which is managed and executed by the users themselves, or an executable wrapped in a Docker container. End-to-end applications do not differ much from components, in the sense that they must follow the same guidelines. For this reason, their registration process and options are the same as those of the components; for more details, see Table 5 Resource registration functionality via registration form and Table 6 Resource registration functionality via XML description.

Figure 52 Prototype for registering an existing application

Building a workflow

An alternative way to register an application in the OpenMinTeD platform is to build it in the integrated workflow editor using the already registered TDM components. After designing the workflow, the user must provide the metadata for the application using a registration form. The workflow editor used in the platform is Galaxy. Details about the workflow building process are provided in Table 10 Build workflow functionality. For more information on how to use the workflow editor, see section 6.6.

Table 10 Build workflow functionality

ID: Workflow building
Description: Registered users build a workflow and register it into the OMTD registry using the workflow editor.
Required Databases: OMTD registry
Actors: Registered user
Preconditions: Registered users access the OMTD web portal using a common web browser.
Main process:
1. In the OMTD portal homepage, registered users select the menu ADD -> Application.
2. Registered users click on the Build a workflow option.
3. Registered users are taken to the workflow editor, where they build their workflow using existing components.
4. After saving the workflow, users are redirected to a registration form where they enter the metadata describing the workflow.
5. Registered users submit the registration by clicking on Register.
Exceptions: An error message is shown if any required metadata are missing.
Related data model entities: Component
Processed Data: -
Generated Data: Metadata information about the newly registered workflow, the workflow definition in the Galaxy format.

Figure 53 Prototype for building a new workflow (application)

6.3.6 Lexical/Conceptual (Annotation) Resource registration

Lexical and conceptual resources are registered in the platform using one of the two ways common to all resources: users can register them either by filling out a registration form or by providing their metadata in XML format, in the OMTD-SHARE schema. For more details, see Table 5 Resource registration functionality via registration form and Table 6 Resource registration functionality via XML description.

6.3.7 Language Description (Models & Grammars) registration

Language description resources (mainly machine learning models and grammars) are registered in the platform using one of the two ways common to all resources: users can register them either by filling out a registration form or by providing their metadata in XML format, in the OMTD-SHARE schema. For more details, see Table 5 Resource registration functionality via registration form and Table 6 Resource registration functionality via XML description.

6.4 Application execution

One of the most important functionalities of the OpenMinTeD platform is the ability to execute TDM applications with a given Corpus as input and an Annotated Corpus as output. Since this is a time-consuming process and the computational resources of the platform are limited, users submit a request for an execution (consisting of an application and an input corpus) and the platform schedules it, either starting it immediately or waiting until resources become available. The user can monitor the progress of the execution either on the page where the request was submitted or in the user space, under the My operations submenu (for more information, see section 6.5 User Space).

Table 11 Executing an application functionality

ID: Application execution
Description: Registered users execute an application with a registered Corpus as input.
Required Databases: OMTD registry
Actors: Registered user
Preconditions: Registered users access the OMTD web portal using a common web browser.
Main process:
1. In the OMTD portal homepage, registered users select the menu Process.
2. Registered users click on the Select an input option and are redirected to the list of available corpora, where they select a corpus.
3. Registered users click on the Select an application option and are redirected to the list of available applications, where they select an application.
4. Registered users click on the Click to run the application! option and the execution is scheduled.
5. The status of the execution changes to Running and remains so until the execution completes or fails.
6. When the execution is complete, the message Application run finished successfully appears, and the user is given the option to download the annotated corpus.
Exceptions: An error message is shown if the execution fails for some reason.
Related data model entities: Component, Corpus
Processed Data: -
Generated Data: Metadata of the annotated corpus, the resulting annotated corpus, metadata about the execution of the workflow.
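The scheduling behaviour described in this section (start immediately if resources are available, otherwise wait) can be modelled as a first-come, first-served queue with a fixed number of execution slots. This is a simplified sketch under assumptions: the class and method names are hypothetical, the status labels follow those listed for My Operations (pending, running, finished) plus a failed state, and the actual platform scheduler and its policy are not described in this document.

```python
from collections import deque


class ExecutionScheduler:
    """Toy first-come, first-served scheduler with a fixed number of slots."""

    def __init__(self, capacity):
        self.capacity = capacity   # how many executions may run at once
        self.status = {}           # operation id -> status label
        self.requests = {}         # operation id -> (application, corpus)
        self._pending = deque()
        self._running = set()
        self._next_id = 1

    def submit(self, application, corpus):
        """Register an execution request and start it if a slot is free."""
        op_id = self._next_id
        self._next_id += 1
        self.requests[op_id] = (application, corpus)
        self.status[op_id] = "pending"
        self._pending.append(op_id)
        self._start_waiting()
        return op_id

    def complete(self, op_id, success=True):
        """Mark a running execution as finished or failed and free its slot."""
        if op_id not in self._running:
            raise ValueError(f"operation {op_id} is not running")
        self._running.remove(op_id)
        self.status[op_id] = "finished" if success else "failed"
        self._start_waiting()

    def _start_waiting(self):
        # Move pending requests to running while slots remain.
        while self._pending and len(self._running) < self.capacity:
            op_id = self._pending.popleft()
            self._running.add(op_id)
            self.status[op_id] = "running"
```

The `status` dictionary is what a page such as My Operations would read to show the user the current state of each request.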

Note that the Process menu is not the only way to start executing a workflow. Users arrive at the same page if they click the process option next to a corpus (or application) when browsing or searching for corpora (or applications). In this case, when they arrive at the Process page, the corpus (or application) is already preselected and they only need to select an application (or corpus).

Figure 54 Prototype for executing an application

Figure 55 Prototype for executing an application - browse for input corpus

Figure 56 Prototype for executing an application - input corpus selected

Figure 57 Prototype for executing an application - browse for application

Figure 58 Prototype for executing an application - application selected

Figure 59 Prototype for executing an application - process running

Figure 60 Prototype for executing an application - process completed

6.5 User Space

The User Space provides access to the resources that have been registered by the user or created for the user by the platform. Through this set of pages, users are able to inspect, modify and delete their resources. By hovering over their username after they have successfully logged in to the platform, users are presented with the following options:
- My Corpora: displays the list of corpora that have either been registered by the user or been created by the execution of a workflow.
- My Applications: displays the list of applications that have been registered by the user.
- My Components: displays the list of components that have been registered by the user.
- My Operations: displays the list of operations of the user, i.e. information about every execution of a workflow that has been requested by the user. The user can see the status of the execution (e.g. pending, running, finished etc.), the application that is being executed and the input and output corpora.

For all four of these resource types, users can modify the metadata or delete the resource, but only while the resource is still private. From the moment a resource becomes public (also an option in this section of the portal), users lose the ability to modify or delete it. This restriction supports reproducibility and persistence of results, which would not be possible if users could modify their published applications or corpora. Details about the user space functionalities can be found in Table 12 Manage user applications functionality, Table 13 Manage user corpora functionality, Table 14 Manage user components functionality and Table 15 Manage user operations functionality.

Table 12 Manage user applications functionality

ID: Manage user applications
Description: The registered user manages the applications that she/he has registered.
Required Databases: OMTD Registry
Actors: Registered User
Preconditions: Registered users access the OMTD web portal using a common web browser.
Main process:
1. In the user space menu (hover over the username), the user clicks on the My Applications submenu.
2. The user is presented with the list of applications and has the following options:
   1. Navigate to the landing page of the application
   2. Edit the application's metadata
   3. Edit the application workflow
   4. Share the application and make it public
   5. Delete the application
Exceptions:
Related data model entities: Component
Processed Data:
Generated Data:
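The rule that a resource can be modified or deleted only while it is private can be captured in a few lines. The sketch below is illustrative only: the `Resource` class and its method names are hypothetical and do not come from the OpenMinTeD code base.

```python
class Resource:
    """A user-owned resource that is frozen once it has been made public."""

    def __init__(self, name, metadata=None):
        self.name = name
        self.metadata = dict(metadata or {})
        self.public = False
        self.deleted = False

    def make_public(self):
        # One-way transition: there is deliberately no way back to private.
        self.public = True

    def update_metadata(self, **changes):
        self._require_private("modified")
        self.metadata.update(changes)

    def delete(self):
        self._require_private("deleted")
        self.deleted = True

    def _require_private(self, action):
        if self.public:
            raise PermissionError(
                f"'{self.name}' is public and can no longer be {action}"
            )
```

Making the transition to public one-way is what guarantees that results obtained with a published application or corpus remain reproducible.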


Figure 61 Prototype for user's applications page

Table 13 Manage user corpora functionality

ID: Manage user corpora
Description: The registered user manages the corpora that she/he has registered.
Required Databases: OMTD Registry
Actors: Registered User
Preconditions: Registered users access the OMTD web portal using a common web browser.
Main process:
1. In the user space menu (hover over the username), the user clicks on the My Corpora submenu.
2. The user is presented with the list of corpora and has the following options:
   1. Navigate to the landing page of the corpus
   2. Edit the corpus metadata
   3. Share the corpus and make it public
   4. Delete the corpus
Exceptions:
Related data model entities: Corpus
Processed Data:
Generated Data:


Figure 62 Prototype for user's corpora page
Figure 63 Prototype for user's corpora page - make corpus public functionality
Figure 64 Prototype for user's corpora page - corpus made public successfully


Figure 65 Prototype for user's corpora - deleting private corpus functionality

Table 14 Manage user components functionality
ID: Manage user components
Description: The registered user manages the components that he has registered.
Required Databases: OMTD Registry
Actors: Registered User
Preconditions: Registered users access the OMTD web portal using a common web browser.
Main process:
1. In the user space menu (hover over the username), the user clicks on the My Components submenu.
2. The user is presented with the list of components and has the following options:
   1. Navigate to the landing page of the component
   2. Edit the component's metadata
   3. Share the component and make it public
   4. Delete the component
Exceptions: -
Related data model entities: Component
Processed Data: -
Generated Data: -

Figure 66 Prototype for user's components page

Table 15 Manage user operations functionality
ID: Manage user operations
Description: The registered user manages the history of applications that he has executed.
Required Databases: OMTD Registry
Actors: Registered User
Preconditions: Registered users access the OMTD web portal using a common web browser.
Main process:
1. In the user space menu (hover over the username), the user clicks on the My Operations submenu.
2. The user is presented with the list of operations and has the following options:
   1. Inspect the details of the operation: application used, input corpus, output annotated corpus, date of execution, potential error messages
   2. Delete the operation
Exceptions: -
Related data model entities: -
Processed Data: -
Generated Data: -

6.6 Workflow editor
The OMTD platform offers text-mining experts the possibility to create new workflows by combining components from a variety of platforms (e.g. DKPro, UIMA, GATE), which the OMTD platform provides in a seamless way thanks to the interoperability specifications designed by the OMTD consortium. The following functionalities are related to the OMTD Workflow Editor: creating a new workflow, browsing or searching for components, editing a workflow by adding or deleting components, setting the required parameters of a component within a workflow, and saving the workflow. Details for each functionality are given in Table 16 Create Workflow functionality to Table 21 Save workflow functionality. High-fidelity prototype overviews of these functionalities are shown in Figure 67 High-fidelity prototype of the Workflow Editor when creating a new workflow to Figure 69 High-fidelity prototype of the Workflow Editor when saving a workflow. More specifically, in Figure 67 High-fidelity prototype of the Workflow Editor when creating a new workflow, the left panel Tools lets the user browse or search for components to insert into the workflow, which is formed in the center panel Workflow Canvas. The details of the workflow, such as its name, are shown in the right panel, named Details.
In Figure 68 High-fidelity prototype of the Workflow Editor when editing a workflow, the user edits a workflow in the central panel Workflow Canvas by adding or removing components, while in the right panel Details she can set the details of a selected component. Finally, in Figure 69 High-fidelity prototype of the Workflow Editor when saving a workflow, the user saves the workflow by selecting Save from the drop-down menu of the gear icon in the central panel Workflow Canvas.

Table 16 Create Workflow functionality
ID: Create Workflow
Description: The text-mining user creates a workflow.
Required Databases: OMTD Registry
Actors: Registered Text-Mining User
Preconditions: Registered users access the OMTD web portal using a common web browser.
Main process:
1. From the OMTD platform, the Text-Mining User selects Actions -> Workflow Editor to access the Workflow Editor.
2. The OMTD Platform processes the request and asks the user to provide a human-readable workflow name.
3. The OMTD Platform creates a new workflow resource with the provided workflow name in the OMTD Registry and redirects the user to the Workflow Editor.
Exceptions: -
Related data model entities: Component
Processed Data: -
Generated Data: Metadata of the newly created workflow.

Table 17 Browse for components in workflow editor functionality
ID: Browse for components in workflow editor
Description: The text-mining user browses for components within the workflow editor.
Required Databases: OMTD Registry
Actors: Registered Text-Mining User
Preconditions: Registered users access the OMTD web portal using a common web browser.
Main process: In the Workflow Editor environment, in the left panel Tools, the text-mining user views a list of components, each with a short description.
Exceptions: -
Related data model entities: Component
Processed Data: -
Generated Data: -

Table 18 Search for components in workflow editor functionality
ID: Search for components in workflow editor
Description: The text-mining user searches for components within the workflow editor.
Required Databases: OMTD Registry
Actors: Registered Text-Mining User
Preconditions: Registered users access the OMTD web portal using a common web browser.
Main process:
1. In the Workflow Editor environment, the text-mining user enters a set of keywords in the search box at the top of the left panel, named Tools.
2. The OMTD Workflow Editor processes the query and returns a list of resources that fit the criteria.
3. The text-mining user views the resulting list of resources.
Exceptions: -
Related data model entities: Component
Processed Data: -
Generated Data: -
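The browse and search steps of Table 17 and Table 18 can be sketched as a simple keyword filter over the component list. This is only an illustration: the specification does not say how keywords are matched, so the case-insensitive, all-keywords-must-match behaviour below is an assumption, and ComponentEntry and search_components are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentEntry:
    # What the Tools panel lists: a component name and a short description.
    name: str
    description: str


def search_components(components, query):
    """Return the components whose name or description contains every keyword.

    Assumption: matching is a case-insensitive substring test and all
    keywords must match. An empty query returns the full list, which
    corresponds to plain browsing.
    """
    keywords = query.lower().split()
    results = []
    for component in components:
        haystack = f"{component.name} {component.description}".lower()
        if all(keyword in haystack for keyword in keywords):
            results.append(component)
    return results
```

For example, with a list containing a tokenizer and a sentence splitter, the query "token" would return only the tokenizer, while an empty query would return both.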


Figure 67 High-fidelity prototype of the Workflow Editor when creating a new workflow

Table 19 Edit workflow functionality
ID: Edit a workflow
Description: The text-mining user edits a workflow by adding, deleting and rearranging components within the workflow.
Required Databases: -
Actors: Registered Text-Mining User
Preconditions: Registered users access the OMTD web portal using a common web browser.
Main process:
1. In the Workflow Editor environment, the central panel Workflow Canvas shows the composition of the workflow.
2. For adding a new component to the workflow:
   a. In the left panel, the text-mining user views a list of components.
   b. She selects the appropriate one and drags it into the central panel Workflow Canvas.
   c. (Optional) She connects the newly inserted component to other components in the workflow canvas via the noodle functionality, which represents the flow of data from one step in a workflow analysis to the next, by dragging a noodle from the component's output connection to the input connection of another component.
3. For deleting a component from the workflow: in the central panel Workflow Canvas, the text-mining user deletes a component by clicking on the x button in the top right corner of the component.
4. For rearranging the component order: in the central panel Workflow Canvas, the text-mining user rearranges components by adding new connections between them as described in Step 2c, or by deleting an existing connection. To delete a connection, she hovers the pointer over the noodle of interest until an X appears and clicks on it.
Exceptions: -
Related data model entities: Component
Processed Data: -
Generated Data: -

Table 20 Set parameters of a component in workflow functionality
ID: Set parameters of a component in a workflow
Description: The text-mining user sets the parameters of a component in a workflow.
Required Databases: -
Actors: Registered Text-Mining User
Preconditions: Registered users access the OMTD web portal using a common web browser.
Main process:
1. In the Workflow Editor environment, the central panel Workflow Canvas shows the composition of the workflow.
2. The text-mining user selects the component that she wants to configure by clicking on it in the panel Workflow Canvas.
3. In the right panel Details, the detailed description of the component appears.
4. The text-mining user configures the parameters of the component by setting the appropriate value for each of them.
Exceptions: -
Related data model entities: Component
Processed Data: -
Generated Data: -

Figure 68 High-fidelity prototype of the Workflow Editor when editing a workflow

Table 21 Save workflow functionality
ID: Save a workflow
Description: The text-mining user saves a workflow.
Required Databases: OMTD Registry
Actors: Registered Text-Mining User
Preconditions: Registered users access the OMTD web portal using a common web browser.
Main process:
1. In the Workflow Editor environment, the central panel Workflow Canvas shows the composition of the workflow.
2. The text-mining user selects the gear icon on the right side of the central Workflow Canvas panel and then selects Save from the pull-down menu.
3. The OMTD Workflow Editor processes the request and saves the workflow back to the registry.
Exceptions: -
Related data model entities: Component
Processed Data: Metadata of the workflow
Generated Data: -
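The editing operations of Table 19 to Table 21 amount to maintaining a directed graph: components are nodes and noodles are edges from an output to an input. The sketch below illustrates that reading. The Workflow class, its method names and the record layout returned by to_record are hypothetical, and the use of a topological sort to derive a processing order assumes that workflows contain no cycles, which the specification does not state.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Workflow:
    name: str
    components: dict = field(default_factory=dict)  # component id -> parameters
    noodles: set = field(default_factory=set)       # (source id, target id)

    def add_component(self, cid, **params):
        # Table 19, step 2: drop a component onto the canvas.
        self.components[cid] = dict(params)

    def remove_component(self, cid):
        # Table 19, step 3: removing a component also removes its noodles.
        del self.components[cid]
        self.noodles = {(s, t) for s, t in self.noodles if cid not in (s, t)}

    def connect(self, source, target):
        # Table 19, step 2c: drag a noodle from an output to an input.
        if source not in self.components or target not in self.components:
            raise KeyError("both ends of a noodle must be on the canvas")
        self.noodles.add((source, target))

    def disconnect(self, source, target):
        # Table 19, step 4: delete an existing connection.
        self.noodles.discard((source, target))

    def set_parameter(self, cid, key, value):
        # Table 20: configure a selected component.
        self.components[cid][key] = value

    def execution_order(self):
        # Map each component to the set of components feeding into it.
        predecessors = {cid: set() for cid in self.components}
        for source, target in self.noodles:
            predecessors[target].add(source)
        return list(TopologicalSorter(predecessors).static_order())

    def to_record(self):
        # Table 21: a plain record that could be saved back to a registry.
        return {
            "name": self.name,
            "components": {c: dict(p) for c, p in self.components.items()},
            "noodles": sorted(self.noodles),
        }
```

Keeping the noodles as a set of pairs makes "rearranging" a matter of adding and discarding edges, which mirrors how step 4 of Table 19 describes it.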

Figure 69 High-fidelity prototype of the Workflow Editor when saving a workflow

6.7 Annotation editor
The OMTD Annotation Editor is one of the added-value services offered within the OMTD infrastructure to further assist researchers in the text-mining process. The OMTD Annotation Editor offers a wide range of functionalities that support the full cycle of the annotation process of a corpus, from annotation to correction and curation, using a crowd-sourcing approach.

6.7.1 Annotation Editor's Projects
The communication between the OMTD platform and the OMTD Annotation Editor is achieved through the annotation editor's projects. The following functionalities are related to the projects of the OMTD Annotation Editor: creating an annotation project from the OMTD platform, editing an annotation project's parameters, deleting an annotation project, and saving the resulting annotated corpus back to the registry. Details for each functionality are given in Table 22 Create Annotation Project functionality to Table 25 Save a curated corpus to OMTD Registry functionality. High-fidelity prototype overviews of the Edit Annotation Project's Details and Delete Annotation Project functionalities are shown in Figure 70 High-fidelity prototype of Editing Annotation Project's details and Deleting functionalities. More specifically, the right tabbed panels show the details of an annotation project, such as the annotation levels or the users who will work on the annotation project. These details can be edited and set appropriately by the annotation project owner. At the bottom right corner in Figure 70, the delete button deletes the annotation project from the Annotation Editor's database.

Table 22 Create Annotation Project functionality
ID: Create Annotation Project
Description: The end user creates an annotation project from a corpus.
Required Databases: OMTD Registry and Annotation Editor's database
Actors: Registered User
Preconditions: Registered users access the OMTD web portal using a common web browser.
Main process:
1. On the landing page of a corpus, the Registered User clicks on the Process with Annotation Editor button.
2. The OMTD Platform processes the request, creates a new annotation project in the OMTD Annotation Editor and redirects the user to the Annotation Editor's environment, in the Projects menu.
3. The Registered User edits the project's details, such as the project type (i.e. annotation or correction), the annotation layers, etc., and saves the project in the Annotation Editor's database.
Exceptions: -
Related data model entities: Corpus
Processed Data: Metadata and data information of the selected corpus
Generated Data: An Annotation Editor's project, along with the raw documents and their annotations (if any exist), is generated in the Annotation Editor's database.

Table 23 Edit Annotation Editor's Project details functionality
ID: Edit Annotation Project's details
Description: The Registered User edits the annotation editor's project details.
Required Databases: Annotation Editor's database
Actors: Registered User
Preconditions: Registered users access the OMTD web portal using a common web browser.
Main process:
1. In the Annotation Editor environment, the user selects Projects from the menu.
2. The Annotation Editor processes the request and shows a list of the user's available projects.
3. The Registered User selects a project by clicking its name, and all the project's details are shown in a tabbed format.
4. The Registered User edits the project's details, such as the annotation layers, and saves the project to the Annotation Editor's database.
Exceptions: -
Related data model entities: -
Processed Data: Metadata information of the selected annotation project
Generated Data: -

Table 24 Delete Annotation Project functionality
ID: Delete Annotation Project
Description: The registered user deletes an annotation project.
Required Databases: Annotation Editor's database
Actors: Registered User
Preconditions: Registered users access the OMTD web portal using a common web browser.
Main process:
1. In the Annotation Editor environment, the user selects Projects from the menu.
2. The Annotation Editor processes the request and shows a list of the user's available projects.
3. The user selects a project by clicking its name, and all the project's details are shown in a tabbed format.
4. When the user clicks Delete, the annotation project is removed from the Annotation Editor's database.
Exceptions: -
Related data model entities: -
Processed Data: Metadata and data information of the selected annotation project
Generated Data: -

Table 25 Save a curated corpus to OMTD Registry functionality
ID: Save the Annotated Corpus back to OMTD Registry
Description: The registered user saves a curated corpus back to the OMTD Registry.
Required Databases: OMTD Registry and Annotation Editor's database
Actors: Registered User
Preconditions: Registered users access the OMTD web portal using a common web browser.
Main process:
1. In the Annotation Editor environment, the user selects Monitoring from the menu.
2. The Annotation Editor processes the request and shows a list of the user's available projects, along with an overview of their progress.
3. The user selects the project and inspects its progress per document in detail.
4. If all documents have been annotated by all the assigned users and curated by the project owner, she can save the corpus back to the OMTD Registry by selecting Save.
Exceptions: -
Related data model entities: Corpus
Processed Data: Metadata of an annotation editor's project
Generated Data: Data and Metadata of an annotated corpus
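Step 4 of Table 25 states a precondition for saving: every document must have been annotated by all assigned users and curated by the project owner. A minimal sketch of that check, together with a simple progress figure of the kind the Monitoring view might show, is given below. DocumentStatus, can_save_to_registry and progress are hypothetical names, and defining progress as the fraction of finished (document, annotator) pairs is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentStatus:
    name: str
    finished_by: set = field(default_factory=set)  # users who selected Finish
    curated: bool = False                          # set by the project owner


def can_save_to_registry(documents, assigned_users):
    """True only if every document is finished by all assigned users and curated.

    Assumption: a project without documents cannot be saved.
    """
    assigned = set(assigned_users)
    if not documents:
        return False
    return all(assigned <= doc.finished_by and doc.curated for doc in documents)


def progress(documents, assigned_users):
    """Fraction of (document, annotator) pairs that are finished."""
    assigned = set(assigned_users)
    total = len(documents) * len(assigned)
    if total == 0:
        return 0.0
    done = sum(len(doc.finished_by & assigned) for doc in documents)
    return done / total
```

With two documents and two annotators, a project in which one annotator has still to finish one document would report a progress of 0.75 and could not yet be saved.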

Figure 70 High-fidelity prototype of Editing Annotation Project's details and Deleting functionalities

6.7.2 Annotation - Correction - Curation
The main functionality of the Annotation Editor is the ability to manually annotate a raw corpus or to correct a previously annotated one. Details for each functionality are given in Table 26 Manual Annotation functionality to Table 28 Manual Curation functionality, while high-fidelity prototype overviews of these functionalities are shown in Figure 71 High-fidelity prototype of Manual Annotation functionality to Figure 73 High-fidelity prototype of Manual Annotation functionality.

Annotation

Table 26 Manual Annotation functionality
ID: Manual Annotation
Description: A registered user can annotate a document from a corpus of an annotation editor's project according to the annotation level defined in the project.
Required Databases: Annotation Editor's database
Actors: Registered User
Preconditions: Registered users access the OMTD web portal using a common web browser.
Main process:
1. From the OMTD platform, the Registered User selects Actions -> Annotation Editor to access the Annotation Editor.
2. Within the Annotation Editor environment, the user selects Annotation from the menu.
3. The Annotation Editor processes the request and shows a list of the user's available projects, along with their respective documents for annotation.
4. The user selects the project and the document to work with by clicking their names and then Open.
5. The Annotation Editor processes the request and shows the document to process.
6. The user annotates the document at the annotation level defined by the project owner.
7. When the annotation of the document is finished, the user declares it by selecting the Finish button.
Exceptions: -
Related data model entities: -
Processed Data: A document in an annotation project
Generated Data: Annotations of a document

Figure 71 High-fidelity prototype of Manual Annotation functionality

Correction

Table 27 Manual Correction functionality
ID: Manual Correction
Description: A registered user can correct an annotated document from a corpus of an annotation editor's project according to the annotation level defined in the project.
Required Databases: Annotation Editor's database
Actors: Registered User
Preconditions: Registered users access the OMTD web portal using a common web browser.
Main process:
1. Within the Annotation Editor environment, the user selects Correction from the menu.
2. The Annotation Editor processes the request and shows a list of the user's available projects, along with their respective documents for correction.
3. The user selects the project and the document to work with by clicking their names and then Open.
4. The Annotation Editor processes the request and shows the document to process.
5. The user can add new annotations or delete annotations of the annotated corpus.
6. When the correction of the document is finished, the user declares it by selecting the Finish button.
Exceptions: -
Related data model entities: -
Processed Data: A document in a correction project
Generated Data: Annotations of a document

Figure 72 High-fidelity prototype of Manual Correction functionality

Curation

Table 28 Manual Curation functionality
ID: Manual Curation
Description: A registered user who owns an Annotation Editor's project can curate the annotations of a document made by users.
Required Databases: Annotation Editor's database
Actors: Registered User
Preconditions: The user is the owner of the project and some documents have been annotated manually by (other) users.
Main process:
1. In the Annotation Editor environment, the user selects Curation from the menu.
2. The Annotation Editor processes the request and shows a list of the user's available projects, along with their respective documents for curation.
3. The user selects the project and the document to work with by clicking their names and then Open.
4. The Annotation Editor processes the request and shows the document to process.
5. The user can curate annotations from multiple sources and add new annotations.
6. When the curation of the document is finished, the user declares it by selecting the Finish button.
Exceptions: -
Related data model entities: -
Processed Data: A document in a curation project
Generated Data: Annotations of a document

Figure 73 High-fidelity prototype of Manual Annotation functionality

6.7.3 Monitoring
As crowdsourcing annotation is a highly important task for text and data mining, the OMTD Annotation Editor also offers easy monitoring of the progress of annotation projects. Details of the monitoring functionality are given in Table 29 Monitoring functionality, while a high-fidelity prototype overview is shown in Figure 74 High-fidelity prototype of Annotation Project Monitoring functionality.

Table 29 Monitoring functionality
ID: Monitoring
Description: A registered user can monitor the progress of an Annotation Editor's project.
Required Databases: Annotation Editor's database
Actors: Registered User
Preconditions: Registered users access the OMTD web portal using a common web browser.
Main process:
1. In the Annotation Editor environment, the user selects Monitoring from the menu.
2. The Annotation Editor processes the request and shows a list of the user's available projects, along with an overview of their progress.
3. The user can select a project and inspect its progress in detail.
Exceptions: -
Related data model entities: -
Processed Data: Metadata of an annotation editor's project
Generated Data: -

Figure 74 High-fidelity prototype of Annotation Project Monitoring functionality
