Deliverable 1.3: A survey on ontology tools


Deliverable 1.3: A survey on ontology tools

OntoWeb: Ontology-based information exchange for knowledge management and electronic commerce (IST)

Date: 31st May, 2002
Identifier: Deliverable 1.3
Class: Deliverable
Version: 1.0
Version date: 31/05/2002
Status: Final
Distribution: Public
Responsible Partner: UPM

IST Project OntoWeb: Ontology-based Information Exchange for Knowledge Management and Electronic Commerce

OntoWeb Consortium

This document is part of a research project funded by the IST Programme of the Commission of the European Communities as project number IST.

Vrije Universiteit Amsterdam (VU) - Coordinator
Faculty of Sciences, Division of Mathematics and Computer Science
De Boelelaan 1081a, 1081 HV Amsterdam, the Netherlands
Fax and answering machine: +31-(0)
Mobile phone: +31-(0)
Contact person: Dieter Fensel, dieter@cs.vu.nl

The elaboration of this document has been coordinated by:
Asunción Gómez-Pérez
Departamento de Inteligencia Artificial, Facultad de Informática (Universidad Politécnica de Madrid)
Campus de Montegancedo s/n, Boadilla del Monte, Madrid, Spain
Phone: ; Fax: ; asun@fi.upm.es

This document has been developed jointly by partners involved in the OntoWeb workpackage 1 and in the Special Interest Group on Enterprise-Standard Ontology Environments. The contributors for each section are presented below in alphabetical order:

Chapter 1: Oscar Corcho, Mariano Fernández-López, Asunción Gómez-Pérez (UPM)
Chapter 2: Editor: Asunción Gómez-Pérez (UPM). Contributors: Jürgen Angele (Ontoprise), Sean Bechhofer (UM), Oscar Corcho (UPM), John Domingue (KMI), Alain Léger (FT&RD), Michele Missikoff (IASI-CNR), Enrico Motta (KMI), Mark Musen (SMI - Stanford), Natalya F. Noy (SMI - Stanford), York Sure (AIFB), Francesco Taglino (IASI-CNR)
Chapter 3: Editor: Mariano Fernández-López (UPM). Contributors: Asunción Gómez-Pérez (UPM), Deborah McGuinness (KSL - Stanford), Natalya F. Noy (SMI - Stanford), José A. Ramos (UPM), Gerd Stumme (AIFB)
Chapter 4: Editors: Jürgen Angele (Ontoprise), York Sure (AIFB)

Chapter 4 contributors: Yannick Bouillon (FT), Oscar Corcho (UPM), Mariano Fernández-López (UPM), Asunción Gómez-Pérez (UPM)
Chapter 5: Editor: Arthur Stutt (KMI). Contributors: Oscar Corcho (UPM), Siegfried Handschuh (AIFB), Angel López (UPM), Mika Maier-Collin (Ontoprise), Enrico Motta (KMI)
Chapter 6: Editor: Vassilis Christophides (FORTH). Contributors: Dimitris Plexousakis (FORTH), Aimilia Magkanaraki (FORTH), Ta Tuan Ahn (FORTH), Grigoris Karvounarakis (FORTH)
Chapter 7: Oscar Corcho, Mariano Fernández-López, Asunción Gómez-Pérez (UPM)

Revision Information

- Table of Contents proposal
- UPM distributes first version on ontology development tools
- SMI provides updated descriptions of Protégé
- UPM distributes first version on integrating and merging tools
- Proposal of the design of the experiment for evaluating ontology development tools
- FT&RD delivers first version on evaluation tools
- OU provides updated description of WebOnto
- UPM provides WebODE description
- Ontoprise provides updated descriptions of OntoEdit Free version and OntoEdit Professional version
- UPM distributes the test case experiment to be performed in SIG3 concerning the evaluation of the ontology development tools; task performed in conjunction with SMI at Stanford University
- First draft of annotation tools chapter
- The evaluation process of ontology development tools starts
- AIFB provides a first version on evaluation tools
- FORTH sends a second version of the ontology storage and querying evaluation report
- Major re-edition of chapter
- University of Manchester provides GALEN ontology development tools
- KMi sends the last update of chapter
- Ontoprise: chapter 4 sent with new additions: OntoClean and ONE-T
- UPM: integration of different chapters
- UPM: introduction and conclusions added
- Updated contributors list
- SymOntoX added to chapter 2
- LinkFactory and Apollo added to chapter 2
- Chapter 5 (annotation tools) updated
- Other minor comments corrected
- Proof reading
- Final additions; table of contents regenerated


Table of contents

OntoWeb Consortium
Revision Information
Table of contents
Executive summary
1 Introduction
2 Ontology building tools
   Introduction
   Evaluation framework used to compare the tools
   Ontology development tools: Apollo, LinkFactory, OILEd, OntoEdit Free and Professional versions, Ontolingua Server, OntoSaurus, OpenKnoME, Protégé, SymOntoX, WebODE, WebOnto
   Comparison of tools against the evaluation framework
   Experiment with ontology development tools: NL description on the travelling domain
   Conclusions and main recommendations
   URLs
   References
3 Ontology merge and integration tools
   Introduction
   Evaluation framework used to compare the tools
   Description of the ontology merge and integration tools: Chimaera, FCA-Merge (A Method for Bottom-Up Merging of Ontologies), PROMPT, ODEMerge
   Comparison of tools against the evaluation framework
   Conclusions and main recommendations
   URLs
4 Ontology evaluation tools
   Introduction
   Evaluation framework
   Implementations of the framework: Overview, OntoAnalyser, OntoGenerator, OntoClean in WebODE, ONE-T
   Comparison of the implementations against the evaluation framework
   Related Work
   Conclusions
   References
5 Ontology based annotation tools
   Introduction
   Framework used to describe and compare the tools
   Ontology annotation tools: AeroDAML, COHSE, MnM, OntoAnnotate, OntoMat-Annotizer, SHOE Knowledge Annotator
   Summary using the above framework
   Conclusions and main recommendations
   URLs
6 Ontology storage and querying
   Introduction
   Evaluation framework of Query Languages and Storage Tools
   Description of ontology query languages and tools
      Ontology Query Languages: ICS-FORTH RQL, ILRT SquishQL, Intellidimension RDFQL, RDFPath, VERSA RDF Query Language, TRIPLE, DAML+OIL Query Language, Topic Maps Query Language, Ontopia Tolog
      Ontology Storing and Querying Tools: ICS-FORTH RDFSuite, Sesame, Inkling, rdfdb, RDFStore, Extensible Open RDF (EOR), Redland, Jena, RDF Gateway, TRIPLE, KAON Tool Suite, Cerebra, Empolis K, Ontopia Knowledge Suite
   Comparison of Query Languages and Storage Tools
   Conclusions and main recommendations
   References
7 Conclusions


Executive summary

This deliverable presents a survey of the most relevant ontology tools and Semantic Web technology available in our community. The survey is divided into several sections, which group different kinds of ontology tools, namely: ontology development, ontology merge, ontology evaluation, ontology-based annotation, and ontology storage and querying. The document is homogeneous: all sections are described using the same pattern. Each section includes:
a) Introduction
b) Evaluation framework used to compare the tools in the section
c) Short description of the tools to be compared
d) Comparison of the tools against the evaluation framework
e) Conclusions and main recommendations
f) Future work

This survey has been performed jointly by partners of the OntoWeb workpackage 1 and participants of the Special Interest Group (SIG) on Enterprise-Standard Ontology Environments. Each section of this document has been assigned to a working group in the SIG, which is led by an OntoWeb WP1 partner.

Related links:
- EON: Evaluation of Ontology-based Tools: OntoWeb-SIG3 Workshop at the 13th International Conference on Knowledge Engineering and Knowledge Management (EKAW)
- Special Issue on Ontology Tools at the International Journal of Human-Computer Studies (IJHCS). Call for papers:


1 Introduction

In recent years, a large number of environments for ontology construction and ontology use have appeared. Tool support is important both for the ontology development process (ontology building, annotation, merge, etc.) and for the use of ontologies in applications such as electronic commerce, knowledge management and the Semantic Web. When we are about to build an ontology, several basic questions arise related to the tools to be used:
- Which kinds of tools are the most convenient for building my ontology?
- Which kinds of tools give better support to the ontology development process?
- How are ontologies stored in the tool (in databases, in XML, in ASCII files)?
- Does the tool have an inference engine?
- Do tools have forward and backward translators to/from different ontology implementation languages?
- How can applications interoperate with ontology tools?
- How can the developed ontologies be used in real applications?
- How can I reuse other existing ontologies in the same domain?
- How can I merge two similar ontologies built for the same domain?
- How can we evaluate the quality of the developed ontology, or of other existing ontologies that I will reuse?
- What is the stability and maturity of an ontology tool?
- Is it possible to build an ontology (semi)automatically from a text in natural language?
- Which tools can I use for adding markup annotations to Semantic Web pages?
- Which tools can I use for querying information about an ontology?
- Which tools can I use for the Semantic Web?

In this document, we present the most relevant kinds of tools that allow performing one or several of the aforementioned tasks. We have grouped them in the following clusters:

- Ontology development tools. This group includes tools, environments and suites that can be used for building a new ontology from scratch or for reusing existing ontologies.
Apart from the common editing and browsing functionality, these tools usually include ontology documentation, ontology export and import to/from different formats, graphical views of the ontologies built, ontology libraries, attached inference engines, etc.
- Ontology merge and integration tools. These tools have appeared to solve the problem of merging or integrating different ontologies on the same domain. This need arises when two companies or organizations are merged, or when it is necessary to obtain a better-quality ontology from other existing ontologies in the same domain.
- Ontology evaluation tools. These are support tools that ensure that both ontologies and their related technologies have a given level of quality. Quality assurance is extremely important to avoid problems in the integration of ontologies and ontology-based technology in industrial applications. In the future, this effort might also lead to standardized benchmarks and certifications.
- Ontology-based annotation tools. These tools have been designed to allow users to insert and maintain (semi)automatically ontology-based markup in Web pages. Most of these tools have appeared recently, along with the emergence of the Semantic Web, and most are already integrated in an ontology development environment.
- Ontology storage and querying tools. These tools have been created to allow ontologies to be used and queried easily. Due to the wide acceptance and use of the Web as a platform for communicating knowledge, new languages for querying ontologies have appeared in this context.
- Ontology learning tools. These are used to (semi)automatically derive ontologies from natural language texts. They will not be reviewed in this deliverable, since a specific OntoWeb WP1 deliverable (D1.5) is devoted to them.

2 Ontology building tools

2.1 Introduction

In recent years, a large number of tools for building ontologies have been developed, both by the American and the European communities. When a new ontology is going to be built, several basic questions arise related to the tools to be used: Which tools give support to the ontology development process? How are the ontologies stored (in databases or ASCII files)? Does the tool have an inference engine? Do tools have translators to different ontology languages? What is the quality of the translations? How can applications interoperate with ontology servers?

This chapter answers such questions. It presents and compares the most important and most widely used development tools that have appeared in recent years. We first present, in section 2.2, the main set of criteria used to compare different ontology building tools. In section 2.3, we present in depth some of the tools that were already included in OntoWeb deliverable D1.1. Section 2.4 compares all the tools against the evaluation framework. Section 2.5 describes an experiment that will be performed on different ontology development tools, and we conclude this study in section 2.6.

2.2 Evaluation framework used to compare the tools

This section presents the set of criteria that will be used for comparing ontology development tools. We divide them into the following groups:
- General description of the tool, which includes information about developers, releases and availability.
- Software architecture and tool evolution, which includes information about the tool architecture (standalone, client/server, n-tier application), how the tool can be extended with other functionalities/modules, how ontologies are stored (databases, text files, etc.) and whether there is any backup management system.
- Interoperability with other ontology development tools and languages, which includes information about the interoperability capabilities of the tool.
We will review the tool's interoperability with other ontology tools (for merging, annotation, storage, inferencing, etc.), as well as translations to and from ontology languages.
- Knowledge representation. We present the KR paradigm underlying the knowledge model of the tool, which is very relevant for knowing what knowledge can be modeled in the tool and how. We also analyze whether the tool provides a language for building axioms.
- Inference services attached to the tool. We analyze whether the tool has a built-in inference engine or can use external inference engines. We also analyze whether the tool performs constraint/consistency checking, whether it can automatically classify concepts in a concept taxonomy, and whether it is able to manage exceptions in taxonomies.
- Usability. We analyze the existence of graphical editors for the creation of concept taxonomies and relations, the ability to prune these graphs, and the possibility of zooming into parts of them. We also analyze whether the tool allows some kind of collaborative working and whether it provides libraries of ontologies.

2.3 Ontology development tools

In this section, we try to provide a broad overview of some of the available tools and environments that can be used for building ontologies, either from scratch or by reusing other existing ontologies. For each tool, we provide a brief description, presenting the group that has developed it, its main features and functionalities, its relationship to KR formalisms, etc. We also provide its URL and bibliographic references (where available) to allow readers to find more information about it.

Apollo

Apollo is a user-friendly ontology development application. Its design was motivated by our experiences working with industrial partners who wished to use knowledge modelling techniques, but required an easy-to-use and understandable syntax and environment.

A snapshot of Apollo is shown in figure 2.1. A hierarchical representation of ontologies is shown in the top left pane. The hierarchy of classes and instances is shown in the bottom left pane. Once selected, a class or instance is shown in detail in the panes on the right-hand side of the screen. The slots and values of a class or instance can then be added using a spreadsheet-style interface.

Figure 2.1: Apollo screenshot

Apollo supports all the basic primitives of knowledge modelling: ontologies, classes, instances, functions and relations. Full consistency checking is done while editing, for example, detecting the use of undefined classes. Apollo has its own internal language for storing ontologies, but can also export an ontology into different representation languages, as required by the user. Apollo is implemented in Java.

URL:
Contact information for developers: m.koss@open.ac.uk
Relevant bibliographic references: Apollo User Guide

LinkFactory

LinKFactory is a formal ontology management system developed by Language & Computing nv, designed to build and manage very large and complex language-independent formal ontologies. The LinKFactory system consists of two major components, the LinKFactory Server and the LinKFactory Workbench (the client-side component), both developed in Java. On the server side, LinKFactory stores the data in a relational database. Access to the database is abstracted away by a set of functions that are natural when dealing with ontologies: get children, find path, join concepts, get terms for concept X, etc. These functions are accessible to software clients through a standardized API that allows applications to be built on top of the semantic database without requiring intimate knowledge of the internal structure of the database. This component is capable of dealing with multiple concurrent users and is platform independent (Windows, Solaris, UNIX and Linux tested).
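The flavour of such graph-navigation functions can be pictured with a small, purely hypothetical sketch; none of these names belong to the actual LinkFactory API, which is a Java API backed by a relational database:

```python
# Hypothetical sketch (NOT the LinkFactory API): the kind of concept-graph
# operations the text describes: get children, find path, get terms.

class ConceptGraph:
    def __init__(self):
        self.children = {}   # concept -> list of child concepts
        self.terms = {}      # concept -> natural-language terms

    def add_child(self, parent, child):
        self.children.setdefault(parent, []).append(child)

    def get_children(self, concept):
        return self.children.get(concept, [])

    def get_terms(self, concept):
        return self.terms.get(concept, [])

    def find_path(self, start, goal, path=None):
        # Depth-first search along child links (toy graph assumed acyclic).
        path = (path or []) + [start]
        if start == goal:
            return path
        for child in self.get_children(start):
            found = self.find_path(child, goal, path)
            if found:
                return found
        return None

g = ConceptGraph()
g.add_child("Disorder", "Neoplasm")
g.add_child("Neoplasm", "BreastCancer")
g.terms["BreastCancer"] = ["breast cancer", "mammary carcinoma"]

print(g.get_children("Disorder"))                 # ['Neoplasm']
print(g.find_path("Disorder", "BreastCancer"))    # ['Disorder', 'Neoplasm', 'BreastCancer']
print(g.get_terms("BreastCancer"))
```

The point of wrapping a database behind operations like these is exactly what the text notes: client applications navigate concepts and terms without knowing the underlying table layout.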
The application requires an RMI registry (a sort of Domain Name Server for RMI servers) to be running, so that the server can register itself and clients can connect to it. The LinkFactory Workbench allows the user to browse and model several ontologies, and to align them, as shown in figure 2.2. The workbench is a dynamic framework implemented through Java beans. Each bean has its own specific functionality and a limited view onto the underlying formal ontology, but combining a set of beans in the workbench can provide the user with a powerful tool to view and manage the data. Java beans'

examples are: Concept tree, Concept criteria and full definitions, Linktype tree, Criteria list, Term list, Search pane, Properties panel, Reverse relations, and many more. Each user can create multiple views using the available beans. These views are called layouts. Each layout can consist of several frames on which the beans can be laid out. Creating a new layout or adding new frames to an existing layout are simple actions the user can select from the menu. Each frame can be divided into several bean-spaces where beans can be placed. The user can select any of the available beans and simply drag and drop them into the desired area of the workspace. Once the user has placed the desired beans in the layout, he can create links between the beans, again simply by means of drag and drop. Each bean has specific properties, which can be set at runtime. This approach allows different types of tasks to be performed using the optimal layout. Besides dynamic coupling in the LinkFactory Workbench, the Java beans can also be used outside this workspace, so that software developers can integrate them as (static) components into their own programs. For modelling ontologies, several quality assurance mechanisms are built in: versioning, user tracking, user hierarchies, formal sanctioning with the possibility to overrule, sibling detection, linktype hierarchy, etc.
From the knowledge representation and underlying reasoning point of view, LinkFactory has the following characteristics and possibilities:
- fixed built-in ISA (formal subsumption), DISJOINT and SAME-AS relationships
- definable relationship hierarchy (multiple hierarchies)
- specification of necessary and sufficient conditions for individual concept definitions
- several constraint-checking methods
- auto-classification of new concepts on the basis of natural language terms as well as formal definitions
- mechanisms to map and/or merge various ontologies
- the possibility to analyse texts automatically and assign links to the ontology (see screenshot)

Figure 2.2: LinkFactory workspace set up to assess the coverage of a given ontology on the basis of text documents. The workspace contains 5 beans that have been selected from the toolbar. On the left,

there is the VisualTeSSIbean, which contains an original text in the upper half and the automatically annotated text in the lower half. The VisualTeSSIbean is connected to a ConceptTree (upper right), which itself is connected to a FullDefBean and a TranslateBean. Clicking on the identified term "breast cancer" in the annotated document results in the corresponding information being displayed in the connected beans. Also shown is a LinkTypeTreeBean, not connected to any other bean.

URL:
Contact information for developers: info@landc.be
Relevant bibliographic references:
Ceusters W. Formal terminology management for language based knowledge systems: resistance is futile. In Temmerman R. (ed) Trends in Special Language and Language Technology.
Ceusters W, Martens P, Dhaen C, Terzic B. LinkFactory: an Advanced Formal Ontology Management System. In Proceedings of Interactive Tools for Knowledge Capture, KCAP-2001, October 20, Victoria.
Jackson B, Ceusters W. A novel approach to semantic indexing combining ontology-based semantic weights and in-document concept co-occurrences. In Baud R, Ruch P. (eds) EFMI Workshop on Natural Language Processing in Biomedical Applications, 8-9 March, 2002, Cyprus.

OILEd

OILEd is a graphical ontology editor, developed by the University of Manchester, that allows the user to build ontologies using DAML+OIL. The knowledge model of OILEd is based on that of DAML+OIL, extended by the use of a frame-like presentation for modelling. Thus OILEd offers a familiar frame-like paradigm for modelling while still supporting the rich expressiveness of DAML+OIL where required. Classes are defined in terms of their superclasses and property restrictions, with additional axioms capturing further relationships such as disjointness. The expressive knowledge model allows the use of complex composite descriptions as role fillers.
This is in contrast to many existing frame-based editors, where such anonymous frames must be named before they can be used in models. The main task OILEd is targeted at is editing ontologies or schemas, as opposed to knowledge acquisition or the construction of large knowledge bases of instances. Although functionality is provided that allows the definition of individuals, this is primarily intended for the definition of nominals, which are used in the DAML+OIL one-of construction.

Figure 2.3: OILEd screenshot
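As a toy illustration of why disjointness axioms matter (this is not OILEd's implementation or data model, just a sketch), the following shows how a class that inherits from two classes declared disjoint can be detected as unsatisfiable, which is the kind of inconsistency a description logic reasoner reports:

```python
# Illustrative sketch only: class definitions as superclass lists plus
# disjointness axioms. A class whose ancestors include two disjoint
# classes can have no instances, i.e. it is unsatisfiable.

superclasses = {
    "Vegetarian": ["Animal"],
    "Carnivore": ["Animal"],
    "VegetarianCarnivore": ["Vegetarian", "Carnivore"],  # suspect class
}
disjoint = [("Vegetarian", "Carnivore")]

def ancestors(cls):
    """All superclasses of cls (transitively), plus cls itself."""
    seen = set()
    stack = [cls]
    while stack:
        c = stack.pop()
        for s in superclasses.get(c, []):
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return seen | {cls}

def unsatisfiable(cls):
    ancs = ancestors(cls)
    return any(a in ancs and b in ancs for a, b in disjoint)

print(unsatisfiable("VegetarianCarnivore"))  # True
print(unsatisfiable("Vegetarian"))           # False
```

A real reasoner does far more than this (it handles property restrictions, composite descriptions and full subsumption), but the structural idea of deriving inconsistency from stated axioms is the same.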

A key aspect of OILEd's behaviour is the use of the FaCT reasoner [Horrocks et al, 99] to classify ontologies and check consistency via a translation from DAML+OIL to the SHIQ description logic. This allows users to describe their ontology classes and have the reasoner determine the appropriate place in the hierarchy for each definition. Figure 2.3 shows a situation where a concept definition has been determined to be unsatisfiable. The DAML+OIL RDF Schema (March 2001) is used for loading and storing ontologies. In addition, the tool reads and writes concept hierarchies in pure RDF and renders ontology definitions as HTML for browsing and as SHIQ for later classification by the FaCT reasoner. Concept hierarchies can also be rendered in formats readable by AT&T's dotty tool. OILEd version 3.4 is implemented in Java and is freely available from the OILEd web site, although registration of an e-mail address is required for download. Further information and relevant publications are also available at the web site.

URL:
Contact information for developers: Sean Bechhofer, seanb@cs.man.ac.uk. Questions relating to OILEd can also be sent to: oil-help@cs.man.ac.uk
Relevant bibliographic references:
Sean Bechhofer, Ian Horrocks, Carole Goble, Robert Stevens. OILEd: a Reason-able Ontology Editor for the Semantic Web. Proceedings of KI2001, Joint German/Austrian Conference on Artificial Intelligence, September 19-21, Vienna. Springer-Verlag LNAI Vol. 2174.
I. Horrocks, U. Sattler, S. Tobies. Practical reasoning for expressive description logics. 6th International Conference on Logic for Programming and Automated Reasoning (LPAR'99), LNAI, Springer-Verlag, 1999.

OntoEdit Free and Professional versions

OntoEdit is an ontology engineering environment supporting the development and maintenance of ontologies by graphical means. OntoEdit is built on top of a powerful internal ontology model.
This paradigm supports representation-language-neutral modelling, as far as possible, for concepts, relations and axioms. Several graphical views onto the structures contained in the ontology support modelling in the different phases of the ontology engineering cycle. The tool allows the user to edit a hierarchy of concepts or classes (figure 2.4). These concepts may be abstract or concrete, which indicates whether or not it is allowed to make direct instances of the concept. A concept may have several names, which essentially is a way to define synonyms for that concept. The tool also allows, similar to the well-known copy-and-paste functionality, the reorganization of concepts within the hierarchy.

The tool is based on a flexible plugin framework. First, this allows functionality to be extended easily in a modularized way; the plugin interface is open to third parties, which enables users to extend OntoEdit with additionally needed functionality. Second, having a set of plugins available, e.g. a domain lexicon, an inferencing plugin and several export and import plugins, allows for user-friendly customization that adapts the tool to different usage scenarios.

OntoEdit is available in a free and a professional version. The professional version typically includes an additional set of plugins, e.g. the collaborative environment and the inferencing capabilities (cf. OntoEdit Professional Version). Currently version 2.0 is available; versions 2.5 and 3.0 are scheduled. The Professional Version of OntoEdit contains several plugins in addition to the free version. Among others, the functionality is extended by (i) an inferencing plugin for consistency checking, classification and execution of rules, (ii) collaborative engineering of ontologies and (iii) an ontology server for administration of ontology libraries, collaborative sharing of ontologies, and persistent storage of ontologies.
The professional OntoEdit versions 2.0 and 2.5 include (i), and version 3.0 adds (ii) and (iii) on top.
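The registry idea behind such a plugin framework can be sketched minimally; OntoEdit's actual framework is Java-based and its real interface is not shown in this document, so every name below is hypothetical:

```python
# Hypothetical sketch of a plugin registry (NOT OntoEdit's Java plugin API):
# the editor exposes a registration point, and independently developed
# plugins (export, inferencing, lexicon, ...) hook into it.

class Plugin:
    name = "base"
    def run(self, ontology):
        raise NotImplementedError

class PluginRegistry:
    def __init__(self):
        self.plugins = {}
    def register(self, plugin):
        self.plugins[plugin.name] = plugin
    def invoke(self, name, ontology):
        return self.plugins[name].run(ontology)

class RDFExportPlugin(Plugin):
    """Toy export plugin: emits one RDF triple per class name."""
    name = "rdf-export"
    def run(self, ontology):
        return "\n".join(f"<{c}> rdf:type rdfs:Class ." for c in ontology)

registry = PluginRegistry()
registry.register(RDFExportPlugin())
print(registry.invoke("rdf-export", ["Person", "Organization"]))
```

The benefit the text describes follows directly from this shape: third parties implement the plugin interface without touching the editor core, and users load only the plugins their scenario needs.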

Figure 2.4: OntoEdit screenshot

URL:
Contact information about the developers: Dirk Wenke
Relevant bibliographic references:
Y. Sure, M. Erdmann, J. Angele, S. Staab, R. Studer, D. Wenke. OntoEdit: Collaborative Ontology Engineering for the Semantic Web. In Proceedings of the International Semantic Web Conference 2002 (ISWC 2002), June 2002, Sardinia, Italy.
Y. Sure, S. Staab, J. Angele, D. Wenke, A. Maedche. OntoEdit: Guiding Ontology Development by Methodology and Inferencing. Submitted.
Siegfried Handschuh. Ontoplugins: a flexible component framework. Technical report, University of Karlsruhe, May.

Ontolingua Server

The Ontolingua Server is a set of tools and services that support the building of shared ontologies between distributed groups; it has been developed by the Knowledge Systems Laboratory (KSL) at Stanford University. The ontology server architecture provides access to a library of ontologies, translators to languages (Prolog, CORBA IDL, CLIPS, Loom, etc.) and an editor to create and browse ontologies (figure 2.5). Remote editors can browse and edit ontologies, and remote or local applications can access any of the ontologies in the ontology library using the OKBC (Open Knowledge Base Connectivity) protocol.

URL:
Contact information about the developers: Ontology-librarian@KSL.Stanford.edu
Relevant bibliographic references:
A. Farquhar, R. Fikes, J. Rice. The Ontolingua Server: A Tool for Collaborative Ontology Construction. Proceedings of the 10th Knowledge Acquisition for Knowledge-Based Systems Workshop, Banff, Alberta, Canada, 1996.
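OKBC standardizes frame-level knowledge-base operations so that applications can talk to different frame systems uniformly. The following mock is illustrative only: it is not an OKBC binding, and the method names merely echo the style of operations the protocol defines (the authoritative list is in the OKBC specification):

```python
# Mock, NOT a real OKBC client: illustrates the kind of frame-level
# queries (subclasses of a class, instances of a class) that a protocol
# like OKBC lets applications run against a remote ontology server.

class MockFrameServer:
    def __init__(self, subclass_of, instance_of):
        self.subclass_of = subclass_of    # class -> direct superclass
        self.instance_of = instance_of    # instance -> class

    def get_class_subclasses(self, cls):
        return sorted(c for c, s in self.subclass_of.items() if s == cls)

    def get_class_instances(self, cls):
        return sorted(i for i, c in self.instance_of.items() if c == cls)

kb = MockFrameServer(
    subclass_of={"Dog": "Mammal", "Cat": "Mammal", "Mammal": "Animal"},
    instance_of={"Fido": "Dog", "Felix": "Cat"},
)
print(kb.get_class_subclasses("Mammal"))  # ['Cat', 'Dog']
print(kb.get_class_instances("Dog"))      # ['Fido']
```

The value of the protocol is exactly this uniformity: an application written against the operation set can switch between compliant servers (such as the Ontolingua library) without changing its query code.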

Figure 2.5: Ontolingua screenshot

OntoSaurus

OntoSaurus has been developed by the Information Sciences Institute (ISI) at the University of Southern California. It consists of two modules: an ontology server, which uses Loom as its knowledge representation system, and an ontology browser server that dynamically creates HTML pages (including image and textual documentation) displaying the ontology hierarchy (figure 2.6). The ontology can be edited through HTML forms, and translators exist from Loom to Ontolingua, KIF, KRSS and C++.

Figure 2.6: OntoSaurus screenshot

URL:
Contact information about the developers: Tom Russ
Relevant bibliographic references:

B. Swartout, P. Ramesh, K. Knight, T. Russ. Toward Distributed Use of Large-Scale Ontologies. Symposium on Ontological Engineering of AAAI, Stanford, California, March 1997.

OpenKnoME

The KnoME is a large suite of tools for the collaborative development of ontologies in the GRAIL concept modelling language [Rector et al, 97]. Tigger is one important part of this suite, developed for the rapid acquisition of knowledge from domain experts untrained in ontological engineering. Together, these tools are freely and openly available as OpenKnoME. They have been developed by the University of Manchester over several large medical and pharmaceutical ontology programmes, including GALEN, GALEN-IN-USE and PRODIGY. The KnoME is used to model in GRAIL, a distinctive language developed in Manchester for use within the GALEN programmes; it is related to description logics and conceptual graphs. The KnoME's knowledge model is heavily influenced by GRAIL. GRAIL is used for modelling conceptual ontologies: the KnoME is not used for collecting instance data. Specific features of GRAIL include:
- Refinement: the coordination of transitive relations with subsumption.
- Sanctioning: constraints that describe how categories can be put together with attributes to make new definitions, and thus specify what it is sensible and meaningful to say.
- Extrinsics: attachment of non-definitional knowledge to the concept framework, allowing indexing of application-specific information for use in default reasoning.
- Generation of natural language representations of GRAIL using a segment grammar.

Figure 2.7: The KnoME launcher, with knowledge browser and sanctions browser

The KnoME is not a stand-alone system: it communicates with a GALEN Terminology Server (TeS) via a well-defined API. GRAIL sources are converted to a compiled conceptual model. The TeS stores and maintains the conceptual model, using separate modules to provide different kinds of services: conceptual, linguistic, coding and other services. The API provides a sharp distinction between the ontology and the clients using it. The ontology is therefore presented as a service, rather than as a data structure. Via this service, the KnoME may browse, explore, view and quality-control the ontology. As the ontology is presented as a service, delivery is not usually by export to some static form, but rather as a TeS that can be queried and used by other clients. Specific examples of this include the delivery of user interfaces based on the underlying ontology, and support for reasoning services. There is, however, a configurable export tool, which currently supports static export to HTML and CLIPS.

The KnoME has been used to coordinate ontology development across many EU medical terminology projects; within the UK NHS Drug Ontology project; to support the development of the HL7 health applications industry standard; and in other international and industrial settings. It contains a suite of tools for the management of sources from multiple authors across disparate sites; for the versioning and audit of those sources; and for check-out, check-in, locking and branching of sources. Sources are converted to a compiled form.

Tigger was developed to allow the bulk take-on of domain knowledge. Domain experts are trained in the use of an easy-to-use Intermediate Representation (IR). They author concepts in this representation using either a GUI tool or a simple word processor. The IR from several authors is then coordinated, integrated and translated to GRAIL sources using Tigger. This overcomes the knowledge acquisition bottleneck.
OpenKnoME version 5.4 is implemented in Cincom's VisualWorks Smalltalk and is freely available from the topthing.com web site, although registration of an e-mail address is required for download. Figure 2.7 shows a screenshot of it. The download is provided with manuals, tutorials, a compiled version of the OpenGALEN model of medicine, and Smalltalk source code.

URL:
Contact information for developers: Angus Roberts, angus@cs.man.ac.uk. Questions relating to the tools can also be sent to: opengalen-help@topthing.com
Relevant bibliographic references:
Rogers J.E., Roberts A., Solomon W.D., van der Haring E., Wroe C.J., Zanstra P.E., Rector A.L. (2001). GALEN Ten Years On: Tasks and Supporting Tools. Proceedings of MEDINFO2001, V. Patel et al. (Eds.), IOS Press.
Rector A.L., Bechhofer S.K., Goble C.A., Horrocks I., Nowlan W.A., Solomon W.D. The GRAIL Concept Modelling Language for Medical Terminology. Artificial Intelligence in Medicine, Volume 9.

Protégé-2000

Protégé-2000 is the latest tool in an established line of tools developed at Stanford University for knowledge acquisition. Protégé-2000 has thousands of users all over the world, who use the system for projects ranging from modeling cancer-protocol guidelines to modeling nuclear-power stations. Protégé is freely available for download under the Mozilla open-source license. Protégé-2000 provides a graphical and interactive ontology-design and knowledge-base development environment. It helps knowledge engineers and domain experts perform knowledge-management tasks. Ontology developers can access relevant information quickly whenever they need it, and can use direct manipulation to navigate and manage an ontology. Tree controls allow quick and simple navigation through a class hierarchy. Protégé uses forms as the interface for filling in slot values (Figure 2.8). The knowledge model of Protégé-2000 is OKBC-compatible.
It includes support for classes and the class hierarchy with multiple inheritance; template and own slots; specification of pre-defined and arbitrary facets for slots, including allowed values, cardinality restrictions, default values, and inverse slots; and metaclasses and the metaclass hierarchy. In addition to its highly usable interface, two other important features distinguish Protégé-2000 from most ontology-editing environments: its scalability and extensibility. Developers have successfully employed Protégé-2000 to build and use ontologies consisting of 150,000 frames. Supporting knowledge bases with hundreds of thousands of frames involves two components: (1)

a database backend to store and query the data, and (2) a caching mechanism that enables loading of new frames once the number of frames in memory has exceeded the memory limit.

One of the major advantages of the Protégé-2000 architecture is that the system is constructed in an open, modular fashion. Its component-based architecture enables system builders to add new functionality by creating appropriate plugins. The Protégé Plugin Library contains contributions from developers all over the world. Most plugins fall into one of three categories: (1) backends, which enable users to store and import knowledge bases in various formats; (2) slot widgets, which are used to display and edit slot values or their combinations in domain-specific and task-specific ways; and (3) tab plugins, which are knowledge-based applications usually tightly linked with Protégé knowledge bases.

Current backend plugins (and standard backends) include support for storing and importing ontologies in RDF Schema, XML files with a DTD, and XML Schema files. We have experimental support for OIL, and the development of DAML+OIL support is in its final stages. Available slot widgets include user-interface components to display GIF images, as well as video and audio. A diagram widget (Figure 2.9) allows developers to build elements of a knowledge base by drawing a diagram in which nodes and edges are themselves frames of particular types (distinguished by shape and color). The most popular type of plugin is the tab plugin. Currently available tabs provide capabilities for advanced visualization, ontology merging and version management, inferencing, and so on. The OntoViz and Jambalaya tabs, for example, present different graphical views of a knowledge base, with the Jambalaya tab allowing interactive navigation, zooming in on particular elements in the structure, and different layouts of nodes in a graph to highlight connections between clusters of data.
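The OKBC-style knowledge model summarised above, classes with template slots where each slot carries facets such as allowed values, cardinality and defaults, can be sketched as a toy in a few lines. This is an illustration of the ideas only, not Protégé-2000's actual implementation or API; all names are invented.

```python
# Toy sketch of an OKBC-style frame model: classes with template slots,
# where each slot carries facets (allowed values, cardinality, default).
# Illustrative only; this is not Protégé-2000's implementation.
class Slot:
    def __init__(self, name, allowed=None, max_cardinality=None, default=None):
        self.name, self.allowed = name, allowed
        self.max_cardinality, self.default = max_cardinality, default

    def check(self, values):
        """Return True if the value list satisfies this slot's facets."""
        if self.allowed is not None and any(v not in self.allowed for v in values):
            return False
        if self.max_cardinality is not None and len(values) > self.max_cardinality:
            return False
        return True

class Frame:
    def __init__(self, name, slots):
        self.name, self.slots = name, {s.name: s for s in slots}

    def make_instance(self, **values):
        """Create an instance, filling defaults and enforcing facets."""
        inst = {s.name: [s.default] if s.default is not None else []
                for s in self.slots.values()}
        for slot_name, vals in values.items():
            if not self.slots[slot_name].check(vals):
                raise ValueError(f"facet violation on {slot_name}")
            inst[slot_name] = vals
        return inst

editor = Frame("Editor", [Slot("edits", max_cardinality=2),
                          Slot("language", allowed=["en", "fr"], default="en")])
print(editor.make_instance(edits=["Book1"]))
# → {'edits': ['Book1'], 'language': ['en']}
```

Facet checking at instance-creation time is exactly what makes form-based editing (Figure 2.8) possible: the form can reject a value the moment it violates a slot's facets.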
Figure 2.8: A Protégé-2000 screen for editing classes and slots and for entering instance information. The class hierarchy with multiple inheritance is shown in the left pane. Users can drag and drop classes to rearrange the hierarchy. The right pane shows detailed information for the selected class, including the slots describing instances of the class. The second window shows part of the form for an Editor instance.

The PAL tab provides support for the Protégé Axiom Language. PAL is a subset of KIF that enables users to express constraints on their data for which the frame formalism itself is not sufficiently expressive. The PAL inference engine then analyzes the data to tell the users which constraints the instances in the

knowledge base violate and how. The Flora tab (for F-logic) and the Jess tab provide access to reasoning engines developed elsewhere. The PROMPT tab provides an environment for managing multiple ontologies. Its components include tools for ontology merging, which help the user to find similarities between source ontologies and to merge them; for ontology versioning, which automatically finds a structural diff between versions of an ontology; and for extracting semantically complete subparts of an ontology and rearranging frames in different linked ontologies. The UMLS and WordNet tabs enable users to import and integrate elements of these large on-line knowledge sources into their ontologies.

Figure 2.9: A diagram widget in Protégé. Users can drag items from the palette on the right to create a diagram showing the relations between instances in an ontology.

URL:
Contact information about the developers: protege-help@smi.stanford.edu
Relevant bibliographic references:
N. F. Noy, M. Sintek, S. Decker, M. Crubezy, R. W. Fergerson, & M. A. Musen. Creating Semantic Web Contents with Protege-2000. IEEE Intelligent Systems 16(2):60-71.
N. F. Noy, R. W. Fergerson, & M. A. Musen. The knowledge model of Protege-2000: Combining interoperability and flexibility. 12th International Conference on Knowledge Engineering and Knowledge Management (EKAW'2000), Juan-les-Pins, France.
M. A. Musen, R. W. Fergerson, W. E. Grosso, N. F. Noy, M. Crubezy, & J. H. Gennari. Component-Based Support for Building Knowledge-Acquisition Systems. Conference on Intelligent Information Processing (IIP 2000) of the International Federation for Information Processing World Computer Congress (WCC 2000), Beijing.
Grosso, W., Gennari, J.H., Fergerson, R. and Musen, M.A. (1998). When Knowledge Models Collide (How it Happens and What to Do).
In: Proceedings of the Eleventh Banff Knowledge Acquisition for Knowledge-Based Systems Workshop, Banff, Canada.

SymOntoX

SymOntoX (Symbolic Ontology Manager, XML savvy) is a software prototype for the management of domain ontologies. It has been developed by LEKS (Laboratory for Enterprise Knowledge and Systems) at IASI-CNR. A screenshot is shown in figure 2.10. In SymOntoX, domain concepts and relations are modelled according to OPAL (Object, Process, and Actor modelling Language), a methodology for ontology representation developed by LEKS at IASI-CNR. SymOntoX, the successor of SymOntos, is currently being developed and experimented with within the European project Harmonise, which concerns the tourism domain (therefore, examples will be taken from this domain).

According to OPAL, concepts are organised by means of three primary modelling ideas: Actor, Process, and Object. More precisely, we have:

- Actor: any relevant entity of the domain that is able to activate or perform a process (e.g., Tourist, Travel Agency);
- Object: a passive entity on which a process operates (e.g., Hotel, Flight);
- Process: an activity aimed at the satisfaction of a goal (e.g., Making_a_reservation).

Besides the above primary modelling ideas, OPAL proposes the following complementary ones:

- Information Component: a cluster of information pertaining to the information structure of an Actor or an Object (e.g., Flight_info, Hotel_address);
- Information Element: an atomic information element that is part of an Information Component (e.g., Flight_price, Nr_of_rooms);
- Action: an activity that represents a process component which can be further decomposed (e.g., Room_Requesting);
- Elementary Action: an activity that represents a process component that is not further decomposable (e.g., Cancel_reservation);
- Goal: a desired state of affairs that an actor seeks to reach (e.g., Go_vacation);
- State: a characteristic pattern of values that the instance variables of an entity can assume (e.g., Flight_full);
- Rule: an expression aimed at restraining the possible values of an instance of a concept (constraint rule) or at deriving new information (production rule) (e.g., Ticket purchase 30 days before departure).

Figure 2.10: SymOntoX screenshot

The above modelling ideas are necessary for defining (unary) concepts. According to OPAL, concepts are linked together by means of a number of ontological relations: Specialisation, Decomposition, Predication, Similarity and Relatedness.
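As a rough illustration, OPAL's modelling ideas and ontological relations can be encoded as a small vocabulary that a concept editor might enforce. This is a hypothetical sketch, not SymOntoX's implementation; the example concepts come from the tourism domain above.

```python
# Illustrative sketch (not SymOntoX's implementation): OPAL's modelling
# ideas as concept kinds, plus the ontological relations between concepts.
OPAL_KINDS = {
    "Actor", "Process", "Object",                  # primary modelling ideas
    "InformationComponent", "InformationElement",  # complementary ones
    "Action", "ElementaryAction", "Goal", "State", "Rule",
}
OPAL_RELATIONS = {"Specialisation", "Decomposition", "Predication",
                  "Similarity", "Relatedness"}

class Concept:
    def __init__(self, name, kind):
        assert kind in OPAL_KINDS, f"unknown OPAL kind: {kind}"
        self.name, self.kind, self.links = name, kind, []

    def link(self, relation, other):
        assert relation in OPAL_RELATIONS, f"unknown relation: {relation}"
        self.links.append((relation, other))

tourist = Concept("Tourist", "Actor")
booking = Concept("Making_a_reservation", "Process")
hotel = Concept("Hotel", "Object")
booking.link("Predication", hotel)     # the process operates on Hotel
tourist.link("Relatedness", booking)
```

Restricting every concept to exactly one OPAL kind, and every link to one of the five ontological relations, mirrors how an OPAL-based editor can validate a model against the methodology's vocabulary.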

SymOntoX has been conceived as a service available on the Internet, usable via a common web browser. It is mainly based on XML (all the data are stored in an XML database) and Java technologies, to guarantee maximal flexibility, interoperability and platform independence. Furthermore, it has been developed as a three-tier architecture. SymOntoX is able to manage different ontologies, different types of users and different modes of use. A user can be registered as a User (with reading rights only), as a SuperUser (who can insert new concepts, but only as proposals) or as the Ontology Master (who is responsible for the ontology, and who accepts or rejects the proposals made by the SuperUsers). The system can be used as a glossary (only the name and the natural language description of the concepts are shown), as a thesaurus (the specialisation hierarchy and the similarity relation are also shown), as an ontology system (all the relations are shown), or as a knowledge base (also including the concept instances).

SymOntoX supports a form-based graphical user interface for editing and viewing, and a diagramming functionality for browsing the ontology content. Moreover, a set of Java APIs provides interoperability and integration with other systems. Furthermore, an ontology validator ensures the consistency of the ontologies with respect to the OPAL axioms. All the data (ontology, concept instances and log information) are stored in an XML database.

URL:
Contact information about the developers: symontox@iasi.rm.cnr.it. Users who wish to use SymOntoX for ontology development can get an account and password by e-mailing Francesco Taglino (taglino@iasi.rm.cnr.it).
Relevant bibliographic references:
Missikoff M., Velardi P., Navigli R. The Usable Ontology: An Environment for Building and Assessing a Domain Ontology. Proceedings of the International Semantic Web Conference 2002 (ISWC2002), June 2002, Sardinia, Italy.

WebODE

WebODE [Arpírez et al., 2001] is an ontological engineering workbench that provides a variety of ontology-related services and supports most of the activities involved in the ontology development process and in ontology usage. It is built on top of an application server, which provides high extensibility and usability by allowing the easy addition of new services and the use of existing ones. WebODE ontologies are represented using a very expressive knowledge model, based on the reference set of intermediate representations of the METHONTOLOGY methodology [Fernández-López et al., 1999], which includes ontology components such as concepts (with instance and class attributes), partitions, ad hoc binary relations, predefined relations (taxonomic and part-of ones), instances, axioms, rules, constants and bibliographic references. It also allows the importation of terms from other ontologies, through the use of imported terms. Ontologies in WebODE are stored in a relational database. Moreover, WebODE provides a well-defined service-oriented API for ontology access that eases its integration with other systems. Ontologies built with WebODE can be integrated with other systems by using its automatic export and import services from and into XML, its translation services into and from several ontology specification languages (currently, RDF(S), OIL, DAML+OIL, CARIN and FLogic), and its translation services to other languages and systems, such as Java and Jess.
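To give a flavour of what translating a small concept model into RDF(S) involves, the stand-alone script below emits a tiny class taxonomy as Turtle (standard library only). The namespace and the exact serialization layout are illustrative assumptions, not WebODE's actual export format, which this document does not specify in detail.

```python
# Sketch of exporting a small concept model (concepts plus taxonomic
# subclass-of links) to RDF(S) in Turtle syntax. The namespace and layout
# are illustrative assumptions, not WebODE's actual output format.
def to_rdfs_turtle(classes, subclass_of, ns="http://example.org/travel#"):
    lines = [f"@prefix ex: <{ns}> .",
             "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> ."]
    for c in classes:
        lines.append(f"ex:{c} a rdfs:Class .")
    for child, parent in subclass_of:
        lines.append(f"ex:{child} rdfs:subClassOf ex:{parent} .")
    return "\n".join(lines)

ttl = to_rdfs_turtle(
    classes=["Transport", "Plane", "Train"],
    subclass_of=[("Plane", "Transport"), ("Train", "Transport")])
print(ttl)
```

Real translators must of course also map attributes, ad hoc relations, instances and axioms, which is where most of the knowledge loss discussed in section 2.4 can occur.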
Ontology editing in the WebODE ontology editor is aided both by form-based (figure 2.11) and graphical user interfaces, a user-defined-views manager, a consistency checker, an inference engine, an axiom builder and a documentation service. Two interesting and novel features of WebODE with respect to other ontology engineering tools are: instance sets, which allow instantiating the same conceptual model for different scenarios, and conceptual views of the same conceptual model, which allow creating and storing different parts of the ontology, highlighting and/or customizing the visualization of the ontology for each user. The graphical user interface allows browsing all the relationships defined in the ontology, as well as

graphically pruning these views with respect to selected types of relationships. Mathematical properties (reflexive, symmetric, etc.) and other user-defined properties can also be attached to the ad hoc relationships.

The collaborative editing of ontologies is ensured by a mechanism that allows users to establish the type of access to the ontologies developed, through the notion of groups of users. Synchronization mechanisms also allow several users to edit the same ontology without errors. Constraint-checking capabilities are also provided for type constraints, numerical value constraints, cardinality constraints, and taxonomic consistency verification (e.g., common instances of disjoint classes, loops, etc.).

Finally, WebODE's inference service has been developed in Ciao Prolog. A subset of the OKBC primitives has been defined in Prolog for use in this inference engine. Additionally, the WebODE Axiom Builder transforms first-order logic axioms and rules into Prolog, when possible, so that they can be used in it as well.

URL:
Figure 2.11: WebODE screenshot
Contact information about the developers: webode@fi.upm.es
Relevant bibliographic references:
Arpírez, J.C., Corcho, O., Fernández-López, M., Gómez-Pérez, A. WebODE: a Scalable Workbench for Ontological Engineering. First International Conference on Knowledge Capture (KCAP01), Victoria, Canada, October 2001.
Fernández-López, M., Gómez-Pérez, A., Pazos, A., Pazos, J. Building a Chemical Ontology Using Methontology and the Ontology Design Environment. IEEE Intelligent Systems & their applications, January/February, 4(1).

WebOnto

WebOnto is a tool developed by the Knowledge Media Institute (KMi) of the Open University (England). It supports the collaborative browsing, creation and editing of ontologies, which are represented in the knowledge modelling language OCML.

Its main features are: management of ontologies using a graphical interface (figure 2.12); the automatic generation of instance editing forms from class definitions; support for PSMs and task modelling; inspection of elements, taking into account the inheritance of properties and consistency checking; a full tell&ask interface; and support for collaborative work, by means of broadcast/receive and annotations (using Tadzebao). The WebOnto server is a freely available service provided to the ontology engineering community. A library with over 100 ontologies is accessible through WebOnto and can be browsed with no restrictions on access.

Figure 2.12: WebOnto screenshot

URL: (description); (tool)
Contact information about the developers: Users who wish to use WebOnto for ontology development can get an account and password by e-mailing Dr. John Domingue (j.b.domingue@open.ac.uk).
Relevant bibliographic references:
J. Domingue. Tadzebao and WebOnto: Discussing, Browsing and Editing Ontologies on the Web. In Proceedings of the Eleventh Knowledge Acquisition Workshop (KAW98), Banff, 1998.

2.4 Comparison of tools against the evaluation framework

The comments in this section are based on the tools described above, in alphabetical order: Apollo, LinkFactory, OpenKnoME, OILEd, OntoEdit Free Version, OntoEdit Professional Version, Ontolingua, Ontosaurus, Protégé2000, SymOntoX, WebODE and WebOnto.

An important aspect when analyzing a tool is its software architecture and tool evolution (table 2.2). We have included information about the hardware and software platforms necessary to use the tool, together with its architecture (standalone, client/server, n-tier application), extensibility, storage of the ontologies (databases, ASCII files, etc.) and backup management. From this perspective, most of the tools are moving towards Java platforms, and most of them are moving to extensible architectures as well.
Storage in databases is still a weak point of ontology tools, since only a few of them use databases for storing ontologies: LinkFactory, OntoEdit Professional Version, Protégé2000 and WebODE. The same applies to backup management functionality, which is provided only by OpenKnoME, SymOntoX, WebODE and WebOnto.

Interoperability (table 2.3) with other ontology development tools, merging tools, information systems and databases, as well as translations to and from ontology languages, is another important feature

in order to integrate ontologies into applications. Most of the new tools export and import to ad hoc XML and other markup languages. However, there is no comparative study of the quality of all these translators, and there are no empirical results about the possibility of exchanging ontologies between different tools, nor about the loss of knowledge in the translation processes.

From the KR paradigm (table 2.4) point of view, there are two families of tools: description-logic-based tools, such as OILEd, Ontosaurus and OpenKnoME, and the rest of the tools, which represent knowledge following a hybrid approach based on frames and first-order logic. Additionally, Protégé2000 provides flexible modelling components such as metaclasses. Concerning the methodology (table 2.4) that each tool supports, both versions of OntoEdit support the OntoKnowledge methodology, OpenKnoME supports the GALEN methodology, SymOntoX supports the OPAL methodology and, finally, WebODE supports Methontology. However, none of the tools analyzed includes project management facilities or ontology maintenance, and they provide only limited support for ontology evaluation.

Before selecting a tool, it is also important to know which inference services are attached to it (table 2.5). These include built-in and other inference engines, constraint- and consistency-checking mechanisms, automatic classification and exception handling, among others. LinkFactory has its own inference engine, OILEd performs inferences using the FaCT inference engine, OntoEdit Professional uses OntoBroker, Ontolingua uses ATP, Ontosaurus uses the LOOM classifier, OpenKnoME uses its own inference engine, Protégé-2000 uses PAL, SymOntoX uses its own inference engine, WebODE uses Ciao Prolog and WebOnto uses the OCML inference engine. Besides, WebODE and Ontosaurus provide evaluation facilities.
LinkFactory, OILEd, Ontosaurus and OpenKnoME are the only ones performing automatic classification, the last three because they are based on description logic languages. Finally, none of the tools provides exception-handling mechanisms.

Regarding the usability of the tools (table 2.6), WebOnto has the most advanced features related to the cooperative and collaborative construction of ontologies. In general, more features are required in existing tools to ensure the successful collaborative building of ontologies. Finally, other usability aspects related to the help system, editing and visualization, etc., should be improved in most of the tools.

Table 2.1: Tools' general description

| Tool | Developers | Current release (date) | Availability |
|---|---|---|---|
| Apollo | KMi (Open University) | 1.0 Beta 3 (May 2002) | Open source |
| LinkFactory | Language & Computing nv | -- | License on site or ASP; free Web access to evaluation version |
| OILEd | University of Manchester | (Apr 2002) | Open source |
| OntoEdit Free | Ontoprise | 2.5 (May 2002); 3.0 (Aug 2002) | Freeware |
| OntoEdit Professional | Ontoprise | 2.5 (May 2002); 3.0 (Aug 2002) | Software license |
| Ontolingua | KSL (Stanford University) | (Nov 2001) | Free Web access |
| Ontosaurus | ISI (University of Southern California) | 1.9 (Mar 2002) | Open source and free Web access |
| OpenKnoME | University of Manchester | 5.4 (Dec 2001) | Freeware |
| Protégé 2000 | SMI (Stanford University) | -- | Open source |
| SymOntoX | LEKS (IASI-CNR) | -- | Free Web access |
| WebODE | Ontology Group (UPM) | (Mar 2002) | Software license and free Web access |
| WebOnto | KMi (Open University) | 2.3 (May 2001) | Free Web access |

Table 2.2: Tools' architecture

| Tool | Sw architecture | Extensibility | Ontology storage | Backup management |
|---|---|---|---|---|
| Apollo | Standalone | Plugins | Files | No |
| LinkFactory | 3-tier | Yes | DBMS | No |
| OILEd | Standalone | No | File | No |
| OntoEdit Free | Standalone | Plugins | File | No |
| OntoEdit Professional | Standalone & client/server | Plugins | File; DBMS (v3.0) | No |
| Ontolingua | Client/server | None | Files | No |
| Ontosaurus | Client/server | None | Files | No |
| OpenKnoME | Client/server | None | File | Yes (audit logs) |
| Protégé 2000 | Standalone | Plugins | File; DBMS (JDBC) | No |
| SymOntoX | 3-tier | No | XML DBMS | Yes |
| WebODE | 3-tier | Plugins | DBMS (JDBC) | Yes |
| WebOnto | Client/server | No | File | Yes |

Table 2.3: Tools' interoperability

| Tool | With other ontology tools | Imports from languages | Exports to languages |
|---|---|---|---|
| Apollo | No | Apollo meta-language | OCML, CLOS |
| LinkFactory | FastCode, TeSSI | XML, RDF(S), DAML+OIL | XML, RDF(S), DAML+OIL, HTML |
| OILEd | FaCT | RDF(S), OIL, DAML+OIL | RDF(S), OIL, DAML+OIL, SHIQ, dotty, HTML |
| OntoEdit Free | OntoAnnotate, Ontobroker, OntoMat, Semantic Miner | XML, RDF(S), FLogic, DAML+OIL | XML, RDF(S), FLogic, DAML+OIL |
| OntoEdit Professional | OntoAnnotate, Ontobroker, OntoMat, Semantic Miner | XML, RDF(S), FLogic, DAML+OIL | XML, RDF(S), FLogic, DAML+OIL, SQL-3 |
| Ontolingua | Chimaera, CML Model Fragment Editor, Equation Solver, Data structures inspector, Expressions Evaluator, ATP, CML rule engine, EpiKit, KSL rule engine | OKBC, Ontolingua, IDL, KIF 3.0, CLIPS | KIF, CLIPS (sentential format), CML, LOOM, IDL, OKBC syntax, Prolog syntax |
| Ontosaurus | -- | LOOM, IDL, ONTO, KIF, C++ | LOOM, IDL, ONTO, KIF, C++ |
| OpenKnoME | GCE (GALEN CASE Environment), SPET (Surgical Procedure Entry Tool) | GRAIL, GALEN IR | GRAIL, CLIPS, HTML, GALEN IR |
| Protégé 2000 | PROMPT, OKBC, JESS, FaCT | XML, RDF(S), XML Schema | XML, RDF(S), XML Schema, FLogic, CLIPS, Java, HTML |
| SymOntoX | -- | -- | XML, RDF(S) |
| WebODE | JESS, PICSEL, OILEd, ODEMerge, ODE-KM | XML, RDF(S), OIL, DAML+OIL, CARIN, FLogic | XML, RDF(S), OIL, DAML+OIL, CARIN, FLogic, Prolog, Jess, Java, HTML |
| WebOnto | PlanetOnto, ScholOnto, MnM | OCML | OCML, Ontolingua, GXL, RDF(S), OIL |

Table 2.4: Tools' knowledge representation and methodological support

| Tool | KR paradigm of knowledge model | Axiom language | Methodological support |
|---|---|---|---|
| Apollo | Frames (OKBC) | Unrestricted | No |
| LinkFactory | Frames + FOL | Proprietary (restricted FOL) | Yes |
| OILEd | DL (DAML+OIL) | Yes (DAML+OIL) | No |
| OntoEdit Free | Frames + FOL | Yes (FLogic) | Yes (OntoKnowledge) |
| OntoEdit Professional | Frames + FOL | Yes (FLogic) | Yes (OntoKnowledge) |
| Ontolingua | Frames + FOL (Ontolingua) | Yes (KIF) | No |
| Ontosaurus | DL (LOOM) | Yes (LOOM) | No |
| OpenKnoME | DL (GRAIL) | Yes (GRAIL) | Yes (GALEN) |
| Protégé 2000 | Frames + FOL + metaclasses | Yes (PAL) | No |
| SymOntoX | OPAL | Yes (OPAL) | Yes (OPAL) |
| WebODE | Frames + FOL | Yes (WAB) | Yes (Methontology) |
| WebOnto | Frames + FOL | Yes (OCML) | No |

Table 2.5: Tools' inference services

| Tool | Built-in inference engine | Other attached inference engines | Constraint/consistency checking | Automatic classifications | Exception handling |
|---|---|---|---|---|---|
| Apollo | No | No | Yes | No | No |
| LinkFactory | Yes (Similarity Reasoner) | Yes | Yes | Yes | No |
| OILEd | Yes (FaCT) | No | Yes | Yes | No |
| OntoEdit Free | No | No | Yes | No | No |
| OntoEdit Professional | Yes (OntoBroker) | No | Yes | No | No |
| Ontolingua | No | ATP (frozen ontologies) | No | No | No |
| Ontosaurus | Yes | Yes | Yes | Yes | No |
| OpenKnoME | Yes | No | Yes | Yes | No |
| Protégé 2000 | Yes (PAL) | Jess, FaCT, FLogic | Yes | No | No |
| SymOntoX | Yes | No | Yes | No | No |
| WebODE | Yes (Prolog) | Jess | Yes | No | No |
| WebOnto | Yes | No | Yes | No | No |

Table 2.6: Tools' usability

| Tool | Graphical taxonomy | Graphical prunes | Zooms | Collaborative working | Ontology libraries |
|---|---|---|---|---|---|
| Apollo | Yes | Yes | No | No | Yes |
| LinkFactory | Yes | Yes | Yes | Yes | Yes |
| OILEd | No | No | No | No | Yes |
| OntoEdit Free | No | No | No | No | No |
| OntoEdit Professional | No | No | No | Yes | Yes |
| Ontolingua | Yes | No | No | Yes | Yes |
| Ontosaurus | No | No | No | Yes | No |
| OpenKnoME | No | No | No | Yes | Yes |
| Protégé 2000 | Yes | Yes | Yes | No | Yes |
| SymOntoX | Yes | Yes | No | Yes | Yes |
| WebODE | Yes | Yes (views) | No | Yes | No |
| WebOnto | Yes | Yes | No | Yes (consys) | Yes |

2.5 Experiment with ontology development tools

This section describes the experiment that will be performed within SIG3 on Enterprise-Standard Ontology Environments. For the experiments, we have selected a few criteria from the previous list: expressiveness of the knowledge model attached to the tool, usability, reasoning mechanisms, and scalability. Additional criteria could be the quality of the translations, the exchange of ontologies, etc. The set of ontology development tools will be evaluated with the following tasks:

Task 1. To evaluate the expressiveness of the knowledge model attached to the tool, tool developers will develop a particular ontology from an NL description. To make the results comparable, we will limit what we want to represent. The description must be clear in order to get compatible and comparable results. We will allow refining the conceptual models if some pieces of the knowledge are not initially represented.

Task 2. To evaluate usability, we will perform cross-evaluation between teams. Members of each team will represent the ontology from Task 1 in the tool (or tools) of other teams. To ensure uniformity, the same team members who represented the ontology in their own tool will represent it in the other tool(s). We can then compare the models produced in the cross-evaluation with the model proposed by the developers. The results of this experiment could also be used as input for testing merging tools in chapter 3.

Task 3. To evaluate reasoning mechanisms, we will elaborate a set of predetermined questions about the travel domain, using the ontologies as represented in the original tools as the basis for determining the answers.

Task 4. Scalability. One of the evaluation parameters will be the scalability of the tools. The developers will need to state the largest ontology represented in the tool (number of concepts, number of assertions).
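The scale that Task 4 is concerned with can be probed offline with a synthetic taxonomy of roughly UNSPSC's size: generate about 12,000 concepts and time naive subsumption queries over them. This is an illustrative probe written for this survey, not part of the official experiment setup.

```python
# Illustrative scalability probe: build a large synthetic taxonomy
# (similar in size to UNSPSC's ~12,000 concepts) and time naive
# subsumption queries over it. Not part of the official experiment.
import time

def build_taxonomy(n_concepts, branching=10):
    """Return child -> parent links forming a balanced tree of concepts."""
    parent = {}
    for i in range(1, n_concepts):
        parent[f"C{i}"] = f"C{(i - 1) // branching}"
    return parent

def is_subclass(parent, child, ancestor):
    """Walk the parent links upwards to decide subsumption."""
    while child in parent:
        child = parent[child]
        if child == ancestor:
            return True
    return False

parent = build_taxonomy(12000)
start = time.perf_counter()
hits = sum(is_subclass(parent, f"C{i}", "C0") for i in range(1, 12000))
print(f"{hits} concepts under C0 in {time.perf_counter() - start:.3f}s")
```

Real tools would back such a taxonomy with a database and caching (as Protégé-2000 does), which is precisely what this kind of probe is meant to exercise.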
Some of the suggestions for large ontologies are:

- UNSPSC, which can be used for testing how tools deal with big taxonomies (12,000 concepts). UNSPSC can be generated in RDF from WebODE.
- The WordNet taxonomy, which is available through an API.
- The dmoz Open Directory Project, for which an RDF dump is available.

The ontology developers must make available to the evaluators the largest existing ontology represented in the tool.

Task 5. To evaluate scalability, expressiveness and reasoning simultaneously, CycKB could be used. Wrappers from CycKB to RDF are required.

All the tools to be evaluated should be accessible to the rest of the teams, including a version of the commercial tools. Right now, the tools involved in the experiment are:

- OILEd (Manchester, UK)
- OntoEdit V2.5 professional (Ontoprise, Germany)
- Protégé2000 (Stanford, USA)
- WebODE (UPM, Spain)
- WebOnto (OU, UK)

NL description of the travelling domain

In this section we present a description of the travelling domain to be used for building ontologies with the above ontology development tools.

Let's consider that we are in charge of developing an application for our travel agent in New York, and that we have decided to make use of an ontology to represent explicitly the knowledge that will be used by it. We will focus on travelling and lodging; leisure time, cultural events, tours, etc., will be considered in further stages of our ontology.

We know that when a client makes a trip, he chooses transport and accommodation. Hence, we start by determining the means of transport that are currently available for a travel agency. We will have the following ones in our ontology: planes, trains, cars, ferries, motorbikes and ships. There are no other kinds of transport. Of all of them, the travel agency is especially interested in flights, as this is the means of transport most used by its customers. In fact, customers are usually interested in the kind of plane that they will fly on: is it a Boeing, or is it an Airbus? Furthermore, they are even interested in the specific model of the plane in which they will fly (a Boeing or a Boeing 777). We know that each model of transport belongs to only one kind of transportation (i.e., it is either a plane, or a bus, or a car, etc.). For each flight, the agency knows: the arrival date, the departure date, the arrival city, the departure city, the arrival airport, the departure airport, the prices in first class, business class and economy class, the departure time and the arrival time. Times and dates will be considered as absolute dates.

As for the destinations of customers' travels, they are diverse. Some customers ask for trips to the Statue of Liberty in New York; other customers ask for trips to Washington, San Francisco, Seattle, etc. There are customers interested in visiting Europe: the most common destinations are London, Paris (either the city or Disneyland Paris) and Madrid. Others are interested in more places, such as Cairo (Egypt).
We know that the client can use the following transport to move inside a city: underground, city buses, taxis, and rental cars. Concerning accommodation, the agency recommends, in all cities, hotels and bed and breakfasts. Hotels range from 1-star to 5-star hotels, and each hotel belongs to exactly one of these five categories. For all of them, the agency knows their facilities: address, telephone number, URL, capacity, number of rooms, available rooms, descriptions, whether dogs are allowed, distance to the beach, distance to skiing, etc. The agency also knows the facilities of the rooms: number of beds, rates, TV available, Internet connection, etc.

Once we have defined the main elements of our domain, we can go further and try to represent some common-sense constraints and deductions that can be performed with them. For instance, we know that it is not possible to go from America to Europe by train, car, bike or motorbike. Having this information in our system will prevent it from searching for possible itineraries using these means of transport when a customer wants to travel to Europe. Another example of this kind of constraint relates the distance between the origin and destination of a trip to the available means of transport. If the distance between two cities is between 400 and 800 miles, and there is no airport close to one of them, the customer will prefer going by car or by train. The customer also prefers to go by car or train if he hates travelling by plane. Distances can be either in km or miles.

Finally, we want to represent knowledge about a concrete trip. John is travelling from Madrid to New York on April 5th, 2002 to see the Statue of Liberty, and continuing on to Washington on April 11th. He plans to return to Madrid on April 15th. He has selected two hotels belonging to the Holiday Inn chain, in New York and Washington.
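Two fragments of this description, the flight attributes the agency knows and the constraint forbidding surface transport between America and Europe, can be sketched directly. All names below are invented for illustration; the actual ontologies will be built in the tools listed above.

```python
# Illustrative sketch of two pieces of the travel domain: the Flight
# record with the attributes the agency knows, and the common-sense
# constraint that surface transport cannot cross between continents
# (which covers the America-Europe rule). All names are invented.
from dataclasses import dataclass

CONTINENT = {"New York": "America", "Washington": "America",
             "Madrid": "Europe", "London": "Europe", "Paris": "Europe"}
SURFACE_TRANSPORT = {"train", "car", "bike", "motorbike"}

@dataclass
class Flight:
    departure_city: str
    arrival_city: str
    departure_date: str       # absolute dates, per the description
    arrival_date: str
    first_class_price: float
    business_class_price: float
    economy_class_price: float

def itinerary_allowed(transport, origin, destination):
    """Reject surface transport between different continents."""
    if transport in SURFACE_TRANSPORT and \
       CONTINENT[origin] != CONTINENT[destination]:
        return False
    return True

assert itinerary_allowed("plane", "Madrid", "New York")
assert not itinerary_allowed("train", "New York", "Madrid")
```

In the experiment, each team would express this constraint in its tool's own axiom language (PAL, FLogic, WAB, OCML, etc.), which is exactly what Task 1 and Task 3 compare.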
2.6 Conclusions and main recommendations

In summary, many "similar" ontology development tools exist for building ontologies, but they neither interoperate nor cover all the activities of the ontology life cycle (just design and implementation). The lack of interoperability between these tools causes serious problems when an ontology is integrated into the ontology library of a different tool, or when two ontologies built with different ontology tools or languages are integrated using merging tools. Consequently, future work should be driven towards the creation of a common workbench for ontology developers, as presented in Gómez-Pérez (2001), which facilitates ontology development

during the whole ontology life cycle, ontology management, advanced techniques for visualizing the ontology content, etc. This ontology development workbench should also be accompanied by a set of ontology middleware services that support the use of ontologies in other systems. Some of these services are: software that helps to locate the most appropriate ontology for a given application; formal metrics that compare the semantic similarity and semantic distance between terms of the same or different ontologies; software that allows incremental, consistent and selective upgrades of the ontology being used by a given application; remote access to the ontology library system; software that facilitates the integration of the ontology with legacy systems and databases; etc. Finally, a wide transfer of this technology to companies, with the subsequent development of a large number of ontology-based applications in the Semantic Web context, will be achieved through the creation of ontology application development suites, which will allow the rapid development and integration of existing and future applications on a component basis.

2.7 URLs

Apollo:
LinkFactory:
OntoEdit Free and Professional:
OILEd:
Ontolingua:
Ontosaurus:
OpenKnoME:
Protégé 2000:
SymOntoX:
WebODE:
WebOnto:

References

Duineveld, A., Studer, R., Weiden, M., Kenepa, B., Benjamins, R. WonderTools? A comparative study of ontological engineering tools. Proceedings of KAW99. Banff, Canada. 1999.
Gómez-Pérez, A. A proposal of infrastructural needs on the framework of the semantic web for ontology construction and use. Programme Consultation Meeting (PCM-9) on Knowledge Technologies. European Commission. April 2001.

3 Ontology merge and integration tools

3.1 Introduction

Ontology merging has become a key topic in recent years. On the one hand, ontology merging at design time is very important, since the merger of companies or organizations in general can lead to a merge of their ontologies. It may even be necessary to merge several ontologies to obtain another one of better quality. On the other hand, ontology merging at run time can also be crucial. In fact, ontologies have become increasingly common on the World Wide Web, where they provide semantics for annotations in Web pages (Noy and Musen, 2001). Moreover, the heterogeneity of information on the Web can force us to deal with different ontologies in similar domains. Such diversity must coexist with the interaction between systems. One option for making diversity and generality compatible is to establish mappings between ontologies and to merge them at run time (Mena and Illarramendi, 2001).

As a consequence of the situation described in the previous paragraphs, several ontology merging tools have appeared. Concerning the structure of this chapter, we will first present, in section two, the main set of criteria used to compare different merging tools. In section three, we will briefly present the tools. In section four we will compare all the tools against the evaluation framework, and we will conclude this study in section five.

References

(Mena and Illarramendi, 2001) E. Mena, I. Illarramendi. Ontology-Based Query Processing for Global Information Systems. Kluwer Academic Publishers, ISBN , pp. 215, June 2001.
(Noy and Musen, 2001) Noy, N.F., Musen, M.A. Anchor-PROMPT: Using Non-Local Context for Semantic Matching. In: Workshop on Ontologies and Information Sharing at the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-2001). Seattle, WA. 2001.

3.2 Evaluation framework used to compare the tools

This section presents an initial set of criteria that could be used for comparing ontology merging tools.
To elaborate this framework, the following steps were carried out:

1) The coordinator of this chapter sent a first proposal of the evaluation framework to the rest of the participants in this chapter. A source of inspiration was the framework proposed in chapter 1 for evaluating ontology development tools, although other features specific to merging tools were added.
2) Those who wanted to participate gave suggestions about the framework.
3) After a discussion over several e-mails, an agreed evaluation framework for merging tools was attained.

The features to be evaluated are:

General description. Such a description is important to know the context of the tool. It includes name, developers, current release and date, and pricing policy (open source, shareware, licenses, etc.).

Architecture and evolution. It is important to know what the technological requirements are, and how clients have access to new versions and improvements. The following features are included:
- Is the ontology merging tool integrated in some ontology development tool? Which one?
- Hw/Sw platform. The possible values are: operating system, and the kind of machine (PC, Macintosh, Sun, etc.).
- Does the tool need to be installed locally?

- Failure tolerance. The possible values are low, medium, high. To fill in this characteristic, the following considerations must be taken into account: 1) this criterion will be filled in only if a cross evaluation is carried out; 2) the value must have an attached justification.
- Backup management? If the merging tool is integrated in another tool, a possible value for this criterion could be: yes, performed by the host tool.
- Stability. The possible values are: low, medium, high. The same considerations as for failure tolerance apply here.
- Efficiency. Possible values: low, medium, high. The same considerations as for failure tolerance apply here.
- Tool updates policy. That is, how often does a new version of the tool appear? How much does updating cost? How are old clients notified about new versions?

Information used during the merge process (what information the tool uses for automated merging or suggestions). The merging process can be much more efficient if additional information, such as some linguistic resource, is used. The features to be considered are:
- Electronic dictionaries, thesauri, etc. The possible values are: yes, but this is not necessary for the process; yes, this is necessary for the process; or no.
- Lexicons. The possible values are the same as above.
- Concept definitions and slot values. The possible values are the same as above.
- Graph structure. The possible values are the same as above.
- Instances of concepts. The possible values are the same as above.
- Input from the user.
The possible values are: yes, but this is not necessary for the process; yes, this is necessary for the process; or no.

Interoperability. It is important because key activities can be performed by other, non-merging tools: transformation of formats, evaluation, etc. The features are:
- Is interoperability possible with other ontology tools or other information systems? For example, can the merging tool take ontologies from a remote development tool?
- Can the tool merge ontologies expressed in different languages? Which ones?

Work mode. It is important to know what the role of the user is during the merging process. The considered feature is:
- In which mode does the tool work? Interactive throughout, completely automated, or automated with subsequent fine-tuning by the user.

Management of different versions of ontologies. A change in a source ontology changes the resulting merged ontology. The considered features are:

- Does the tool take advantage of the merge of former versions of the ontologies? For example, if the ontology O is the merge of O1-v.1 and O2-v.1, and we develop O1-v.2: do we have to repeat the whole process of merging O1 and O2, or can we take advantage of the previous result?
- Warnings about changes in source ontologies? Does the system warn us that O is not the merge of the updated versions of O1, O2, ..., On?

Components that the tool allows merging. That is, what parts of the ontologies can be merged? What parts are lost? This is very important. The considered components are:
- Concepts, considering own slots, template slots, taxonomies, concepts by themselves, relations, partitions and/or decompositions, relations and functions, and the arity of relations and functions.
- Axioms. The following questions are relevant: can the tool integrate the set of axioms of the ontologies to be merged? Can the tool integrate the set of rules of the ontologies to be merged?
- Instances. This includes instances of concepts, instances of relations (facts), and claims.

Suggestions provided by the tool (does the tool provide suggestions for merging?). When the merging process is interactive, the system can suggest to the user what components can be merged in the next step. The considered components are:
- Concepts, considering own slots, template slots, taxonomies, concepts by themselves, relations, partitions and/or decompositions, relations and functions, and the arity of relations and functions.
- Sets of axioms. The system can suggest that a set of axioms can be merged.
- Sets of rules.
- Instances. This includes instances of concepts, instances of relations (facts), and claims.

Conflicts detected by the tool. This feature considers whether the system can detect repeated global names or redundant structures in the merged ontology.

Support of some methodology and techniques.
It is important to know whether the tool has methodological support or not. It is also important to know what techniques are supported by the tool. The particular features are:
- Does the tool support any methodology? (No, or yes plus the name of the methodology.)
- Other techniques: probability (possible values: yes/no), machine learning (possible values: yes/no).

Help system. This is a key aspect of every tool, above all for beginners. The aspects to be considered are:
- Documentation.
- Tutorial on the methodology. This makes sense if the tool provides support for some methodology.
- Help on the user interface. The possible values are general help, tutorial help, guided tour, etc.
- Context help.

Edition and visualization. This is very important for the usability of the tool. The considered features are:

- Step-by-step view of the process. The possible values are graphical, non-graphical, none.
- Simultaneous view of the ontologies to be merged.
- Graphical prunes (views) of the ontologies to be merged.
- Zooms.
- Hide/show information.

Experience using the tool. This is an important aspect for getting an idea of the confidence one can have in the tool. The considered features are:
- Merged ontologies and domains.
- Projects where the merge has been done.
- Applications where the unified ontologies have been used.

3.3 Description of the ontology merge and integration tools

Chimaera

Chimaera is a web-based browser environment for merging and diagnosing ontologies. Its design and implementation are based on our experience developing other user interfaces for knowledge applications, such as the Ontolingua ontology development environment [Farquhar et al., 1997], the Stanford CML editor [Iwasaki et al., 1997], the Stanford JAVA Ontology Tool (JOT), the Intraspect knowledge server [Intraspect, 1999], two web interfaces [McGuinness et al., 1995; Welty, 1996] for the CLASSIC knowledge representation system [Borgida et al., 1989], and a collaborative environment for building ontologies for FindUR [McGuinness, 1998]. Chimaera is built on a platform that handles any OKBC-compliant [Chaudhri et al., 1998] representation system. Chimaera accepts over 15 designated input formats (such as ANSI KIF, Ontolingua, Protégé, CLASSIC, XOL, etc.) as well as any other OKBC-compliant form. It will soon be compliant with other emerging standards such as RDF and DAML. Chimaera contains a simple editing environment and also allows the user to use the full Ontolingua editor/browser environment for more extensive editing. Ontolingua is not a requirement, however; other editors could be used in its place. Chimaera facilitates merging by allowing users to upload existing ontologies into a new workspace (or into an existing ontology).
Figure 3.1 shows the result of loading two ontologies (Test1 and Test2) and then choosing the name resolution mode. Chimaera will suggest potential merging candidates based on a number of properties. It generates a name resolution list that may be used as a guide through the merging task. The displayed option in the name resolution list in figure 3.1 shows a suggestion to merge Mammal and Mammalia (since they have similar names). The user sees a display of the places where the two terms appear in the hierarchy (with only the connected portions of the hierarchy displayed). The user may browse the hierarchy in more detail by, for example, expanding subclasses (both Mammal and Mammalia are closed, as represented by the closed triangles in figure 3.1). The user may also view the definitions of the terms and, within Ontolingua, obtain the results of similarity and difference structural comparisons of the definitions as well. The user may then choose to merge the terms with a simple choice from the class menu. Chimaera allows the user to choose the level of vigor with which it suggests merging candidates. Higher settings, for example, will look for things like possible acronym expansions (which was extremely valuable in our use of Chimaera on some government knowledge bases).
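The kind of name-resolution suggestion just described can be approximated with a simple string-similarity test. The sketch below is our own simplification, not Chimaera's actual heuristics, which are richer (acronym expansion, adjustable vigor levels, etc.).

```python
# Toy name-resolution heuristic in the spirit of Chimaera's suggestions:
# flag term pairs with near-identical names, such as Mammal / Mammalia.
# This is our own simplification, not Chimaera's actual algorithm.

from difflib import SequenceMatcher

def merge_candidates(terms1, terms2, threshold=0.8):
    """Return (term1, term2, similarity) triples above the threshold,
    most similar first."""
    pairs = []
    for a in terms1:
        for b in terms2:
            ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if ratio >= threshold:
                pairs.append((a, b, round(ratio, 2)))
    return sorted(pairs, key=lambda p: -p[2])
```

With terms1 = ["Mammal", "Vehicle"] and terms2 = ["Mammalia", "Person"], only the Mammal/Mammalia pair exceeds the threshold, which is exactly the suggestion shown in figure 3.1.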

Figure 3.1. Chimæra in name resolution mode, suggesting a merge of Mammal and Mammalia

Chimaera also supports a taxonomy resolution mode. It looks for a number of syntactic term relationships (such as <X-Y> and <Y>, since the two are usually subclass related). When attached to a classifier, it can look for semantic subsumption relationships as well.

Chimaera includes an analysis capability that allows users to run a diagnostic suite of tests selectively or in its entirety. The output is displayed as an interactive log that allows users to see the results of the tests and also to explore them. The tests include incompleteness tests, syntactic checks, taxonomic analysis, and semantic checks. We built this capability to provide collaborators with varying levels of training with essentially a to-do list containing updates that would likely need to be done before the ontologies would be of the most use to us. The list contains items ranging from terms that are used but not defined, to terms that have contradictory ranges, to cycles detected in the ontology definitions. We are extending the system to include a rule language that allows users to specify additional tests that the environment should include in its diagnostic tool suite, so that users may customize the diagnostics to their particular environment.

Chimaera was used in the High Performance Knowledge Base project to analyze incoming ontologies. It is also being used and/or evaluated by companies including VerticalNet and Cisco. More information is available from [McGuinness et al., 2000] or from the web site, which also includes links to a tutorial and a movie demonstration. The tool is licensable for use.

References

A. Borgida, R.J. Brachman, D.L. McGuinness, L.A. Resnick. CLASSIC: A Structural Data Model for Objects. SIGMOD, Oregon, 1989.
V. Chaudhri, A. Farquhar, R. Fikes, P. Karp, J. Rice. OKBC: A Programmatic Foundation for Knowledge Base Interoperability. AAAI-98, 1998.
A. Farquhar, R. Fikes, J. Rice. The Ontolingua Server: a Tool for Collaborative Ontology Construction. Intl. Journal of Human-Computer Studies 46, 1997.
Intraspect Knowledge Server, Intraspect Corp., 1999.
Y. Iwasaki, A. Farquhar, R. Fikes, J. Rice. A Web-based Compositional Modeling System for Sharing of Physical Knowledge. Morgan Kaufmann, Nagoya, Japan, 1997.
D.L. McGuinness. Ontological Issues for Knowledge-Enhanced Search. Proceedings of Formal Ontology in Information Systems, June 1998. Also in Frontiers in Artificial Intelligence and Applications, IOS Press, Washington, DC, 1998.

D.L. McGuinness, R. Fikes, J. Rice, S. Wilder. An Environment for Merging and Testing Large Ontologies. Proceedings of Knowledge Representation, 2000.
D.L. McGuinness, L.A. Resnick, C. Isbell. Description Logic in Practice: A CLASSIC Application. IJCAI, 1995.
United Nations Standard Product and Services Classification (UNSPSC) Code organization.
C. Welty. An HTML Interface for CLASSIC. Proceedings of the 1996 International Workshop on Description Logics, AAAI Press, November 1996.

FCA-Merge: A Method for Bottom-Up Merging of Ontologies

FCA-Merge is a method for merging ontologies which follows a bottom-up approach, offering a global structural description of the merging process. For the source ontologies, it extracts instances from a given set of domain-specific text documents by applying natural language processing techniques. Based on the extracted instances, mathematically founded techniques taken from Formal Concept Analysis [Wi82, GW99] are applied to derive a lattice of concepts as the structural result of FCA-Merge. The produced result is explored and transformed into the merged ontology by the ontology engineer. The method is based on application-specific instances of the two given ontologies O1 and O2 that are to be merged. The overall process of merging two ontologies is depicted in figure 3.2 and consists of three steps, namely (i) instance extraction and computation of two formal contexts K1 and K2, (ii) the FCA-Merge core algorithm that derives a common context and computes a concept lattice, and (iii) the interactive generation of the final merged ontology based on the concept lattice.

Instance Extraction

Figure 3.2. FCA-Merge process

The extraction of instances from text documents circumvents the problem that, in most applications, there are no objects which are simultaneously instances of the source ontologies and which could be used as a basis for identifying similar concepts.
This method takes as input data the two ontologies and a set D of natural language documents. The documents have to be relevant to both ontologies, so that they are described by the concepts contained in the ontologies. The documents may be taken from the target application which requires the final merged ontology. From the documents in D, we extract instances. This automatic knowledge acquisition step returns, for each ontology, a formal context indicating which ontology concepts appear in which documents. The extraction of instances from documents is necessary because there are usually no instances that are already classified by both ontologies. However, if such instances are available, one can skip the first step and use the classification of the instances directly as input for the two formal contexts.

The FCA-Merge Core Algorithm

The second step of the approach comprises the FCA-Merge core algorithm. The core algorithm merges the two contexts and computes a concept lattice from the merged context using FCA techniques.
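To make the idea concrete, here is a tiny brute-force enumeration of the formal concepts of a merged context. This is only an illustration under invented sample data: FCA-Merge itself computes a *pruned* lattice with an efficient algorithm, not by exhaustive enumeration.

```python
# Tiny brute-force Formal Concept Analysis over a merged context.
# The context maps documents (objects) to the ontology concepts (attributes)
# they mention; a formal concept is a maximal (extent, intent) pair.
# FCA-Merge computes a pruned lattice efficiently; this naive enumeration
# (and the sample data) is only an illustration.

from itertools import combinations

def intent_of(objs, context, all_attrs):
    """Attributes shared by every object in objs."""
    shared = set(all_attrs)
    for o in objs:
        shared &= context[o]
    return shared

def extent_of(attrs, context):
    """Objects possessing every attribute in attrs."""
    return {o for o, a in context.items() if attrs <= a}

def concepts(context):
    """Enumerate all formal concepts of the context."""
    all_attrs = set().union(*context.values())
    found = set()
    for r in range(len(context) + 1):
        for objs in combinations(sorted(context), r):
            b = intent_of(objs, context, all_attrs)
            a = extent_of(b, context)
            found.add((frozenset(a), frozenset(b)))
    return found

# Merged context: which documents mention which ontology concepts.
K = {
    "doc1": {"Hotel", "Accommodation"},
    "doc2": {"BedAndBreakfast", "Accommodation"},
    "doc3": {"Hotel"},
}
```

Ordering the resulting concepts by inclusion of their extents yields the concept lattice from which the ontology engineer derives the merged ontology.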

More precisely, it computes a pruned concept lattice which has the same degree of detail as the two source ontologies.

Deriving the Merged Ontology

Instance extraction and the FCA-Merge core algorithm are fully automatic. The final step of deriving the merged ontology from the concept lattice requires human interaction. Based on the pruned concept lattice and the sets of relation names R1 and R2, the ontology engineer creates the concepts and relations of the target ontology. Graphical means of the ontology-engineering environment OntoEdit are offered to support this process.

Assumptions

For obtaining good results, a few assumptions have to be met by the input data. Firstly, the documents have to be relevant to each of the source ontologies: a document from which no instance is extracted for some source ontology can be neglected for our task. Secondly, the documents have to cover all concepts from the source ontologies; concepts that are not covered have to be treated manually after the merging procedure (or the set of documents has to be expanded). And last but not least, the documents must separate the concepts well enough. If two concepts that are considered different always appear in the same documents, FCA-Merge will map them to the same concept in the target ontology (unless this decision is overruled by the knowledge engineer). If this situation appears too often, the knowledge engineer might want to add more documents that further separate the concepts.

References

[GW99] B. Ganter, R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer, Berlin-Heidelberg, 1999.
[Wi82] R. Wille. Restructuring lattice theory: an approach based on hierarchies of concepts. In: I. Rival (ed.): Ordered Sets. Reidel, Dordrecht-Boston, 1982.

PROMPT

PROMPT [Noy and Musen, 2000] is a tool for semi-automatic guided ontology merging.
It is a plugin for Protégé-2000. PROMPT leads the user through the ontology-merging process, identifying possible points of integration and making suggestions regarding what operations should be done next, what conflicts need to be resolved, and how those conflicts can be resolved. PROMPT's ontology-merging process is interactive. The user makes many of the decisions, and PROMPT either performs additional actions automatically based on the user's choices or creates a new set of suggestions and identifies additional conflicts among the input ontologies. The tool takes into account different features of the source ontologies to make suggestions and to look for conflicts. These features include:
- names of classes and slots (e.g., if frames have similar names and the same type, then they are good candidates for merging);
- class hierarchy (e.g., if the user merges two classes and PROMPT had already considered their superclasses similar, it will have more confidence in that suggestion, since these superclasses play the same role for the classes that the user said are the same);
- slot attachment to classes (e.g., if two slots from different ontologies are attached to a merged class and their names, facets, and facet values are similar, these slots are candidates for merging);
- facets and facet values (e.g., if a user merges two slots, then their range restrictions are good candidates for merging).

In addition to providing suggestions to the user, PROMPT identifies conflicts. Some of the conflicts that PROMPT identifies are:
- name conflicts (more than one frame with the same name);
- dangling references (a frame refers to another frame that does not exist);

- redundancy in the class hierarchy (more than one path from a class to a parent other than root);
- slot-value restrictions that violate class inheritance.

The features in the list above employ the graph structure of the ontologies only to a limited extent: only nodes one or two steps away are traversed. Anchor-PROMPT [Noy and Musen, 2001] is an extension of PROMPT that compares the graph structure on a larger scale. It takes as input a set of anchors: pairs of related terms defined by the user or automatically identified by lexical matching. Anchor-PROMPT treats an ontology as a graph with classes as nodes and slots as links. The algorithm analyzes the paths in the subgraph limited by the anchors and determines which classes frequently appear in similar positions on similar paths. These classes are likely to represent semantically similar concepts.

In terms of user support, the PROMPT tool has the following features:
- Setting the preferred ontology. It often happens that the source ontologies are not equally important or stable, and that the user would like to resolve all conflicts in favor of one of them. The user can designate one of the ontologies as preferred; when there is a conflict between values, instead of presenting the conflict to the user for resolution, the system resolves it automatically.
- Maintaining the user's focus. Suppose a user is merging two large ontologies and is currently working in one content area. PROMPT maintains the user's focus by rearranging its lists of suggestions and conflicts, presenting first the items that include frames related to the arguments of the latest operations.
- Providing feedback to the user. For each of its suggestions, PROMPT presents a series of explanations, starting with why it suggested the operation in the first place.
If PROMPT later changes the operation's placement in the suggestions list, it augments the explanation with information on why it moved the operation.

Figure 3.3. PROMPT screenshot. The main window (in the background) shows a list of current suggestions in the top left pane and the explanation for the selected suggestion at the bottom. The right-hand side of the window shows the evolving merged ontology. The internal window presents the two source ontologies side by side (the superscript m marks the classes that have been merged or moved into the evolving merged ontology).

We have evaluated both PROMPT and Anchor-PROMPT. Our experiments showed that experts followed 88% of PROMPT's suggestions and that PROMPT suggested 75% of all the operations

that the user ultimately performed. We conducted experiments using unrelated source ontologies developed by different research groups. Our experiments show that Anchor-PROMPT can achieve a result precision between 61% and 100%, depending on the size of the initial anchor set and the maximum length of the paths traversed.

References

1. Noy, N.F., Musen, M.A. PROMPT: Algorithm and Tool for Automated Ontology Merging and Alignment. In: Seventeenth National Conference on Artificial Intelligence (AAAI-2000). Austin, TX. 2000.
2. Noy, N.F., Musen, M.A. Anchor-PROMPT: Using Non-Local Context for Semantic Matching. In: Workshop on Ontologies and Information Sharing at the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-2001). Seattle, WA. 2001.

ODEMerge

ODEMerge (Ramos, 2001) is a tool for merging ontologies that is integrated in WebODE, the software platform for building ontologies developed by the Ontology Group at the Technical University of Madrid. It is therefore a client-server tool that works on the Web. The tool is a partial software support for the methodology for merging ontologies elaborated by de Diego (de Diego, 2001). This methodology proposes the following steps (see figure 3.4): 1) transformation of the formats of the ontologies to be merged; 2) evaluation of the ontologies; 3) merging of the ontologies; 4) evaluation of the result; and 5) transformation of the format of the resulting ontology so that it is adapted to the application where it will be used. The merge step is carried out by the ODEMerge module, while the remaining steps are aided by the whole WebODE platform.

Figure 3.4.
ODEMerge methodology and its software support

The methodology establishes in a very detailed way what tasks we need to carry out when we have to merge two ontologies, when we have to do these tasks, who has to perform each task, how it has to be carried out, and what the products of each task are. For the evaluation and merging of ontologies, very detailed rules are proposed. The methodology is based on experience in merging e-commerce ontologies. WebODE helps in steps (1), (2), (4) and (5) of the merging methodology, and ODEMerge carries out the merge of taxonomies of concepts in step (3). Besides, ODEMerge helps in the merging of attributes and relations, and it incorporates many of the rules identified in the methodology. ODEMerge uses the following inputs (see figure 3.5):

- the source ontology 1 to be merged;
- the source ontology 2 to be merged;
- the table of synonyms, which contains the synonymy relationships between the terms of ontology 1 and the terms of ontology 2;
- the table of hyperonyms, which contains the hyperonymy relationships between the terms of ontology 1 and the terms of ontology 2.

ODEMerge processes the ontologies together with the information in the tables of synonymy and hyperonymy, and it generates a new ontology, which is the merge of ontology 1 and ontology 2. That is, the tool compares ontology 1 with ontology 2 considering the tables of synonymy and hyperonymy, and it merges these ontologies. New versions of the tool will include electronic dictionaries and other linguistic resources. The tool is interesting not only because of its current functions, but also because it is easily extensible to consider new merging rules as they are identified. Besides, the tool is designed so that electronic dictionaries or other resources can be added to substitute the tables of synonyms and hyperonyms. Another important characteristic of ODEMerge is that it can be used to merge ontologies in as many ontology implementation languages as WebODE processes, since WebODE is the host platform of ODEMerge. The WebODE import module allows importing ontologies written in XML, RDF(S) or CARIN, and allows exporting into XML, RDF(S), OIL, DAML+OIL, CARIN, FLogic, Prolog, Jess, Java and HTML. Figure 3.6 shows a snapshot of ODEMerge.

References

(de Diego, 2001) de Diego, R. Método de mezcla de catálogos electrónicos (A method for merging electronic catalogues). Final Year Project. Facultad de Informática de la Universidad Politécnica de Madrid. Spain. 2001.
(Ramos, 2001) Ramos, J.A. Mezcla automática de ontologías y catálogos electrónicos (Automatic merging of ontologies and electronic catalogues). Final Year Project. Facultad de Informática de la Universidad Politécnica de Madrid. Spain. 2001.

Figure 3.5. ODEMerge inputs and outputs (ontology 1, ontology 2, and the synonymy and hyperonymy tables as inputs; the unified ontology as output)
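The use of the synonymy table can be sketched as follows. The merging rules of the actual methodology are far more detailed; the function, rules and names below are our own minimal stand-ins for illustration only.

```python
# Sketch of synonym-table-driven taxonomy merging in the style of ODEMerge.
# These rules are our own minimal stand-ins for the detailed rules of the
# underlying methodology (de Diego, 2001); all names are illustrative.

def merge_taxonomies(tax1, tax2, synonyms):
    """tax1, tax2: mappings concept -> parent concept (None for roots).
    synonyms: mapping from ontology-2 terms to their ontology-1 synonyms.
    Returns the merged taxonomy, expressed in ontology-1 vocabulary."""
    merged = dict(tax1)
    for concept, parent in tax2.items():
        c = synonyms.get(concept, concept)                  # unify synonyms
        p = synonyms.get(parent, parent) if parent else None
        if c not in merged:                                 # keep ontology 1's entry
            merged[c] = p
    return merged
```

For instance, merging {Transport > Plane} with {Transportation > Aircraft, Train}, given that Transportation/Transport and Aircraft/Plane are synonyms, yields {Transport > Plane, Train}. A hyperonymy table could be handled analogously by inserting the more specific term below its hyperonym.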

Figure 3.6. ODEMerge screenshot

3.4 Comparison of the tools against the evaluation framework

The comments in this section are based on the tools described above (except for Chimaera): FCA-Merge, PROMPT and ODEMerge. Table 3.2 presents the details related to software architecture and evolution. First, we have included information about the ontology development platform in which each tool is integrated: PROMPT and ODEMerge are integrated in Protégé-2000 and WebODE, respectively. We have also filled in information about the tools' hardware and software requirements, backup management, and tool updates policy. We have not filled in any information about failure tolerance, stability and efficiency, because we could not perform a cross evaluation. The main highlights from table 3.2 are: PROMPT and ODEMerge inherit most of their characteristics from Protégé-2000 and WebODE, respectively; FCA-Merge and PROMPT must be installed locally, whereas ODEMerge is installed as a part of the WebODE workbench, which is accessed from the Web.

These tools need to use information (electronic dictionaries, lexicons, etc.) during the merge process. The more information a tool uses during this process, the more work it is able to perform without the user's participation. From this perspective (table 3.3), FCA-Merge is the tool that uses the most additional information, as it merges ontologies by means of natural language processing. Interoperability with other ontology tools is also an important aspect (table 3.4) and is usually determined by the ontology development platform in which the merge tool is integrated. Another important aspect is whether the tool can merge ontologies expressed in different languages. All these tools are able to merge ontologies expressed in different languages (XML, RDF(S), OIL, etc.). Merge tools can work interactively or not (table 3.5). FCA-Merge and PROMPT ask users during the merging process when there are several alternatives.
ODEMerge, in contrast, performs a completely automatic process.

Given that ontologies usually evolve, the management of different ontology versions is also important (Table 3.6). None of these tools takes advantage of former versions of the ontologies to be merged, and none of them warns users about changes in the source ontologies.

We also present in Table 3.7 the kinds of components that can be merged by each tool. All the tools allow merging concepts, taxonomies, relations and instances. However, no tool allows merging axioms. Similarly, we have also considered the suggestions provided by each tool (Table 3.8). PROMPT presents possible alternatives to users during the merging process, and it is the tool that provides the most suggestions to users.

The next feature that we have analysed is whether the tool detects conflicts in the merge process (Table 3.9). For example, name conflicts appear when the same name is used for different concepts in both ontologies. All these tools detect name and structure conflicts.

Table 3.10 shows the methodologies and techniques for ontology merging that each tool supports. All these tools support some kind of method for ontology merging. ODEMerge (with WebODE) is the only one that supports a detailed methodology for ontology merging.

Concerning the tools' help system (Table 3.11), the documentation on how to use a tool is usually not integrated in the tool itself, but available as a separate document or tutorial. Edition and visualization features (Table 3.12) are strongly influenced by the ontology development platform in which these tools are integrated. Finally, Table 3.13 shows the experience gained in using the tools. PROMPT and ODEMerge have been used with many different ontologies and in many different domains.

Table 3.1. General description

Feature | FCA-Merge | PROMPT | ODEMerge
Developers | Gerd Stumme, Alexander Maedche | Stanford Medical Informatics | Ontology Group, LIA, UPM
Current release & date | Prototypical implementation | January | (April 2002)
Pricing policy | Open source | Licenses |

Table 3.2. Architecture and evolution

Feature | FCA | PROMPT | ODEMerge
Is the ontology-merging tool integrated in an ontology development tool? Which one? | No. FCA-Merge is a method rather than a tool | Yes, Protégé-2000 | Yes, it is integrated in WebODE
Hw/Sw platform | Saarbruecken Message Extraction System (SMES), FCA-Merge core algorithm, any ontology editor (e.g., OntoEdit, SOEP) | Java, hence any software platform (Windows, Mac, Linux, Unix, etc.) | It is Java based, like WebODE
Does the tool need to be installed locally? | Yes | Yes | No
Failure tolerance | (not filled in) | (not filled in) | (not filled in)
Backup management? | No | No | Yes, performed by the host tool
Stability | (not filled in) | (not filled in) | (not filled in)
Efficiency | (not filled in) | (not filled in) | (not filled in)
TOOL UPDATES POLICY: How often does a new version of the tool appear? How much does updating cost? How are old clients notified about new versions? | Does not apply | New version approximately 3-4 times a year. Upgrades are free. Information on the Web site | New version approximately twice a year. However, there is no tool updates policy.

Table 3.3. Information used during the merge process

Feature | FCA | PROMPT | ODEMerge
Electronic dictionaries, thesauri, etc. | Yes, this is necessary for the process, unless there are already common instances | No (but can be integrated) | No (although they can be built-in)
Lexicons | Yes, this is necessary for the process, unless there are already common instances | No (but can be integrated) | No (although they can be built-in)
Concept definitions and slot values | No | Yes, this is necessary for the process | Yes, but this is not necessary for the process
Graph structure | No | Yes, but this is not necessary for the process | Yes
Instances of concepts | Yes, this is necessary for the process | Yes, but this is not necessary for the process | No
Input from the user | Yes, this is necessary for the process | Yes, this is necessary for the process | Yes, this is necessary for the process

Table 3.4. Interoperability

Feature | FCA | PROMPT | ODEMerge
Is interoperability possible with other ontology tools or other information systems? | No | Yes, through the import mechanism of the host tool (Protégé-2000) | Yes, the tool can use the import module of WebODE
Can the method merge ontologies expressed in different languages? Which ones? | Any, provided that one can identify/create common instances | Yes, through the import mechanism of the host tool (Protégé-2000): RDFS, XML Schema, OIL | Yes, ontologies in different languages (XML, RDF(S), OIL, DAML+OIL, CARIN, FLogic, Prolog, Jess, Java) can be generated (via WebODE)

Table 3.5. Work mode

Feature | FCA | PROMPT | ODEMerge
In which mode does the method work? | Automated with subsequent fine-tuning by the user | Interactive + automatic | Completely automated

Table 3.6. Management of different ontology versions

Feature | FCA | PROMPT | ODEMerge
Does the method take advantage of the merge of former versions of the ontologies? | No | No | No
Warnings about changes in source ontologies? | No | No | No

Table 3.7. Components that the method allows merging

Feature | FCA | PROMPT | ODEMerge
Concepts? | Yes | Yes | Yes
Own slots? | No | Yes | Yes
Template slots? | No | Yes | Yes
Taxonomies | Yes | Yes | Yes
- Concepts? | Yes | Yes | Yes
- Relations? | Yes | Yes | Yes
- Partitions and/or decompositions? | No | No | Yes
Relations & functions? | No | Relations yes, functions no | Yes
- Arity | No | Binary relations | Every arity
Can the method integrate the set of AXIOMS of the ontologies to be merged? | No | No | No
Can the method integrate the set of RULES of the ontologies to be merged? | No | No | No
Instances | Are needed for the merging process | Yes | No
- Of concepts? | Yes | Yes | No
- Of relations (facts)? | No | Yes | No
- Claims? | No | No | No

Table 3.8. Suggestions provided by the method

For ODEMerge, the answer to every feature below is "No, since the merging process is not supervised".

Feature | FCA | PROMPT | ODEMerge
Concepts? | Yes | Yes | No
Own slots? | No | Yes | No
Template slots? | No | Yes | No
Taxonomies | Yes | Yes | No
- Concepts? | Yes | Yes | No
- Relations? | Yes | No | No
- Partitions and/or decompositions? | No | No | No
Relations & functions? | No | Relations yes, functions no | No
- Arity | No | Binary relations | No
Sets of axioms? | No | No | No
Sets of rules? | No | No | No
Instances | No | Yes | No
- Of concepts? | No | Yes | No
- Of relations (facts)? | No | Yes | No
- Claims? | No | No | No

Table 3.9. Conflicts detected by the method

Feature | FCA | PROMPT | ODEMerge
Name conflicts? | Yes | Yes | Yes, via WebODE
Structure conflicts? | Yes | Yes | Yes, via WebODE

Table 3.10. Support of methodologies and techniques

Feature | FCA | PROMPT | ODEMerge
Does the tool support any methodology? | Formal Concept Analysis (FCA) | Yes, the PROMPT methodology | Yes (de Diego, 01)
OTHER TECHNIQUES:
Probability | No | No | No
Machine learning | Yes, conceptual clustering by FCA | No | No

Table 3.11. Help system

Feature | FCA | PROMPT | ODEMerge
Tutorial on methodology? Documentation? | See [Stumme, Maedche, IJCAI 2001] | Tutorial, documentation | Tutorial, documentation (in Spanish)
Help on user interface? | No | Tutorial, documentation | Yes
Context help? | No | Yes | Yes

Table 3.12. Edition & visualization

Feature | FCA | PROMPT | ODEMerge
View step by step of the process? | No | Graphical, tabular, hierarchical | Non-graphical
Simultaneous view of the ontologies to be merged? | Yes, via the concept lattice | Yes | No
Graphical prunes (views) of the ontologies to be merged? | No | Available through the host tool | Available through the host tool
Zooms? | No | Available through the host tool | Available through the host tool
Hide/show information? | No | Available through the host tool | Available through the host tool

Table 3.13. Experience using the tool

Feature | FCA | PROMPT | ODEMerge
Merged ontologies and domains | | Many | Many
Projects where the merge has been done | | Many | Two projects
Applications where the unified ontologies have been used | | Many | Two applications

3.5 Conclusions and main recommendations

The main conclusions that can be extracted from the study presented in this chapter are as follows:

1) Two approaches are followed to merge ontologies:
a) Starting from the instances of the ontologies. FCA-Merge merges ontologies by building concept lattices from their instances.
b) Starting from the concepts of the ontologies. PROMPT and ODEMerge perform the merge by searching for concepts that show some similarity.

2) All the tools need the participation of the user to obtain the definitive result of the merging process. FCA-Merge and PROMPT allow the user to guide the merging. The ontology that results from the ODEMerge process can also be modified by the user afterwards.

3) No tool takes advantage of former versions of the resulting ontology. For example, if the ontology O is the merge of O1-v.1 and O2-v.1, and we develop O1-v.2, the tool has to completely repeat the process of merging O1-v.2 and O2-v.1.

4) All the tools allow manipulating ontologies implemented in different ontology implementation languages. This is very important, since ontologies can come from different sources and can be used in different applications.

5) No tool allows merging axioms and rules. That is, current tools do not permit the merging of heavyweight ontologies.

Summarising, we can say that important points (e.g., the diversity of languages) are being addressed in the study and development of merging tools. However, other important problems (e.g., the merging of axioms) remain a largely unexplored field. On the other hand, the natural evolution of merging tools should lead to an increased use of knowledge and a decreased participation of people in the process. This could improve the possibilities of merging at run-time.

3.6 URLs

Ontology Merge Tools:
Chimaera:
PROMPT:
WebODE: It will be available in

4 Ontology evaluation tools

4.1 Introduction

Currently the semantic web [1] attracts researchers from all around the world. Numerous tools and applications of semantic web technologies are already available [2,3,4] and their number is growing fast. Ontologies play an important role for the semantic web as a source of formally defined terms for communication. They aim at capturing domain knowledge in a generic way and provide a commonly agreed understanding of a domain, which may be reused, shared and operationalized across applications and groups. However, because of the size of ontologies, their complexity, their formal underpinnings and the necessity of reaching a shared understanding within a group of people, ontologies are still far from being a commodity.

Developing and deploying large-scale ontology solutions typically involves several separate tasks and requires applying multiple tools. Therefore, pragmatic issues such as interoperability are key requirements if industry is to be encouraged to take up ontology technologies rapidly. The high visibility of the semantic web, its tools and its applications already attracts industrial partners, e.g. in numerous projects funded by the European Commission. In particular, as tools move from academic institutions into commercial environments, they have to fulfil stronger and in some cases new requirements (e.g. concerning scalability and multi-user access). Different tools from different sources need to interoperate; they are therefore typically no longer standalone solutions but are integrated into frameworks. These frameworks must be open to other commercial environments and provide connectors and interfaces to industrial standards. Larger applications also need larger ontologies and therefore require substantially more performance and scalability. A systematic evaluation of ontologies and related technologies might lead to a consistent level of quality and thus acceptance by industry.
For the future, this effort might also lead to standardized benchmarks and certifications.

In this chapter we propose an evaluation framework for properties of ontologies and of technologies for developing and deploying ontologies (cf. Section 4.2). We present example implementations of the framework (cf. Section 4.3), each addressing different aspects of the evaluation criteria proposed in the framework. We compare the tools against our framework (cf. Section 4.4). Before we conclude (cf. Section 4.6) we give a brief discussion of related work (cf. Section 4.5).

4.2 Evaluation framework

Our evaluation framework consists of two main aspects: (i) the evaluation of properties of ontologies generated by development tools, and (ii) the evaluation of technology properties, i.e. of tools and applications, which includes the evaluation of the properties of the evaluation tools themselves. Table 4.1 gives an overview of these aspects.

Table 4.1: Evaluation framework criteria

Ontology properties: Language conformity (syntax); Consistency (semantics)
Technology properties: Interoperability (e.g. semantics); Turn around ability; Performance; Memory allocation; Scalability; Integration into frameworks; Connectors and interfaces

For ontologies generated by development tools, language conformity and consistency may be checked. Language conformity means that the syntax of the representation of the ontology in a particular language conforms to a standard. Such a standard is either a well-documented standard defined by a standardization body, or an industrial standard mostly given by a reference implementation. So in the first case the output of an ontology tool must be checked against the syntax definition, and in the second case it must be tested using the reference implementation. Evaluation of consistency means checking to what extent the tools ensure that the resulting ontologies are consistent with respect to their semantics, e.g. that different parts of the ontology representation do not contradict each other. Ontology properties may be evaluated using the ontologies only, i.e. without the tools themselves (e.g. for development) being available.

In contrast, for the second block of properties the tools themselves are examined. Interoperability means how easy it is to exchange ontologies between different tools. This includes aspects such as: is a tool able to interpret the output of another tool in the same way? This is more than merely checking language conformity, because it examines whether different tools interpret the same things in the same way; often things can be represented in the same language in different ways. Turn around ability means that the output of a tool is presented to the user in the same way when it is read again later on. E.g. a value restriction may be represented as a range restriction or by a constraint; if the tool shows it as a range restriction, it should not show it as a constraint the next time it reads the same ontology. Performance especially concerns the runtime effort of the tools, e.g. how much time is needed for solving a particular inference task, for storing ontologies, etc. Benchmark tests must be developed to evaluate these performance issues.
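As a rough illustration of what such a benchmark could look like (a hypothetical sketch under our own assumptions, not one of the tools described in this chapter), one can time a simple reference inference task, here computing all ancestors of every concept, over synthetic taxonomies of growing size:

```python
# Hypothetical micro-benchmark sketch: time a reference task over
# synthetic chain taxonomies of growing size.
import time

def ancestors(parent, concept):
    """Walk up a child -> parent map and collect all ancestors."""
    result = []
    while concept in parent:
        concept = parent[concept]
        result.append(concept)
    return result

def benchmark(sizes):
    for n in sizes:
        # A simple chain taxonomy: c0 <- c1 <- ... <- c(n-1).
        parent = {f"c{i}": f"c{i-1}" for i in range(1, n)}
        start = time.perf_counter()
        for c in parent:
            ancestors(parent, c)
        elapsed = time.perf_counter() - start
        print(f"{n:6d} concepts: {elapsed:.4f} s")

benchmark([100, 200, 400])
```

Plotting elapsed time against ontology size is exactly the kind of measurement that the scalability criterion below asks for.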
For these benchmarks, reference ontologies, reference ontology classes, reference tasks and reference task classes may be very helpful. Memory allocation means how much memory is needed by the tools to handle ontologies. Similarly to the performance evaluation, benchmarks must be available to test memory allocation. For performance evaluation as well as for memory allocation it must be clarified what the size of an ontology or the complexity of a task means, and which parameters influence this size in what way. Scalability evaluates the performance and memory behaviour of the tools with respect to growing ontologies and tasks. It examines questions like: how does linear growth of an ontology increase the amount of memory allocated by the tool? Integration into frameworks concerns how easy it is to switch between tools. For instance, it is not very convenient if, to switch between tools, it is necessary to store the ontology and then transform it with a different tool into another language before it can be loaded into the other tool. Fully integrated environments, similar to well-known programming environments, must be the goal for ontology development tools. Last but not least, the connectivity to other tools is important. This concerns, on the one hand, connectors, for instance to databases, index servers and systems like MS Exchange or Lotus Notes, and, on the other hand, interfaces to the tool itself so that its functionality can be used within other tools.

4.3 Implementations of the framework

Overview

Table 4.2 shows the tools evaluated in this section.

Table 4.2: Overview of evaluation tools

OntoAnalyser | Ontoprise GmbH. Contact: Jürgen Angele, angele@ontoprise.de. Haid-und-Neu-Str., Karlsruhe
OntoGenerator | Institute AIFB, University of Karlsruhe, Postfach, Karlsruhe, Germany. Contact: York Sure, sure@aifb.uni-karlsruhe.de

OntoClean in WebODE | Ontology Group (UPM) in Madrid (Spain) and Italian National Research Council (CNR*) in Padova (Italy). Contact at CNR: Nicola Guarino, National Research Council, LADSEB-CNR, Corso Stati Uniti, 4, Padova, Italy. (*) In the process of moving to the new CNR Institute of Cognitive Sciences and Technologies, ISTC-CNR. Contact at UPM: Asunción Gómez-Pérez, asun@fi.upm.es, Facultad de Informática, Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, Boadilla del Monte (Madrid), Spain
ONE-T (Ontology Evaluation Tool) | Ontology Group (UPM). Contact: Asunción Gómez-Pérez, asun@fi.upm.es, Facultad de Informática, Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, Boadilla del Monte (Madrid), Spain

OntoAnalyser

Based on the criteria shown in the previous section, we (Institute AIFB and Ontoprise GmbH) implemented two tools: OntoAnalyser and OntoGenerator. Both examples of evaluation tools are realized as plugins for OntoEdit, which is a graphically oriented ontology engineering environment [5]. The underlying plugin framework [6] allows for flexible extensions of OntoEdit's core functionalities (also for third parties). Though OntoEdit supports the full lifecycle of ontology development [7], some tasks are only weakly supported by the core functionalities. Specialized plugins enable more fine-grained support, e.g. for the evaluation of ontologies through the two plugins presented here. We therefore tightly integrate into the development process the evaluation of ontologies that are (i) created with OntoEdit or (ii) imported from other tools like Protégé [8] and WebODE [9]. Each of the two plugins addresses different aspects of the evaluation criteria presented in the previous section. OntoAnalyser focuses on the evaluation of ontology properties, in particular language conformity and consistency. OntoGenerator focuses on the evaluation of ontology-based tools, in particular performance and scalability.
We now illustrate each plugin by presenting a motivating use case scenario and describing its functionalities.

From our own experiences of ontology development and deployment (e.g. the creation of an ontology-based corporate history analyser or the development of the ontology-based portal of our own institute) we learned that for different purposes ontologies must have different properties. These properties might even differ between ontology projects or between target applications. For instance, a tool which visualizes concept hierarchies might require single inheritance only. The definition of evaluation methods for such properties must be very flexible and easily maintainable, so it is not convenient to program them directly into a tool. Logic is a very comfortable and powerful way to express, at a very abstract level, constraints for an ontology or to examine properties of an ontology. For that purpose the rule or constraint language must be able to access the ontology itself, i.e. to make statements about classes, relations, subclasses, etc. F-Logic [10] allows defining statements and rules about the ontology (concepts, subconcepts, relations) itself. E.g. the examination whether a concept in an ontology has at most one super concept may be expressed by the following rule:

FORALL C check("concept has more than one super concept",C) <- EXISTS S1,S2

C::S1 AND C::S2 AND NOT equal(S1,S2).

In the same way it is easy to define rules to check, for instance, the disjointness of classes, special conventions for relation names, etc. E.g. the conventions for a project may state that relations must be named with lower-case letters:

FORALL R check("relation name is not in lower case letters",R) <- EXISTS C,C1,R1
C[R=>>C1] AND tolower(R,R1) AND NOT equal(R,R1).

OntoAnalyser is a tool which takes such rules, runs the Ontobroker inference engine (cf. [11]) with them, and presents the results of this examination to the user. OntoAnalyser is able to load different rule packages, each intended for a different target tool or target project.

Figure 4.1: Example ontology in OntoEdit

We illustrate the application of the two rules above with OntoAnalyser by giving an example. Figure 4.1 shows a screenshot of OntoEdit with an example ontology. It is noteworthy that (i) the concept PhD_student is a subconcept of student as well as of academic_staff, i.e. there is multiple inheritance, and (ii) the two concepts student and company both have a relation hasname (see footnote 7), but each with a different range, viz. STRING and BOOLEAN. The two rules presented above were then applied to the example ontology shown in Figure 4.1. The results of the evaluation by OntoAnalyser are shown in Figure 4.2. Firstly, the multiple inheritance of the concept PhD_student resulted in the error message "concept has more than one super concept". Secondly, the existence of the relation hasname with different ranges resulted in the error message "there exists a relation with different ranges".

OntoAnalyser is realized as a plugin in OntoEdit and is therefore well integrated into the ontology engineering environment itself. An ontology may be developed in OntoEdit and, during this process, checked on the fly using OntoAnalyser. Thus, no export, import or transformation of the ontology is necessary to perform evaluations.
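The checks expressed by the two F-Logic rules above can also be sketched procedurally. The following is hypothetical illustration code over a toy ontology representation, not part of OntoAnalyser (which evaluates such rules declaratively with the Ontobroker inference engine):

```python
# Procedural sketch of the two example checks from the F-Logic rules above;
# the ontology representation used here is a hypothetical simplification.

def check_single_inheritance(superconcepts):
    """Report concepts with more than one direct super concept."""
    return [c for c, supers in superconcepts.items() if len(supers) > 1]

def check_lowercase_relations(relations):
    """Report relation names that are not entirely in lower-case letters."""
    return [r for r in relations if r != r.lower()]

superconcepts = {
    "PhD_student": {"student", "academic_staff"},  # multiple inheritance
    "student": {"person"},
}
relations = ["hasName", "age"]

print(check_single_inheritance(superconcepts))  # ['PhD_student']
print(check_lowercase_relations(relations))     # ['hasName']
```

The declarative formulation has the advantage noted in the text: new checks are just additional rules in a loadable package, with no reprogramming of the tool.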
For imported ontologies we would additionally need to check, e.g., the interoperability of our tool and the tool that was used for development; hence an evaluation against another criterion of our framework would be necessary.

7. Sometimes relations with predefined range types like STRING and BOOLEAN are called attributes.

Figure 4.2: Evaluation results of OntoAnalyser for the example ontology

For the future we plan to develop standard rule packages for evaluation for various purposes, to support efficient and effective engineering of ontologies and to improve the quality of ontologies at the same time.

OntoGenerator

The second implementation, OntoGenerator, supports performance tests ("stress tests") of ontology-based tools. It creates synthetic ontologies which are not intended to represent a domain of interest, but rather to fulfil certain technical parameters, e.g. a certain number of concepts and instances or certain kinds of rules. OntoGenerator was originally implemented to support the optimisation of the Ontobroker rule evaluation for F-Logic rules (e.g. the ones presented in the previous section). The performance of the rule evaluation strongly depends on the sequence in which the rule bodies are evaluated in each rule. An example from our optimisation experiments shows the significance of this sequence: for one example, the optimal sequence produced far fewer intermediate evaluation results than a bad sequence. Our optimiser searches for the best sequence by using a genetic algorithm. To test such optimisation strategies it is necessary to test them on numerous different ontologies and different rule sets with special properties. Therefore we implemented OntoGenerator, which is able to produce synthetic ontologies on the fly according to predefined parameters.

Figure 4.3: Generating synthetic ontologies with OntoGenerator
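This kind of parameter-driven generation of synthetic ontologies can be sketched as follows. The code is a hypothetical re-implementation of the idea, not the plugin's actual code: it builds a concept tree of a given depth and branching width and attaches instances to randomly chosen concepts.

```python
# Hypothetical sketch of synthetic ontology generation in the style of
# OntoGenerator: a concept tree plus randomly placed instances.
import random

def generate(depth, width, n_instances, seed=0):
    rng = random.Random(seed)  # seeded for reproducible benchmarks
    concepts = ["root"]
    frontier = ["root"]
    for level in range(depth):
        next_frontier = []
        for parent in frontier:
            for i in range(width):
                child = f"{parent}_{i}"
                concepts.append(child)
                next_frontier.append(child)
        frontier = next_frontier
    # Instances are attached to randomly chosen concepts.
    instances = [(f"inst{i}", rng.choice(concepts)) for i in range(n_instances)]
    return concepts, instances

concepts, instances = generate(depth=3, width=3, n_instances=10)
print(len(concepts))   # 1 + 3 + 9 + 27 = 40 concepts
print(len(instances))  # 10
```

A real generator would additionally attach the requested numbers of relations and attributes, and, as discussed below, a rule generator with controlled depth and cyclicity.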

Figure 4.3 shows an ontology generated by OntoGenerator. Currently one may predefine the following parameters for the generation of ontologies: (i) depth of the concept tree, (ii) width of the concept tree, (iii) total number of relations, (iv) total number of attributes and (v) total number of instances. In detail, the parameters have the following effects during generation: (i) defines how deep the generated tree will be (here: 3 levels deep), (ii) defines the number of children per generated concept (here: 3 children), (iii) and (iv) define the total numbers of relations and attributes, which are currently attached randomly to concepts, and (v) defines the number of instances created, with the concepts chosen randomly for instantiation.

For the future we plan to implement statistical measures for the selection of concepts, e.g. used during the attachment of relations and attributes and during the creation of instances, given that it is important at what level a relation is attached. Due to the inheritance mechanism, a relation attached at a higher level will result in a larger number of inherited relations at the lower levels of the concept hierarchy. Furthermore, we will extend OntoGenerator with a rule generator, thus exploring e.g. the following characteristics:

- Depth of the rule tree: how long is the chain in which one rule depends on another.
- Cyclicity of rules: the dependencies between rules may be cyclic.
- Length of rule cycles: how long are the dependency cycles.
- Complexity of rule bodies: how complex are the object formulas.
- Transitivity: how many transitive rules are included.

OntoClean in WebODE

The Ontology Group at the Italian National Research Council (CNR) in Padova (Italy) has elaborated a series of ontology evaluation criteria based on philosophical notions such as rigidity, identity, unity and dependence.
Such criteria are used in OntoClean [12, 13, 14], the ontology evaluation methodology proposed by this group to clean tangled ontologies (see deliverable 1.1). On the other hand, the Ontology Group at the Laboratorio de Inteligencia Artificial of the Universidad Politécnica de Madrid (UPM) has developed a methodology called METHONTOLOGY [15, 16, 17] for developing, reengineering and evaluating ontologies (see deliverable 1.1). The UPM group has also built WebODE [9] (see chapter 2 of this deliverable 1.3), a tool that supports the main activities (conceptualisation, implementation, reengineering, evaluation) identified in METHONTOLOGY. Since the works of both groups are complementary, there is a first version of a joint methodology that integrates the OntoClean methodology into the conceptualisation phase of METHONTOLOGY, when the taxonomy is built. Besides, there is a tool named OntoClean, implemented as a plug-in of WebODE, which supports both the OntoClean methodology and the integrated methodology.

The OntoClean module in WebODE provides the following functions:

1) Establishing the evaluation mode. The user can choose whether the system will show the errors every time a violation of the OntoClean axioms is detected in the ontology, or only when the user asks for them.
2) Assigning meta-properties to concepts. The user can set up meta-properties concerning identity, unity, dependency and rigidity.
3) Focusing (or not) on non-rigid properties. The user can decide that non-rigid properties are displayed less prominently. This is important because OntoClean focuses on rigid properties.
4) Evaluation according to the taxonomic constraints. The system diagnoses the ontology according to the principles of the CNR group. The user can relax or tighten the evaluation simply by selecting fewer or more criteria.
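To illustrate what such a taxonomic check involves, the best-known OntoClean constraint, that an anti-rigid class must not subsume a rigid class, can be sketched as follows. This is hypothetical illustration code; the actual plug-in evaluates the axioms declaratively with the WebODE inference engine, as described below.

```python
# Sketch of one OntoClean taxonomic constraint: an anti-rigid class must
# not subsume a rigid class. Data structures here are hypothetical, and
# only direct subclass links are checked (a full check would use the
# transitive closure of the taxonomy).

def rigidity_violations(subclass_of, rigidity):
    """subclass_of -- dict mapping a child class to its direct superclasses
    rigidity -- dict mapping a class to 'rigid', 'anti-rigid' or 'non-rigid'
    Returns the (child, parent) pairs that violate the constraint."""
    errors = []
    for child, parents in subclass_of.items():
        if rigidity.get(child) != "rigid":
            continue
        for parent in parents:
            if rigidity.get(parent) == "anti-rigid":
                errors.append((child, parent))
    return errors

# A classic tangled taxonomy: person (rigid) placed under student (anti-rigid).
subclass_of = {"person": {"student"}}
rigidity = {"person": "rigid", "student": "anti-rigid"}
print(rigidity_violations(subclass_of, rigidity))  # [('person', 'student')]
```

Reversing the link (student under person) is the clean modelling and raises no violation.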
The main advantage of the OntoClean plug-in is that the criteria used to evaluate the ontology are declaratively expressed in the WebODE conceptualisation module. The OntoClean plug-in uses: a) the top-level ontology of universals built by the CNR group [18], which has been conceptualised in WebODE; and b) the axioms for evaluating ontologies proposed by the CNR group, which have also been written using the AxiomBuilder module of WebODE.

In fact, the evaluation knowledge is represented in a knowledge base in Prolog, because both the top-level ontology and the axioms were automatically translated into Ciao Prolog using the WebODE translators. OntoClean uses the WebODE inference engine, implemented in Prolog, to detect the inconsistencies in the taxonomy, allowing the user to clean the taxonomy according to the taxonomic constraints previously mentioned. Once the ontology is clean, it can be exported to other implementation languages by means of the WebODE translators. Figure 4.4 shows a snapshot of the OntoClean module in WebODE.

Figure 4.4: Snapshot of OntoClean in WebODE

ONE-T

ONE-T (Ontology Evaluation Tool) has been developed by the Ontology Group at UPM. ONE-T is a Java web-based application that allows verifying ontologies stored and available in any Ontolingua Server. It has been used in the verification of ontologies in the Ontolingua Server 5.0 and Ontolingua Server 6.0. As it is based on the OKBC protocol, it is easily extensible; currently, it is being extended to analyze the LOOM system. ONE-T detects the following kinds of inconsistency errors in concept taxonomies, which are described in depth in [19]:

- Circularity errors. They occur when a class is defined as a specialization or generalization of itself. Depending on the number of relations involved, circularity errors can be classed as: circularity errors at distance zero (a class with itself), circularity errors at distance 1 and circularity errors at distance n.
- Partition errors. Partitions can define concept classifications in a disjoint and/or complete manner. As exhaustive subclass partitions add the completeness constraint to the established subsets, subclass partition errors and exhaustive subclass partition errors are distinguished:
  o Subclass partition errors
    Subclass partition with common instances. This occurs when one or several instances belong to more than one subclass of the defined partition.
For example, if dogs and cats form a subclass partition of the set of mammals, an error of this type would occur if we defined Pluto as an instance of both classes. The developer should remove the wrong relation to solve this problem.
    Subclass partition with common classes. This occurs when there is a partition class_p1, ..., class_pn defined in a class class_A and one or more classes class_b1, ..., class_bk are

subclasses of more than one subclass class_pi of the partition. For example, if dogs and cats form a subclass partition of the set of mammals, an error of this type would occur if we defined the class Doberman as a subclass of both classes. The developer should remove the wrong relation to solve the problem.
  o Exhaustive subclass partition errors
    Exhaustive subclass partition with external instances. These errors occur when, having defined an exhaustive subclass partition of the base class (class_A) into the set of classes class_p1, ..., class_pn, there are one or more instances of class_A that do not belong to any class class_pi of the exhaustive partition. For example, if the classes odd and even had been defined as forming an exhaustive subclass partition of the class number, and the number four were defined as an instance of the class number (instead of the class even), we would have an error of this type.
    Exhaustive subclass partition with common instances. This occurs when one or several instances belong to more than one subclass of the defined exhaustive partition. For example, having defined the classes odd and even as an exhaustive subclass partition of the class number, an error of this type appears if the number four is an instance of both the odd and the even numbers.
    Exhaustive subclass partition with common classes. This occurs when there is a partition class_p1, ..., class_pn defined in a class class_A and one or more classes class_b1, ..., class_bk are subclasses of more than one subclass class_pi of the partition. For example, having defined the classes odd and even as an exhaustive subclass partition of the class number, an error of this type appears if the class prime is a subclass of both the odd and the even numbers. So, if we define the number three as a prime number, we get an inconsistency, since three would be an instance of both the odd and the even numbers.
- Redundancy errors
  o Redundancies of subclass-of relations.
It occurs between classes when subclass-of relations are repeated. We can distinguish direct and indirect repetition. A direct repetition exists when two or more subclass-of relations between the same source and target classes are defined, for instance, when the subclass-of relation between the classes dog and mammal is included twice. An indirect repetition exists, for example, if we define the class dog as a subclass of pet and pet as a subclass of animal, and dog is also defined as a subclass of animal.

o Redundancies of instance-of relations. A direct repetition exists if two instance-of relations between the same instance and class are defined. An indirect repetition exists, for example, if we define the instance Clyde as an instance of real elephant and real elephant as a subclass of the class elephant; the definition of an instance-of relation between Clyde and elephant would then lead to a redundancy in the taxonomy.

- Grammar errors

o In subclass-of relations: the subclass-of relation is not created between classes.

o In instance-of relations: the destination of the instance-of relation is not a class.

o In partitions: there is an instance as part of the partition.

- Identical formal definition of some classes. It occurs when there are two or more classes in the ontology with the same formal definition, that is, the only difference between them is their name.

The tests performed with this tool have consisted in the creation of wrong concept taxonomies and instances, containing the errors described above, so as to analyze whether it was possible to define them in Ontolingua and whether the inconsistent information persisted afterwards. The tool has also been used to analyze existing ontologies in the Ontolingua Server's libraries. Figure 4.5 shows a screenshot of ONE-T.
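The redundancy checks above amount to simple graph tests over the asserted subclass-of relations. The following minimal sketch is not taken from any of the tools described here (the class names are invented); it flags direct repetitions and asserted relations that are already implied by a chain of other assertions:

```python
def find_redundancies(subclass_of):
    """subclass_of: list of (subclass, superclass) assertions.
    Returns (direct, indirect) sets of redundant assertions."""
    direct, indirect, seen = set(), set(), set()
    # Direct repetition: the same assertion is stated more than once.
    for edge in subclass_of:
        if edge in seen:
            direct.add(edge)
        else:
            seen.add(edge)

    def reachable(start, goal, edges):
        # Depth-first search over the remaining subclass-of assertions.
        stack, visited = [start], set()
        while stack:
            node = stack.pop()
            for sub, sup in edges:
                if sub == node and sup not in visited:
                    if sup == goal:
                        return True
                    visited.add(sup)
                    stack.append(sup)
        return False

    # Indirect repetition: an asserted edge already follows from a chain
    # of the other assertions.
    for edge in seen:
        if reachable(edge[0], edge[1], seen - {edge}):
            indirect.add(edge)
    return direct, indirect

edges = [("dog", "pet"), ("pet", "animal"),
         ("dog", "animal"),   # implied by the chain -> indirect redundancy
         ("dog", "pet")]      # stated twice         -> direct redundancy
direct, indirect = find_redundancies(edges)
print(direct)    # {('dog', 'pet')}
print(indirect)  # {('dog', 'animal')}
```

The analogous check for instance-of relations replaces the first component of each pair by an instance name.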

Figure 4.5. ONE-T showing the description of the ontology taxonomy before checking for errors.

4.4 Comparison of the implementations against the evaluation framework

All described tools cover different aspects of our evaluation framework. In Table 4.3 we evaluate the reference implementations, according to the descriptions in the previous section, against our evaluation framework. It shows that each tool targets different aspects of the framework.

Table 4.3: Evaluation of the implementations against the evaluation framework

Feature                               OntoAnalyser   OntoGenerator   ONE-T   OntoClean in WebODE
Ontology properties
  Language conformity (Syntax)             X                           X              X
  Consistency (Semantics)                  X                           X              X
Technology properties
  Interoperability (e.g. Semantics)
  Turn-around ability
  Performance                                              X
  Memory allocation
  Scalability                                              X
  Integration into frameworks
  Connectors and interfaces

4.5 Related Work

Though there exists seminal research on the evaluation of ontologies (e.g. [19], [20], [21]), a recent position statement clarifies the current state of the art [22]: evaluation of ontologies is rarely discussed in the literature. There is a dearth of detailed reports on evaluation and a lack of tools to assist ontology testers. Evaluation during development could be assisted by guidelines and by integrated environments with support for certain types of errors (type-checking, referential integrity, etc.). Given the sheer size and ever-increasing number of ontologies, there is an urgent need for evaluation upon deployment in applications. This evaluation, preferably, has to be executable.

4.6 Conclusions

The growing interest in the semantic web attracts, besides academia, more and more industrial partners. Ontologies and related technologies such as development tools, as core components of the semantic web, need to fulfil strong requirements to meet the demands of industrial partners. We presented a framework for the evaluation of ontologies and related technologies that aims at ensuring the necessary levels of quality. Our framework consists of two main areas: we distinguish between (i) ontology properties, i.e. language conformity and consistency, and (ii) technology properties, i.e. interoperability, turn-around ability, performance, memory allocation, scalability, integration into frameworks and, last but not least, connectors and interfaces. We presented reference implementations of the framework, OntoAnalyser and OntoGenerator. Each addresses different aspects of the evaluation criteria, which was shown by evaluating the implementations against the framework. OntoAnalyser, ONE-T and OntoClean focus on the evaluation of ontology properties, in particular language conformity and consistency. OntoGenerator focuses on the evaluation of ontology-based tools, in particular performance and scalability. For the future we envision a growing interest in the evaluation of ontologies and, in particular, of the technologies related to ontologies. Real-life implementations of new technologies require benchmarks and standardization efforts, as in the database community.
Therefore we encourage the semantic web community to reinforce its research efforts by developing further standard criteria, e.g. benchmarks and certifications, and tools that implement these criteria to evaluate ontologies and related technologies. Currently we see two encouraging starting points: (i) the EU IST OntoWeb thematic network initiative, which creates a roadmap for the future of the ontology and semantic web community, and (ii) a workshop at the 13th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2002): EON2002, Evaluation of Ontology-based Tools.

Acknowledgements: Research for this section was partially funded by the EU in the IST project On-To-Knowledge.

4.7 References

[1] T. Berners-Lee, J. Hendler and O. Lassila. The Semantic Web: A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities. Scientific American, May 2001.

[2] EU IST project On-To-Knowledge: Content-driven knowledge management tools through evolving ontologies.

[3] US DARPA project DARPA Agent Markup Language (DAML).

[4] EU IST thematic network OntoWeb: Ontology-based Information Exchange for Knowledge Management and Electronic Commerce.

[5] Y. Sure, M. Erdmann, J. Angele, S. Staab, R. Studer and D. Wenke. OntoEdit: Collaborative Ontology Engineering for the Semantic Web. In Proceedings of the International Semantic Web Conference 2002 (ISWC 2002), June 2002, Sardinia, Italy.

[6] S. Handschuh. Ontoplugins: a flexible component framework. Technical report, University of Karlsruhe, May.

[7] S. Staab, H.-P. Schnurr, R. Studer, and Y. Sure. Knowledge processes and ontologies. IEEE Intelligent Systems, Special Issue on Knowledge Management, 16(1), January/February 2001.

[8] N. Fridman Noy, R. Fergerson, and M. Musen. The knowledge model of Protégé-2000: Combining interoperability and flexibility. In Proceedings of EKAW 2000, LNCS 1937, Springer, 2000.

[9] J.C. Arpírez, O. Corcho, M. Fernández-López, and A. Gómez-Pérez. WebODE: a scalable workbench for ontological engineering. In Proceedings of the First International Conference on Knowledge Capture (K-CAP 2001), October 2001, Victoria, B.C., Canada.

[10] M. Kifer, G. Lausen, J. Wu. Logical Foundations of Object-Oriented and Frame-Based Languages. Journal of the ACM, 42, 1995.

[11] S. Decker, M. Erdmann, D. Fensel, and R. Studer. Ontobroker: Ontology based access to distributed and semistructured information. In R. Meersman et al., editors, Database Semantics: Semantic Issues in Multimedia Systems. Kluwer Academic.

[12] Oltramari, A., Gangemi, A., Guarino, N., Masolo, C. Restructuring WordNet's Top-Level: The OntoClean approach. To appear in Proceedings of LREC 2002 (OntoLex workshop), Las Palmas, Spain.

[13] Guarino, N. and Welty, C. Evaluating Ontological Decisions with OntoClean. Communications of the ACM, 45(2), 2002.

[14] Welty, C., Guarino, N. Supporting Ontological Analysis of Taxonomic Relationships. To appear in Data and Knowledge Engineering, September.

[15] Fernández-López, M., Gómez-Pérez, A., Pazos-Sierra, A., Pazos-Sierra, J. Building a Chemical Ontology Using Methontology and the Ontology Design Environment. IEEE Intelligent Systems & their applications, January/February.

[16] Fernández, M., Gómez-Pérez, A., Juristo, N. METHONTOLOGY: From Ontological Art Towards Ontological Engineering. Symposium on Ontological Engineering of AAAI, Stanford (California), March.

[17] Gómez-Pérez, A. Knowledge Sharing and Reuse. In J. Liebowitz (editor), Handbook of Expert Systems. CRC Press.

[18] Guarino, N., Welty, C. A Formal Ontology of Properties. In R. Dieng and O. Corby (eds.), Knowledge Engineering and Knowledge Management: Methods, Models and Tools. 12th International Conference, EKAW 2000. Springer-Verlag, 2000.

[19] Gómez-Pérez, A. Evaluation of Ontologies. International Journal of Intelligent Systems, 16(3), March.

[20] A. Gómez-Pérez. A framework to verify knowledge sharing technology. Expert Systems with Applications, 11(4).

[21] Grüninger, M., Fox, M.S. The Role of Competency Questions in Enterprise Engineering. In IFIP WG 5.7 Workshop on Benchmarking: Theory and Practice, Trondheim, Norway.

[22] Y. Kalfoglou. Evaluating ontologies during deployment in applications. Position statement at the OntoWeb 2 meeting, Amsterdam, The Netherlands, cf. [4].

5 Ontology based annotation tools

5.1 Introduction

We can define an ontology-related annotation tool as a tool which makes use of a pre-existing ontology to insert mark-up into a Web page or other document, and/or is used to populate knowledge bases via the mediation of a marked-up document. There are thus two main philosophies behind ontology annotation tools. Firstly, what we can call the semantic web approach sees the production of annotated pages as primary, with the population of ontologies as secondary. Secondly, what we can call the knowledge engineering approach sees ontology-guided annotation of documents as a means of populating knowledge bases as well as producing annotated documents. In essence both philosophies produce similar tools. However, the difference in emphasis is important, since, depending on the goal of the user, one or the other class of tool is likely to be more appropriate. In the next section we present our framework for describing and evaluating annotation tools. We will then introduce the tools using the framework. We make no attempt to rank the tools since, in the current state of the art, they are all fairly similar. Furthermore, such differences as there are (for instance in the ontology language) will determine their relative usefulness. For this reason we have not conducted a more extensive analysis using canonical ontologies, knowledge bases and documents.

5.2 Framework used to describe and compare the tools

Screen snapshot

General description of the tool
In this section we will outline the main functionalities of the tool and include information about developers, the status of the software, relevant URLs, the necessary hardware and software platforms, its architecture, etc.

URL

References

Documentation
Here we will indicate what papers, manuals and so on are available.

Tutorial Material
Here we will indicate if there is any material available for users to familiarize themselves with the tool.
This could include tutorial sections in manuals, web pages or help incorporated into the program.

Available modes of working
Here we will indicate whether it is possible to use the tool to cooperate or collaborate with others.

Automation
Here we will indicate whether and to what extent the tool can automatically annotate texts.

Interoperability with other ontology development tools
Here we will indicate whether the results achieved by using the tool can be used by other tools. We will also indicate whether the program is part of a suite of ontology-building tools.

Ontology related points
Here we will indicate: (a) where ontologies can be retrieved from (this might include WWW repositories as well as the local machine); (b) where populated ontologies can be written to; (c) the underlying language(s) for the ontology definition and for the ontology notations; (d) whether browsing of concepts/properties/relations is possible; (e) whether the tool provides restricted values; (f) whether the tool checks constraints.

Kind of documents that can be annotated
Here we will indicate whether the program can cope only with HTML pages.

Usability aspects
Under this heading we will include the following aspects of usability: (a) How easy is it to learn the system? (b) How easy is it in everyday use? (c) How efficient is the tool?

Other comments

5.3 Ontology annotation tools

In this section, we will describe and evaluate the tools in terms of the above framework. We will then summarize our main findings in a table.

5.3.1 AeroDAML

General description of the tool

Figure 5.1. AeroDAML

AeroDAML is being developed as part of the UML Based Ontology Toolset (UBOT) project. According to Kogut and Holmes: "AeroDAML is a knowledge markup tool that applies natural language information extraction techniques to automatically generate DAML annotations from web pages." The tool can be accessed via a Web page and can be used to annotate any Web page using a single predefined ontology for common concepts and relationships (based on WordNet) without any further user input. A client/server version was not reviewed. See Figure 5.1 for the interface.

URL:

References: Kogut, P. and Holmes, W. AeroDAML: Applying Information Extraction to Generate DAML Annotations from Web Pages. First International Conference on Knowledge Capture (K-CAP 2001), Workshop on Knowledge Markup and Semantic Annotation, Victoria, B.C., October 21, 2001.

Documentation
As above.

Tutorial Material
Unknown.

Available modes of working
Single user.

Automation
The Web page version is entirely automatic.

Interoperability with other ontology development tools
Unknown.

Ontology related points
In the web version of the tool a predefined ontology is used. The annotation is in DAML and is returned as a web page. A client/server version supports customized ontologies with output to a file.

Kind of documents that can be annotated
Static HTML documents.

Usability aspects
The Web version is simple to use, requiring only the entry of a URL and the choice of how the DAML should be presented.

Other comments
The Web version is extremely simple to use but has limited applicability.

5.3.2 COHSE

General description of the tool

Figure 5.2. COHSE

COHSE (Conceptual Open Hypermedia Services Environment) is being developed by the Information Management Group (University of Manchester) and the Intelligence, Agents, Multimedia Group (University of Southampton). The aim of this set of tools is the use of metadata to support the construction and navigation of links in the Semantic Web. It relies on three kinds of services: ontology reasoning services, a Web-based open hypermedia link service (that offers link-providing facilities) and the integration of both services to form a conceptual hypermedia system. This allows documents to be linked via metadata describing their contents. See Figure 5.2 for the interface.

URL:

References: Bechhofer, S., Goble, C. Towards Annotation Using DAML+OIL. K-CAP'01 Workshop on Semantic Markup and Annotation, Victoria, Canada, October 2001. Bechhofer, S., Carr, L., Goble, C., Hall, W. Conceptual Open Hypermedia = The Semantic Web? Technical Report.

Documentation
The references and URL already provided. There is no user manual for these tools.

Tutorial Material
No tutorial is available.

Available modes of working
Individual or collaborative. The use of an annotation server permits annotations from different users. Annotation servers and ontology servers can be installed either locally or accessed from a remote URL.

Automation
Annotation with COHSE is both manual and automatic. The tool first extracts annotations automatically, and later allows the user to add their own annotations.

Interoperability with other ontology tools
The COHSE annotation tools are integrated in the Mozilla web browser. They can also interoperate with annotation servers such as Annotea.

Ontology related points
Ontologies are accessed through the ontology service, which can be on a local machine or accessed from a remote URL. Annotated pages can be stored on the annotation server. The underlying languages are OIL and DAML+OIL. The concept taxonomy of the ontology can be browsed when creating the annotations. The tool does not provide restricted values for filling attributes or relations; in fact, it does not provide any means to establish relationships among concepts or to determine the values of their attributes. There is no constraint checking.

Kind of web pages that can be annotated
Static HTML.

Usability aspects
The system is easy to learn. If the user does not know how to create complex concept expressions using the concept browser, he/she will only be able to create annotations for simple concepts. The system is easy to use, although it is not very easy to install all the components. Annotation of pages is slow if many annotations must be created. Large concept taxonomies are loaded very fast. If a proxy is used to download the Web pages with annotations, it is quite slow.

5.3.3 MnM

General description of the tool

Figure 5.3. MnM

MnM is a tool being developed for the AKT Interdisciplinary Research Collaboration (UK EPSRC, GR/N15764/01). Its purpose is the annotation of documents using markup derived from pre-existing ontologies. The tool works by (a) annotating a training set of text and/or HTML documents and (b) using this set to generate lexical rules which can be used to automatically extract information from another set of documents. The instances derived from this process can be used to populate the ontology used in the annotation. It supports both manual and (semi-)automated annotation, thanks to integration with information extraction technology. An initial prototype has been developed as a Java application and uses the Amilcare information extraction module developed at the University of Sheffield (Ciravegna, 2001). However, the IE API is generic, so alternative IE tools can be plugged in. A more complete version of the tool will be available by July 2002. See Figure 5.3 for the interface.

URL:

References: Motta, E., Vargas-Vera, M., Domingue, J., Lanzoni, M. and Ciravegna, F. (2002) MnM: Ontology Driven Semi-Automatic and Automatic Support for Semantic Mark-up. Semantic Authoring, Annotation & Knowledge Markup Workshop, ECAI 2002, July 2002, Lyon, France. Maria Vargas-Vera, John Domingue, Yannis Kalfoglou, Enrico Motta and Simon Buckingham Shum. Template-Driven Information Extraction for Populating Ontologies. In Proceedings of the Workshop on Ontology Learning, IJCAI 2001, Seattle, USA, 4 August 2001 (CEUR Workshop Proceedings Vol. 38). Maria Vargas-Vera, John Domingue, Enrico Motta, Simon Buckingham Shum and Mattia Lanzoni. Knowledge Extraction by using an Ontology-based Annotation Tool. In Proceedings of the Workshop on Knowledge Markup & Semantic Annotation, K-CAP 2001, Victoria, Canada, October 2001. Ciravegna, F. Adaptive Information Extraction from Text by Rule Induction and Generalisation. In Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI 2001), Seattle, August 2001.

Documentation
To be produced soon.

Tutorial Material
There is no tutorial material available.

Available modes of working
This tool is essentially for single users. However, it can access (and populate) collaboratively built ontologies located on a WebOnto ontology server.

Automation
The tool is designed to automatically populate ontologies after an initial training phase (with an optimum of 40 documents). Possible new instance definitions derived from pages annotated automatically are presented to the user for acceptance and/or correction.

Interoperability with other ontology development tools
The tool is designed initially to interact with the WebOnto tool.

Ontology related points
Ontologies can be retrieved from a WebOnto OCML repository, either operating on a local machine or across the web. In future versions it will be possible to use any ontology server supporting the Open Knowledge Base Connectivity (OKBC) API and ontologies defined in RDF and DAML+OIL. Populated ontologies can be written to the WebOnto repository. The underlying language for ontology definition is OCML, though accepted instances can be saved as OCML, RDF and XML. The XML annotation tags are generated from the class/instance names. Documents containing the annotations can be saved locally as HTML, XML and text. Hierarchies of concepts (or classes) and their properties can be browsed. Operations with relations are currently unsupported, but it is planned to deliver relation-driven annotation in the future. The tool does not provide a prior list of possible values for slots, nor does it check for any constraints on property values during annotation. Values are checked when an instance is accepted for transmission to the ontology server.

Kind of documents that can be annotated
These can include text, XML or static HTML documents. The tool incorporates a web browser.
Usability aspects
The system is easy to learn to use once an understanding of the basic training process is gained. The system is easy in everyday use, with a satisfactory interface. The system allows efficient annotation of documents.

Other comments
While it is possible to add other learning/information extraction algorithms, the system is currently limited to Amilcare. An earlier version made use of the UMass plugin. The API will be generic in the final system. Better editing facilities will be developed in the near future. It is also planned to include an additional component based on Programming By Example technology, which can learn new annotations, store these in a library and use them in critiquing user annotations.

5.3.4 OntoAnnotate

General description of the tool

Figure 5.4. OntoAnnotate

OntoAnnotate is a tool being developed as the commercial implementation of the CREAM annotation environment framework. According to the OntoAnnotate product flyer: "OntoAnnotate is a semiautomatic annotation tool that enables you to collect knowledge from documents and web pages, to create a semantic document base and to enrich your intranet with metadata." See Figure 5.4 for the interface. The tool is currently undergoing major changes.

URL:

References: Handschuh, S., Staab, S. and Maedche, A. (2001) CREAM: Creating Relational Metadata with a Component-Based, Ontology-Driven Framework. K-CAP 2001, First International Conference on Knowledge Capture, Oct. 21-23, 2001, Victoria, B.C., Canada. Handschuh, S. and Staab, S. (2002) Authoring and Annotation of Web Pages in CREAM. WWW 2002.

Documentation
Product flyer.

Tutorial Material
Unknown.

Available modes of working
Collaborative working is supported through remote connection to OntoBroker.

Automation
The tool is semi-automatic to the extent that the annotator is assisted by different views on an ontology and facts, with annotation working by simple drag-and-drop actions. In future versions relevant text fragments in the browsed documents will be highlighted and several actions will be offered.

Interoperability with other ontology development tools
OntoAnnotate uses OntoBroker as its underlying data source and inference engine. The user can connect to any running instance of OntoBroker by choosing its host and port. Currently OntoBroker reads F-Logic (a Datalog/Prolog-like syntax) and RDF. Within OntoAnnotate you can start a local instance of OntoBroker with one or more files (currently only in F-Logic format).

Ontology related points
Annotated pages are stored locally for transmission to the web. When a URL is revisited, the local document containing the annotations is shown. In future versions the annotations will be stored in OntoBroker, which will serve as annotation server. The markup language used for annotations is based on HTML-A. Inline annotations are used. In future, RDF/DAML will be supported. Ontologies are loaded from OntoBroker. The underlying languages for ontology definition are F-Logic and RDF/RDF Schema. Instances/facts are saved as F-Logic. Hierarchies of concepts (or classes), their properties and relations can be browsed. As with OntoMat-Annotizer, annotation works by copying text from a document, pasting it as a new instance and then adding attribute values in the same way. The tool provides a prior list of possible values for relational slots. Only these values can be used in these slots, thereby enforcing constraints on possible values.

Kind of documents that can be annotated
Static HTML documents. Word and Excel files will follow.
Usability aspects
The system is easy to learn to use, although this was made easier by prior experience with OntoMat-Annotizer and its help assistant. The system is easy in everyday use, with a satisfactory interface. The system allows efficient annotation of documents.

Other comments
In future versions OntoAnnotate will be integrated into MS Internet Explorer.

5.3.5 OntoMat-Annotizer

Figure 5.5. OntoMat-Annotizer

General description of the tool
OntoMat-Annotizer is a tool being developed as the reference implementation of the CREAM annotation environment framework, the commercial implementation of which is OntoAnnotate (see above). Its primary purpose is the production of marked-up web pages which can be reasoned about using, for example, semantic web agents. The tool is written as a Java application. It has a modular architecture with plug-ins for the ontology browser, web browser, web crawlers, inferencing system, tutorial/assistant and connection to an ontology server. The tool is available as a free download. See Figure 5.5 for the interface.

URL:

References: Handschuh, S., Staab, S. and Maedche, A. (2001) CREAM: Creating Relational Metadata with a Component-Based, Ontology-Driven Framework. K-CAP 2001, First International Conference on Knowledge Capture, Oct. 21-23, 2001, Victoria, B.C., Canada. Handschuh, S. and Staab, S. (2002) Authoring and Annotation of Web Pages in CREAM. WWW 2002.

Documentation
See web site.

Tutorial Material
The currently available version incorporates a helpful assistant which outlines the annotation process.

Available modes of working
This tool is essentially for single users. However, collaboration can take place via OntoBroker as a shared server for annotation. A plug-in has also been written which enables the sharing of metadata between several OntoMat clients.

Automation
While the CREAM framework allows for the automated annotation of texts, this is so far unimplemented.

Interoperability with other ontology development tools
The tool uses the same data model as OntoEdit. A plug-in which allows it to work with OntoBroker is available but is not included with the downloadable version.

Ontology related points
Ontologies can be retrieved from local files and via URLs. Metadata can be stored in the document which is annotated, in a local file (for transmission to the web) or at an annotation server.
The underlying language for ontology definition and markup is DAML+OIL. Hierarchies of concepts (or classes) and their properties can be browsed. Annotation works by copying text from a document, pasting it as a new instance and then adding attribute values in the same way. The tool provides a prior list of possible values for relational slots. Only these values can be used in these slots, thereby enforcing constraints on possible values.

Kind of documents that can be annotated
Static HTML documents. The tool incorporates a web browser. Since annotations are made in a copy of a web page, the tool can operate on dynamic documents as well.

Usability aspects
The system is easy to learn to use, helped by the OntoMat-Annotizer Assistant. The system is easy in everyday use, with a satisfactory interface. The system allows efficient annotation of documents.

Other comments
After a few problems getting OntoMat-Annotizer to work (largely because of the wrong Java version), the most recent version (April 2002) is both intuitive and easy to use. Since OntoMat-Annotizer shares its code with OntoEdit, it can also support F-Logic and RDF(S).
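The slot-constraint behaviour described for OntoMat-Annotizer and OntoAnnotate (offering a closed list of legal values for a relational slot and rejecting anything else) can be illustrated with a small sketch. This is not the tools' actual API; the class, slot and value names are invented for the example:

```python
class Slot:
    """A relational slot whose legal values are drawn from a fixed list."""

    def __init__(self, name, allowed_values):
        self.name = name
        self.allowed_values = set(allowed_values)

    def assign(self, instance, value):
        # Reject values outside the slot's predefined range, mirroring
        # the constraint enforcement described in the text.
        if value not in self.allowed_values:
            raise ValueError(
                f"{value!r} is not a legal value for slot {self.name!r}")
        instance.setdefault(self.name, []).append(value)

# Invented example: a 'hasAuthor' slot restricted to two known instances.
author = Slot("hasAuthor", {"S. Handschuh", "S. Staab"})
page = {}
author.assign(page, "S. Staab")       # accepted: value is in the range
try:
    author.assign(page, "Anonymous")  # rejected: not a known instance
except ValueError as err:
    print(err)
```

Presenting only the allowed values in the user interface, as the tools do, makes such rejections impossible by construction.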

5.3.6 SHOE Knowledge Annotator

General description of the tool

Figure 5.6. SHOE Knowledge Annotator screenshot

The SHOE Knowledge Annotator has been developed by the Parallel Understanding Systems Group, Department of Computer Science, University of Maryland at College Park. It is available as an applet or a standalone Java application that allows users to mark up web pages with SHOE instances and claims without worrying about the HTML-like codes of SHOE ontologies. It allows the loading of several SHOE ontologies and the annotation of a document with respect to those ontologies. See Figure 5.6 for the interface.

URL:

References: Jeff Heflin and James Hendler. A Portrait of the Semantic Web in Action. IEEE Intelligent Systems, 16(2), 2001. Sean Luke, Lee Spector, David Rager, and James Hendler. Ontology-based Web Agents. In Proceedings of the First International Conference on Autonomous Agents (AA-97), 1997.

Documentation
The references and URL already provided. There is no user manual.

Tutorial Material
No tutorial is available.

Available modes of working
Individual. The application can be installed locally or accessed at a URL.

Automation
Annotation with this tool is manual.

Interoperability with other ontology tools
SHOE annotations made with the SHOE Knowledge Annotator can be used with the following tools: Exposé, SHOE Search, PIQ (PARKA Interface for Queries) and Semantic Search.

Ontology related points
Ontologies can be sourced from a local machine or via a URL. Annotated documents can be stored on the local machine and can later be uploaded manually to the corresponding URL. The underlying language is SHOE. Concepts in the ontology are available as a list when creating instances; concept taxonomies, however, are shown when creating claims. The tool does not provide restricted values for filling attributes or relations (for example, instances of the concepts which are arguments of the relations).
There is no constraint checking on the values of attributes and claims. The only validation consists of checking that the instances are already defined and that the concepts are defined in the ontology.

Kind of web pages that can be annotated
Static HTML.


More information

NeOn Methodology for Building Ontology Networks: a Scenario-based Methodology

NeOn Methodology for Building Ontology Networks: a Scenario-based Methodology NeOn Methodology for Building Ontology Networks: a Scenario-based Methodology Asunción Gómez-Pérez and Mari Carmen Suárez-Figueroa Ontology Engineering Group. Departamento de Inteligencia Artificial. Facultad

More information

Evaluating Ontology-Mapping Tools: Requirements and Experience

Evaluating Ontology-Mapping Tools: Requirements and Experience Evaluating Ontology-Mapping Tools: Requirements and Experience Natalya F. Noy and Mark A. Musen Stanford Medical Informatics, Stanford University 251 Campus Drive, Stanford, CA 94305, USA {noy, musen}@smi.stanford.edu

More information

A Tool for Storing OWL Using Database Technology

A Tool for Storing OWL Using Database Technology A Tool for Storing OWL Using Database Technology Maria del Mar Roldan-Garcia and Jose F. Aldana-Montes University of Malaga, Computer Languages and Computing Science Department Malaga 29071, Spain, (mmar,jfam)@lcc.uma.es,

More information

2 Which Methodology for Building Ontologies? 2.1 A Work Still in Progress Many approaches (for a complete survey, the reader can refer to the OntoWeb

2 Which Methodology for Building Ontologies? 2.1 A Work Still in Progress Many approaches (for a complete survey, the reader can refer to the OntoWeb Semantic Commitment for Designing Ontologies: A Proposal Bruno Bachimont 1,Antoine Isaac 1;2, Raphaël Troncy 1;3 1 Institut National de l'audiovisuel, Direction de la Recherche 4, Av. de l'europe - 94366

More information

The table metaphor: A representation of a class and its instances

The table metaphor: A representation of a class and its instances The table metaphor: A representation of a class and its instances Jan Henke Digital Enterprise Research Institute (DERI) University of Innsbruck, Austria jan.henke@deri.org Abstract This paper describes

More information

Interoperability of Protégé 2.0 beta and OilEd 3.5 in the Domain Knowledge of Osteoporosis

Interoperability of Protégé 2.0 beta and OilEd 3.5 in the Domain Knowledge of Osteoporosis EXPERIMENT: Interoperability of Protégé 2.0 beta and OilEd 3.5 in the Domain Knowledge of Osteoporosis Franz Calvo, MD fcalvo@u.washington.edu and John H. Gennari, PhD gennari@u.washington.edu Department

More information

Building domain ontologies from lecture notes

Building domain ontologies from lecture notes Building domain ontologies from lecture notes Neelamadhav Gantayat under the guidance of Prof. Sridhar Iyer Department of Computer Science and Engineering, Indian Institute of Technology, Bombay Powai,

More information

An Annotation Tool for Semantic Documents

An Annotation Tool for Semantic Documents An Annotation Tool for Semantic Documents (System Description) Henrik Eriksson Dept. of Computer and Information Science Linköping University SE-581 83 Linköping, Sweden her@ida.liu.se Abstract. Document

More information

Protégé Plug-in Library: A Task-Oriented Tour

Protégé Plug-in Library: A Task-Oriented Tour Protégé Plug-in Library: A Task-Oriented Tour Tutorial at Seventh International Protégé Conference Bethesda MD, July 6 2004 Samson Tu and Jennifer Vendetti Stanford Medical Informatics Stanford University

More information

Blázquez M, Fernández M, García-Pinar JM, Gómez-Pérez A Building Ontologies at the Knowledge Level using the Ontology Design Environment

Blázquez M, Fernández M, García-Pinar JM, Gómez-Pérez A Building Ontologies at the Knowledge Level using the Ontology Design Environment Ontology Engineering Group Laboratorio de Inteligencia Artificial Dept. Inteligencia Artificial, Facultad de Informática UPM OEG Publication Blázquez M, Fernández M, García-Pinar JM, Gómez-Pérez A Building

More information

TooCoM : a Tool to Operationalize an Ontology with the Conceptual Graphs Model

TooCoM : a Tool to Operationalize an Ontology with the Conceptual Graphs Model TooCoM : a Tool to Operationalize an Ontology with the Conceptual Graphs Model Frédéric Fürst, Michel Leclère, Francky Trichet Institut de Recherche en Informatique de Nantes 2 rue de la Houssinière -

More information

Towards a benchmark of the ODE API methods for accessing ontologies In the WebODE platform

Towards a benchmark of the ODE API methods for accessing ontologies In the WebODE platform Towards a benchmark of the ODE API methods for accessing ontologies In the WebODE platform Oscar Corcho, Raúl García-Castro, Asunción Gómez-Pérez (Ontology Group, Departamento de Inteligencia Artificial,

More information

Practical Experiences in Developing Ontology-Based Multi-Agent System

Practical Experiences in Developing Ontology-Based Multi-Agent System Practical Experiences in Developing Ontology-Based Multi- System Jarmo Korhonen Software Business and Engineering Institute, Helsinki University of Technology, Jarmo.Korhonen@hut.fi Pekka Isto Industrial

More information

Ontology Creation and Development Model

Ontology Creation and Development Model Ontology Creation and Development Model Pallavi Grover, Sonal Chawla Research Scholar, Department of Computer Science & Applications, Panjab University, Chandigarh, India Associate. Professor, Department

More information

New Tools for the Semantic Web

New Tools for the Semantic Web New Tools for the Semantic Web Jennifer Golbeck 1, Michael Grove 1, Bijan Parsia 1, Adtiya Kalyanpur 1, and James Hendler 1 1 Maryland Information and Network Dynamics Laboratory University of Maryland,

More information

Contributions to the Study of Semantic Interoperability in Multi-Agent Environments - An Ontology Based Approach

Contributions to the Study of Semantic Interoperability in Multi-Agent Environments - An Ontology Based Approach Int. J. of Computers, Communications & Control, ISSN 1841-9836, E-ISSN 1841-9844 Vol. V (2010), No. 5, pp. 946-952 Contributions to the Study of Semantic Interoperability in Multi-Agent Environments -

More information

Structure of This Presentation

Structure of This Presentation Inferencing for the Semantic Web: A Concise Overview Feihong Hsu fhsu@cs.uic.edu March 27, 2003 Structure of This Presentation General features of inferencing for the Web Inferencing languages Survey of

More information

Development of an Ontology-Based Portal for Digital Archive Services

Development of an Ontology-Based Portal for Digital Archive Services Development of an Ontology-Based Portal for Digital Archive Services Ching-Long Yeh Department of Computer Science and Engineering Tatung University 40 Chungshan N. Rd. 3rd Sec. Taipei, 104, Taiwan chingyeh@cse.ttu.edu.tw

More information

The integration of OntoClean in WebODE

The integration of OntoClean in WebODE The integration of OntoClean in WebODE Mariano Fernández-López, Asunción Gómez-Pérez Facultad de Informática. Universidad Politécnica de Madrid Campus de Montegancedo, s/n. 28660 Boadilla del Monte. Madrid.

More information

Knowledge Sharing and Reuse: Ontologies and Applications

Knowledge Sharing and Reuse: Ontologies and Applications Knowledge Sharing and Reuse: Ontologies and Applications Asunción Gómez-Pérez asun@fi.upm.es Laboratorio de Inteligencia Artificial Facultad de Informática Universidad Politécnica de Madrid Campus de Montegancedo

More information

Developing an Ontology for Teaching Multimedia Design and Planning

Developing an Ontology for Teaching Multimedia Design and Planning Jakkilinki, Sharda, Georgievski 1 Abstract Developing an Ontology for Teaching Multimedia Design and Planning Roopa Jakkilinki, Nalin Sharda, Mladen Georgievski School of Computer Science and Mathematics

More information

TODE : A Dot Net Based Tool for Ontology Development and Editing

TODE : A Dot Net Based Tool for Ontology Development and Editing TODE : A Dot Net Based Tool for Ontology Development and Editing Noman Islam 1, Muhammad Shahab Siddiqui 2, Zubair A. Shaikh 3 Center for Research in Ubiquitous Computing National University of Computer

More information

A Comparative Study of Ontology Languages and Tools

A Comparative Study of Ontology Languages and Tools A Comparative Study of Ontology Languages and Tools Xiaomeng Su and Lars Ilebrekke Norwegian University of Science and Technology (NTNU) N-7491, Trondheim, Norway xiaomeng@idi.ntnu.no ilebrekk@stud.ntnu.no

More information

A WEB-BASED TOOLKIT FOR LARGE-SCALE ONTOLOGIES

A WEB-BASED TOOLKIT FOR LARGE-SCALE ONTOLOGIES A WEB-BASED TOOLKIT FOR LARGE-SCALE ONTOLOGIES 1 Yuxin Mao 1 School of Computer and Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, P.R. China E-mail: 1 maoyuxin@zjgsu.edu.cn ABSTRACT

More information

IMAGENOTION - Collaborative Semantic Annotation of Images and Image Parts and Work Integrated Creation of Ontologies

IMAGENOTION - Collaborative Semantic Annotation of Images and Image Parts and Work Integrated Creation of Ontologies IMAGENOTION - Collaborative Semantic Annotation of Images and Image Parts and Work Integrated Creation of Ontologies Andreas Walter, awalter@fzi.de Gabor Nagypal, nagypal@disy.net Abstract: In this paper,

More information

Ontology Engineering for the Semantic Web and Beyond

Ontology Engineering for the Semantic Web and Beyond Ontology Engineering for the Semantic Web and Beyond Natalya F. Noy Stanford University noy@smi.stanford.edu A large part of this tutorial is based on Ontology Development 101: A Guide to Creating Your

More information

Ontology Language Standardisation Efforts

Ontology Language Standardisation Efforts Ontology Language Standardisation Efforts Editor: Sean Bechhofer Information Management Group Department of Computer Science University of Manchester Oxford Road Manchester M13 9PL UK seanb@cs.man.ac.uk

More information

Why Ontologies? Ontology Building: A Survey of Editing Tools

Why Ontologies? Ontology Building: A Survey of Editing Tools Seite 1 von 6 Published on XML.com http://www.xml.com/pub/a/2002/11/06/ontologies.html See this if you're having trouble printing code examples Ontology Building: A Survey of Editing Tools By Michael Denny

More information

Metadata and the Semantic Web and CREAM 0

Metadata and the Semantic Web and CREAM 0 Metadata and the Semantic Web and CREAM 0 1 Siegfried Handschuh, 1;2 Steffen Staab, 1;3 Alexander Maedche 1 Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe, Germany http://www.aifb.uni-karlsruhe.de/wbs

More information

Creating Ontology Chart Using Economy Domain Ontologies

Creating Ontology Chart Using Economy Domain Ontologies Creating Ontology Chart Using Economy Domain Ontologies Waralak V. Siricharoen *1, Thitima Puttitanun *2 *1, Corresponding author School of Science, University of the Thai Chamber of Commerce, 126/1, Dindeang,

More information

RaDON Repair and Diagnosis in Ontology Networks

RaDON Repair and Diagnosis in Ontology Networks RaDON Repair and Diagnosis in Ontology Networks Qiu Ji, Peter Haase, Guilin Qi, Pascal Hitzler, and Steffen Stadtmüller Institute AIFB Universität Karlsruhe (TH), Germany {qiji,pha,gqi,phi}@aifb.uni-karlsruhe.de,

More information

Interoperability of Protégé using RDF(S) as Interchange Language

Interoperability of Protégé using RDF(S) as Interchange Language Interoperability of Protégé using RDF(S) as Interchange Language Protégé Conference 2006 24 th July 2006 Raúl García Castro Asunción Gómez Pérez {rgarcia, asun}@fi.upm.es Protégé Conference 2006, 24th

More information

Bibster A Semantics-Based Bibliographic Peer-to-Peer System

Bibster A Semantics-Based Bibliographic Peer-to-Peer System Bibster A Semantics-Based Bibliographic Peer-to-Peer System Peter Haase 1, Björn Schnizler 1, Jeen Broekstra 2, Marc Ehrig 1, Frank van Harmelen 2, Maarten Menken 2, Peter Mika 2, Michal Plechawski 3,

More information

Reasoning on Business Processes and Ontologies in a Logic Programming Environment

Reasoning on Business Processes and Ontologies in a Logic Programming Environment Reasoning on Business Processes and Ontologies in a Logic Programming Environment Michele Missikoff 1, Maurizio Proietti 1, Fabrizio Smith 1,2 1 IASI-CNR, Viale Manzoni 30, 00185, Rome, Italy 2 DIEI, Università

More information

Racer: An OWL Reasoning Agent for the Semantic Web

Racer: An OWL Reasoning Agent for the Semantic Web Racer: An OWL Reasoning Agent for the Semantic Web Volker Haarslev and Ralf Möller Concordia University, Montreal, Canada (haarslev@cs.concordia.ca) University of Applied Sciences, Wedel, Germany (rmoeller@fh-wedel.de)

More information

FedX: A Federation Layer for Distributed Query Processing on Linked Open Data

FedX: A Federation Layer for Distributed Query Processing on Linked Open Data FedX: A Federation Layer for Distributed Query Processing on Linked Open Data Andreas Schwarte 1, Peter Haase 1,KatjaHose 2, Ralf Schenkel 2, and Michael Schmidt 1 1 fluid Operations AG, Walldorf, Germany

More information

Integrating e-commerce standards and initiatives in a multi-layered ontology

Integrating e-commerce standards and initiatives in a multi-layered ontology Integrating e-commerce standards and initiatives in a multi-layered ontology Oscar Corcho, Asunción Gómez-Pérez Facultad de Informática, Universidad Politécnica de Madrid. Campus de Montegancedo s/n. Boadilla

More information

Common Pitfalls in Ontology Development

Common Pitfalls in Ontology Development Common Pitfalls in Ontology Development María Poveda, Mari Carmen Suárez-Figueroa, Asunción Gómez-Pérez Ontology Engineering Group. Departamento de Inteligencia Artificial. Facultad de Informática, Universidad

More information

SemSearch: Refining Semantic Search

SemSearch: Refining Semantic Search SemSearch: Refining Semantic Search Victoria Uren, Yuangui Lei, and Enrico Motta Knowledge Media Institute, The Open University, Milton Keynes, MK7 6AA, UK {y.lei,e.motta,v.s.uren}@ open.ac.uk Abstract.

More information

Optimised Classification for Taxonomic Knowledge Bases

Optimised Classification for Taxonomic Knowledge Bases Optimised Classification for Taxonomic Knowledge Bases Dmitry Tsarkov and Ian Horrocks University of Manchester, Manchester, UK {tsarkov horrocks}@cs.man.ac.uk Abstract Many legacy ontologies are now being

More information

A Semantic Portal for Fund Finding in the EU: Semantic Upgrade, Integration and Publication of Heterogeneous Legacy Data

A Semantic Portal for Fund Finding in the EU: Semantic Upgrade, Integration and Publication of Heterogeneous Legacy Data A Semantic Portal for Fund Finding in the EU: Semantic Upgrade, Integration and Publication of Heterogeneous Legacy Data Jesús Barrasa Rodríguez, Oscar Corcho 1, and Asunción Gómez-Pérez Engineering Group,

More information

Comparison of Ontology Editors

Comparison of Ontology Editors Comparison of Ontology Editors Emhimed Salem Alatrish 1 Abstract in this paper some software tools related to Semantic web are considered and compared. In fact, five ontology-editors are described and

More information

OWL Rules, OK? Ian Horrocks Network Inference Carlsbad, CA, USA

OWL Rules, OK? Ian Horrocks Network Inference Carlsbad, CA, USA OWL Rules, OK? Ian Horrocks Network Inference Carlsbad, CA, USA ian.horrocks@networkinference.com Abstract Although the OWL Web Ontology Language adds considerable expressive power to the Semantic Web

More information

Automation of Semantic Web based Digital Library using Unified Modeling Language Minal Bhise 1 1

Automation of Semantic Web based Digital Library using Unified Modeling Language Minal Bhise 1 1 Automation of Semantic Web based Digital Library using Unified Modeling Language Minal Bhise 1 1 Dhirubhai Ambani Institute for Information and Communication Technology, Gandhinagar, Gujarat, India Email:

More information

Ontology Evolution: Not the Same as Schema Evolution. Natalya F. Noy 1 and Michel Klein 2

Ontology Evolution: Not the Same as Schema Evolution. Natalya F. Noy 1 and Michel Klein 2 Ontology Evolution: Not the Same as Schema Evolution 1 Stanford Medical Informatics Stanford University Stanford, CA, 94305 noy@smi.stanford.edu Natalya F. Noy 1 and Michel Klein 2 2 Vrije University Amsterdam

More information

Comparison Some of Ontology Editors

Comparison Some of Ontology Editors Comparison Some of Ontology Editors Article Info:, Vol. 8 (2013), No. 2, pp. 018-024 Received 03 April 2012 Accepted 24 April 2013 UDC 004.4 Summary In this paper some software tools related to Semantic

More information

Performance Evaluation of Semantic Registries: OWLJessKB and instancestore

Performance Evaluation of Semantic Registries: OWLJessKB and instancestore Service Oriented Computing and Applications manuscript No. (will be inserted by the editor) Performance Evaluation of Semantic Registries: OWLJessKB and instancestore Simone A. Ludwig 1, Omer F. Rana 2

More information

Integration of Product Ontologies for B2B Marketplaces: A Preview

Integration of Product Ontologies for B2B Marketplaces: A Preview Integration of Product Ontologies for B2B Marketplaces: A Preview Borys Omelayenko * B2B electronic marketplaces bring together many online suppliers and buyers. Each individual participant potentially

More information

Ontology Development. Qing He

Ontology Development. Qing He A tutorial report for SENG 609.22 Agent Based Software Engineering Course Instructor: Dr. Behrouz H. Far Ontology Development Qing He 1 Why develop an ontology? In recent years the development of ontologies

More information

PROJECT PERIODIC REPORT

PROJECT PERIODIC REPORT PROJECT PERIODIC REPORT Grant Agreement number: 257403 Project acronym: CUBIST Project title: Combining and Uniting Business Intelligence and Semantic Technologies Funding Scheme: STREP Date of latest

More information

OntoEdit: Collaborative Ontology Development for the Semantic Web

OntoEdit: Collaborative Ontology Development for the Semantic Web OntoEdit: Collaborative Ontology Development for the Semantic Web York Sure 1, Michael Erdmann 2, Juergen Angele 2, Steffen Staab 1;2, Rudi Studer 1;2;3, and Dirk Wenke 2 1 Institute AIFB, University of

More information

KAON The Karlsruhe Ontology and Semantic Web Meta Project

KAON The Karlsruhe Ontology and Semantic Web Meta Project KAON The Karlsruhe Ontology and Semantic Web Meta Project Alexander Maedche 1 & Steffen Staab 2 1 Forschungszentrum Informatik, Karlsruhe, Germany, http://www.fzi.de/wim 2 Institut AIFB, Universität Karlsruhe,

More information

A Platform for the Development of Semantic Web Portals

A Platform for the Development of Semantic Web Portals A Platform for the Development of Semantic Web Portals Oscar Corcho University of Manchester School of Computer Science Oxford Road, Manchester, United Kingdom +44(0)1612756821 Oscar.Corcho@manchester.ac.uk

More information

Semantic Web. Lecture XIII Tools Dieter Fensel and Katharina Siorpaes. Copyright 2008 STI INNSBRUCK

Semantic Web. Lecture XIII Tools Dieter Fensel and Katharina Siorpaes. Copyright 2008 STI INNSBRUCK Semantic Web Lecture XIII 25.01.2010 Tools Dieter Fensel and Katharina Siorpaes Copyright 2008 STI INNSBRUCK Today s lecture # Date Title 1 12.10,2009 Introduction 2 12.10,2009 Semantic Web Architecture

More information

Learning from the Masters: Understanding Ontologies found on the Web

Learning from the Masters: Understanding Ontologies found on the Web Learning from the Masters: Understanding Ontologies found on the Web Bernardo Cuenca Grau 1, Ian Horrocks 1, Bijan Parsia 1, Peter Patel-Schneider 2, and Ulrike Sattler 1 1 School of Computer Science,

More information

University of Huddersfield Repository

University of Huddersfield Repository University of Huddersfield Repository Olszewska, Joanna Isabelle, Simpson, Ron and McCluskey, T.L. Appendix A: epronto: OWL Based Ontology for Research Information Management Original Citation Olszewska,

More information

Building Knowledge Models Using KSM

Building Knowledge Models Using KSM Building Knowledge Models Using KSM Jose Cuena, Martin Molina Department of Artificial Intelligence, Technical University of Madrid, Campus de Montegancedo S/N, Boadilla del Monte 28660, Madrid, SPAIN

More information

ODESeW for the creation of R&D projects Intranets and Extranets

ODESeW for the creation of R&D projects Intranets and Extranets ODESeW for the creation of R&D projects Intranets and Extranets Oscar Corcho University of Manchester School of Computer Science Oxford Road, Manchester, United Kingdom +44(0)1612756821 Oscar.Corcho@manchester.ac.uk

More information

Extracting knowledge from Ontology using Jena for Semantic Web

Extracting knowledge from Ontology using Jena for Semantic Web Extracting knowledge from Ontology using Jena for Semantic Web Ayesha Ameen I.T Department Deccan College of Engineering and Technology Hyderabad A.P, India ameenayesha@gmail.com Khaleel Ur Rahman Khan

More information

SOFTWARE ENGINEERING ONTOLOGIES AND THEIR IMPLEMENTATION

SOFTWARE ENGINEERING ONTOLOGIES AND THEIR IMPLEMENTATION SOFTWARE ENGINEERING ONTOLOGIES AND THEIR IMPLEMENTATION Wongthongtham, P. 1, Chang, E. 2, Dillon, T.S. 3 & Sommerville, I. 4 1, 2 School of Information Systems, Curtin University of Technology, Australia

More information

ONTOLOGY SUPPORTED ADAPTIVE USER INTERFACES FOR STRUCTURAL CAD DESIGN

ONTOLOGY SUPPORTED ADAPTIVE USER INTERFACES FOR STRUCTURAL CAD DESIGN ONTOLOGY SUPPORTED ADAPTIVE USER INTERFACES FOR STRUCTURAL CAD DESIGN Carlos Toro 1, Maite Termenón 1, Jorge Posada 1, Joaquín Oyarzun 2, Juanjo Falcón 3. 1. VICOMTech Research Centre, {ctoro, mtermenon,

More information

Ontology Development and Engineering. Manolis Koubarakis Knowledge Technologies

Ontology Development and Engineering. Manolis Koubarakis Knowledge Technologies Ontology Development and Engineering Outline Ontology development and engineering Key modelling ideas of OWL 2 Steps in developing an ontology Creating an ontology with Protégé OWL useful ontology design

More information

Annotation for the Semantic Web During Website Development

Annotation for the Semantic Web During Website Development Annotation for the Semantic Web During Website Development Peter Plessers and Olga De Troyer Vrije Universiteit Brussel, Department of Computer Science, WISE, Pleinlaan 2, 1050 Brussel, Belgium {Peter.Plessers,

More information

Information Management for Multimedia Earthquake Science Data

Information Management for Multimedia Earthquake Science Data Information Management for Multimedia Earthquake Science Data 1. Research Team Project Leader: Other Faculty: Graduate Students: Industrial Partner(s): Prof. Dennis McLeod, Computer Science Prof. Cyrus

More information

Knowledge and Ontological Engineering: Directions for the Semantic Web

Knowledge and Ontological Engineering: Directions for the Semantic Web Knowledge and Ontological Engineering: Directions for the Semantic Web Dana Vaughn and David J. Russomanno Department of Electrical and Computer Engineering The University of Memphis Memphis, TN 38152

More information

Evolva: A Comprehensive Approach to Ontology Evolution

Evolva: A Comprehensive Approach to Ontology Evolution Evolva: A Comprehensive Approach to Evolution Fouad Zablith Knowledge Media Institute (KMi), The Open University Walton Hall, Milton Keynes, MK7 6AA, United Kingdom f.zablith@open.ac.uk Abstract. evolution

More information

Tania Tudorache Stanford University. - Ontolog forum invited talk04. October 2007

Tania Tudorache Stanford University. - Ontolog forum invited talk04. October 2007 Collaborative Ontology Development in Protégé Tania Tudorache Stanford University - Ontolog forum invited talk04. October 2007 Outline Introduction and Background Tools for collaborative knowledge development

More information

AADL Graphical Editor Design

AADL Graphical Editor Design AADL Graphical Editor Design Peter Feiler Software Engineering Institute phf@sei.cmu.edu Introduction An AADL specification is a set of component type and implementation declarations. They are organized

More information

Access rights and collaborative ontology integration for reuse across security domains

Access rights and collaborative ontology integration for reuse across security domains Access rights and collaborative ontology integration for reuse across security domains Martin Knechtel SAP AG, SAP Research CEC Dresden Chemnitzer Str. 48, 01187 Dresden, Germany martin.knechtel@sap.com

More information

Protégé: Past, Present, and Future. Ray Fergerson Stanford

Protégé: Past, Present, and Future. Ray Fergerson Stanford Protégé: Past, Present, and Future Ray Fergerson Stanford Past Ancient History (1985-1997) Mark Musen s Thesis Protégé-II, Protégé/Win Workshops 1-2 Modern Era (1997-2003) Protégé in Java Workshops 3-6

More information

Languages and tools for building and using ontologies. Simon Jupp, James Malone

Languages and tools for building and using ontologies. Simon Jupp, James Malone An overview of ontology technology Languages and tools for building and using ontologies Simon Jupp, James Malone jupp@ebi.ac.uk, malone@ebi.ac.uk Outline Languages OWL and OBO classes, individuals, relations,

More information

Collaborative Ontology Construction using Template-based Wiki for Semantic Web Applications

Collaborative Ontology Construction using Template-based Wiki for Semantic Web Applications 2009 International Conference on Computer Engineering and Technology Collaborative Ontology Construction using Template-based Wiki for Semantic Web Applications Sung-Kooc Lim Information and Communications

More information

Semantic Web. Ontology Engineering and Evaluation. Morteza Amini. Sharif University of Technology Fall 95-96

Semantic Web. Ontology Engineering and Evaluation. Morteza Amini. Sharif University of Technology Fall 95-96 ه عا ی Semantic Web Ontology Engineering and Evaluation Morteza Amini Sharif University of Technology Fall 95-96 Outline Ontology Engineering Class and Class Hierarchy Ontology Evaluation 2 Outline Ontology

More information

Ontology Evolution: Not the Same as Schema Evolution

Ontology Evolution: Not the Same as Schema Evolution Under consideration for publication in Knowledge and Information Systems Ontology Evolution: Not the Same as Schema Evolution Natalya F. Noy 1 and Michel Klein 2 1 Stanford Medical Informatics, Stanford

More information

Knowledge Representation on the Web

Knowledge Representation on the Web Knowledge Representation on the Web Stefan Decker 1, Dieter Fensel 2, Frank van Harmelen 2,3, Ian Horrocks 4, Sergey Melnik 1, Michel Klein 2 and Jeen Broekstra 3 1 AIFB, University of Karlsruhe, Germany

More information

CHAPTER 2. Overview of Tools and Technologies in Ontology Development

CHAPTER 2. Overview of Tools and Technologies in Ontology Development CHAPTER 2 Overview of Tools and Technologies in Ontology Development 2.1. Ontology Representation Languages 2.2. Ontology Development Methodologies 2.3. Ontology Development Tools 2.4. Ontology Query Languages

More information

IDERA ER/Studio Software Architect Evaluation Guide. Version 16.5/2016+ Published February 2017

IDERA ER/Studio Software Architect Evaluation Guide. Version 16.5/2016+ Published February 2017 IDERA ER/Studio Software Architect Evaluation Guide Version 16.5/2016+ Published February 2017 2017 IDERA, Inc. All rights reserved. IDERA and the IDERA logo are trademarks or registered trademarks of

More information

Developing Ontology-based Applications using Hozo

Developing Ontology-based Applications using Hozo Developing Ontology-based Applications using Hozo Kouji Kozaki *1, Yoshinobu Kitamura *1 and Riichiro Mizoguchi *1 *1 The Institute of Scientific and Industrial Research, Osaka University *1 8-1 Mihogaoka,

More information

Law and the Semantic Web

Law and the Semantic Web V. Richard Benjamins Pompeu Casanovas Joost Breuker Aldo Gangemi (Eds.) Law and the Semantic Web Legal Ontologies, Methodologies, Legal Information Retrieval, and Applications 13 Series Editors Jaime G.

More information

Improving a Satellite Mission System by means of a Semantic Grid Architecture

Improving a Satellite Mission System by means of a Semantic Grid Architecture Improving a Satellite Mission System by means of a Semantic Grid Architecture Manuel Sánchez-Gestido 1, María S. Pérez-Hernández 2, Rafael González-Cabero 3, Asunción Gómez-Pérez 3 1 Deimos Space S.L.,

More information

Knowledge Representation, Ontologies, and the Semantic Web

Knowledge Representation, Ontologies, and the Semantic Web Knowledge Representation, Ontologies, and the Semantic Web Evimaria Terzi 1, Athena Vakali 1, and Mohand-Saïd Hacid 2 1 Informatics Dpt., Aristotle University, 54006 Thessaloniki, Greece evimaria,avakali@csd.auth.gr

More information

Ontology Interoperability

Ontology Interoperability NoE InterOp WP8, subtask 3 State of the Art Report Ontology Interoperability -Draft version 0.3.2- Introduction... 3 1.1.1 Ontology mapping/matching... 4 1.1.2 Ontology alignment... 5 1.1.3 Ontology translation...

More information

Conceptual Modeling and Specification Generation for B2B Business Processes based on ebxml

Conceptual Modeling and Specification Generation for B2B Business Processes based on ebxml Conceptual Modeling and Specification Generation for B2B Business Processes based on ebxml HyoungDo Kim Professional Graduate School of Information and Communication, Ajou University 526, 5Ga, NamDaeMoonRo,

More information

EFFICIENT INTEGRATION OF SEMANTIC TECHNOLOGIES FOR PROFESSIONAL IMAGE ANNOTATION AND SEARCH

EFFICIENT INTEGRATION OF SEMANTIC TECHNOLOGIES FOR PROFESSIONAL IMAGE ANNOTATION AND SEARCH EFFICIENT INTEGRATION OF SEMANTIC TECHNOLOGIES FOR PROFESSIONAL IMAGE ANNOTATION AND SEARCH Andreas Walter FZI Forschungszentrum Informatik, Haid-und-Neu-Straße 10-14, 76131 Karlsruhe, Germany, awalter@fzi.de

More information

Combination of DROOL rules and Protégé knowledge bases in the ONTO-H annotation tool

Combination of DROOL rules and Protégé knowledge bases in the ONTO-H annotation tool Combination of DROOL rules and Protégé knowledge bases in the ONTO-H annotation tool Corcho O. 1,5, Blázquez, M. 1, Niño M. 1, Benjamins V.R. 1, Contreras J. 1, García A. 2, Navas E. 2, Rodríguez J. 2,

More information

Ontology integration in a multilingual e-retail system

Ontology integration in a multilingual e-retail system integration in a multilingual e-retail system Maria Teresa PAZIENZA(i), Armando STELLATO(i), Michele VINDIGNI(i), Alexandros VALARAKOS(ii), Vangelis KARKALETSIS(ii) (i) Department of Computer Science,

More information

Using Triples for Implementation: The Triple20 Ontology-Manipulation Tool

Using Triples for Implementation: The Triple20 Ontology-Manipulation Tool Using Triples for Implementation: The Triple20 Ontology-Manipulation Tool Jan Wielemaker 1, Guus Schreiber 2, and Bob Wielinga 1 1 University of Amsterdam, Human Computer Studies (HCS), Kruislaan 419,

More information

CreaDO A Methodology to Create Domain Ontologies using Parameter-based Ontology Merging Techniques

CreaDO A Methodology to Create Domain Ontologies using Parameter-based Ontology Merging Techniques 2011 10th Mexican International Conference on Artificial Intelligence CreaDO A Methodology to Create Domain Ontologies using Parameter-based Ontology Merging Techniques Sabino Pariente Juárez, Hugo Estrada

More information

Semantics and Ontologies for Geospatial Information. Dr Kristin Stock

Semantics and Ontologies for Geospatial Information. Dr Kristin Stock Semantics and Ontologies for Geospatial Information Dr Kristin Stock Introduction The study of semantics addresses the issue of what data means, including: 1. The meaning and nature of basic geospatial

More information