Semantics and Ontologies for Geospatial Information. Dr Kristin Stock

Introduction The study of semantics addresses the issue of what data means, including: 1. The meaning and nature of basic geospatial constructs. 2. How the meaning of geospatial data can be determined. 3. How the meaning of geospatial data can be represented. 4. How knowledge of the meaning of geospatial data can be used to assist in geospatial data handling.

Introduction (cont'd) Knowledge of semantics can assist with geospatial data handling in the following ways: allowing users unfamiliar with the data to interpret and use it correctly; automatically determining semantic similarity (which can be used in integration, translation, etc.); and supporting querying (intelligent querying, NLP) and Web Services.

Outline PhD Research: The Inclusion Rules Method - uses first order logic to represent semantics (areas 3 and 4). Ontologies: evaluation of ontology languages for the handling of geographic information (area 3).

PhD Research Problem How to automatically integrate and translate semantically heterogeneous spatial data sets. Needed to be able to identify semantically equivalent elements from different data sets.

Methods for Determining Semantics Methods based on existing information: Schema characteristics and Usage and access patterns. Methods using additional representations: Semantic networks and hierarchies (e.g. synonyms); Frames (from KR) - complex descriptions linked to create taxonomies; Logic; Algebraic specifications and Ontologies (combine semantic networks, frames and/or logic).

PhD Research Problem (cont'd) Developed a method to represent semantics based on cognitive science and linguistics. Used that representation to determine similarity and automatically integrate or translate.

Semantics and Cognitive Models A database is an abstraction of reality created by an individual using his or her cognitive model (world view). Cognitive models differ depending on a number of characteristics, including education, experiences, theoretical assumptions and language. Spatial data is used by a wide range of different people, hence different cognitive models, hence databases with different semantics.

The Inclusion Rules Method Based on cognitive science theories of the models that individuals use to represent the world, also supported by linguistics theories. There are many different theories about the form of the cognitive model. Used the theory that sees the cognitive model as a set of categories or concepts.

The Inclusion Rules Method (cont'd) Specifically applies Klausmeier et al.'s theory of concept attainment: when concepts are attained at their highest level, the individual thinks of them as inclusion and exclusion rules that determine whether a real world entity is an example of a concept. This theory has a sound experimental basis.

What is an Inclusion Rule? A dimension (the variable). A property of that dimension (the value of the variable). For example: Dimension = colour and Property = red.

Dimensions A standard set of dimensions is used: 54 general dimensions defined by Dahlgren (based on semantic theory); 69 spatial relationship dimensions defined by Mark and Egenhofer's 9-Intersection; and 3 database dimensions used to represent schema structure.

Property Values A property can be any of the following: a single value; a range of values; an enumerated set of values; or another predicate.

Inclusion Rule Predicates A concept is represented by a particular combination of inclusion rules. Predicates are formed using inclusion rules and the AND and OR operators.

Example (town)
Predicate = R1 AND R8 AND R9 AND (R101 OR R102)
R1 = material [EARTH]
R8 = requirement [DEFINITION OF POSITION AND EXTENTS]
R9 = function [ADMINISTRATION]
R101 = RR447 [R1 AND R8 AND R223 AND (R183 OR R184) AND R226]
R102 = PR47 [R1 AND R8 AND R223 AND (R183 OR R184) AND R226]
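One way the town predicate might be held in code is sketched here. It is an illustration only, not the representation used in the research: the class names (Rule, And, Or), the holds function, and the idea of testing an entity through the set of rule identifiers it satisfies are all assumptions, and the nested predicates of R101 and R102 are stored as plain strings rather than expanded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """An inclusion rule: a dimension plus a property of that dimension."""
    rule_id: str    # e.g. "R1"
    dimension: str  # e.g. "material"
    prop: str       # e.g. "EARTH" (kept as text in this sketch)


@dataclass(frozen=True)
class And:
    operands: tuple


@dataclass(frozen=True)
class Or:
    operands: tuple


def holds(pred, satisfied):
    """Return True if the predicate is met, given the rule ids an entity satisfies.

    `satisfied` is a hypothetical input: the set of rule identifiers that
    some real world entity has already been judged to meet.
    """
    if isinstance(pred, Rule):
        return pred.rule_id in satisfied
    if isinstance(pred, And):
        return all(holds(p, satisfied) for p in pred.operands)
    if isinstance(pred, Or):
        return any(holds(p, satisfied) for p in pred.operands)
    raise TypeError(f"unsupported predicate node: {pred!r}")


r1 = Rule("R1", "material", "EARTH")
r8 = Rule("R8", "requirement", "DEFINITION OF POSITION AND EXTENTS")
r9 = Rule("R9", "function", "ADMINISTRATION")
# The properties of R101 and R102 are themselves predicates; they are kept
# as unexpanded text here to keep the sketch short.
nested = "R1 AND R8 AND R223 AND (R183 OR R184) AND R226"
r101 = Rule("R101", "RR447", nested)
r102 = Rule("R102", "PR47", nested)

# Predicate = R1 AND R8 AND R9 AND (R101 OR R102)
town = And((r1, r8, r9, Or((r101, r102))))
```

For example, holds(town, {"R1", "R8", "R9", "R102"}) is true, while an entity that satisfies only R1, R8 and R9 fails the OR branch.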

Determining Similarity Compare the predicates of two concepts and get the intersection set (PI). If PI is empty, the predicates are disjoint. If PI contains all the rules from both predicates, they are equivalent. If PI contains all of the rules of only one predicate, that predicate is a subset of the other. If PI is not empty and both predicates contain rules that are not in PI, they are overlapping.
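The comparison above can be sketched with ordinary set operations. This is a simplified illustration rather than the method's actual implementation: it assumes each predicate has been flattened to a set of rule identifiers, ignoring the AND/OR structure, and the function name classify and the returned labels are made up for the example.

```python
def classify(rules_a, rules_b):
    """Classify the relationship between two predicates given as sets of rule ids."""
    a, b = frozenset(rules_a), frozenset(rules_b)
    pi = a & b  # the intersection set PI

    if not pi:
        return "disjoint"
    # Equivalence is tested before subset, because equivalent predicates
    # also satisfy the subset condition.
    if pi == a and pi == b:
        return "equivalent"
    if pi == a or pi == b:
        return "subset"
    return "overlapping"


# Hypothetical rule sets for illustration only.
town = {"R1", "R8", "R9", "R101"}
settlement = {"R1", "R8"}
river = {"R2", "R40"}
suburb = {"R1", "R8", "R300"}

print(classify(town, town))        # equivalent
print(classify(settlement, town))  # subset
print(classify(town, river))       # disjoint
print(classify(town, suburb))      # overlapping
```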

Integration, Translation and Data Population Once similarity relationships have been determined, they can be used to: integrate data sets (based on ratios of similarity to difference, and required thresholds for these); translate between data sets; and populate the integrated or translated data set. The method also provides a semantic definition for the resulting integrated or translated database.

Advantages of this Method Users are confined to representation in a particular form - less ambiguity. Properties may be predicates - less reliance on assumption of common terminology. Operational definitions - less ambiguity. Provides for integration, translation and data population.

Disadvantages of this Method Requires time-consuming definition of Inclusion Rules. It can still be difficult to fully represent semantics using the restricted format.

What are Ontologies? An ontology is a formal specification of a shared conceptualisation. Based on the idea of a group or information community with a common world view. Formal specification = machine readable. Can be: lightweight, including concepts, taxonomies, relationships between concepts and properties; or heavyweight, adding axioms and constraints to clarify the meaning of concepts.

Ontology Languages and KR Most modern ontology languages are built on KR formalisms, which is useful because: they have precise and clearly defined semantics; they often provide for both knowledge representation and inference about knowledge; and they often have a sound theoretical basis.

Types of Ontology Languages Logic. Frames. Networks. Procedural, directing how reasoning occurs. Rules (if..then). Description Logics (frame, network + logic), which use subsumption hierarchies, including limited reasoning on those hierarchies.

Requirements for Spatial Knowledge Representation Used Smith and Mark's list of characteristics of spatial data to define a set of requirements for an ontology language: general spatial characteristics; and geometric, mereological and topological characteristics.

Smith and Mark (1998) Semantic Characteristics and Requirements for an Ontology Language

Semantic Characteristic: Geographic objects are intrinsically tied to space, the object and its location being intimately intertwined.
Requirements for an Ontology Language: The language must allow a concept to be defined as being dependent on the presence of a spatial representation. This may be done by providing rich tools for the expression of essential properties of a concept, including the ability to represent general axioms that apply across the entire ontology (A2) and either of the following:
1. Tools to specify necessary and sufficient conditions, including:
   (a) Provision for specification of necessary conditions (either individual conditions or groups of conditions) of a concept. This defines conditions that an object must meet in order to be a member of the concept. In this case, the necessary condition would be that the object must be related to space, and if this condition is not met, then it is not a member of the concept (C2).
   (b) Provision for specification of sufficient conditions (either individual conditions or groups of conditions) of a concept. This defines conditions that are sufficient to indicate that an object is a member of the concept. In this case, the sufficient condition would be that the object is related to space, such that if the object is related to space, this is sufficient to identify it as a member of the concept (C3).
   (c) Provision for specification of necessary and sufficient conditions (either individual conditions or groups of conditions) of a concept. This defines conditions that are both sufficient and necessary to indicate that an object is a member of the concept. That is, the object is a member of the concept if and only if the condition is met (C4) (Swartz, 1997).
2. Specification of rigid, non-rigid, anti-rigid, identity and dependency characteristics of a property or concept (Gomez-Perez et al, 2004, p.73) (C5).

Semantic Characteristic: The way a geographic object is represented is often size or scale dependent.
Requirements for an Ontology Language: In order for the ontology language to cater for this characteristic, the language must:
   provide strong taxonomic capabilities for representation of IS-A hierarchies with specification of additional conditions for the more specific levels (T1);
   provide support for functions, so that different semantics can be defined on the basis of specific conditions (F1);
   support relation taxonomies, so that a distinct taxonomy may be built for this specific type of IS-A relationship that is distinct from the IS-A relationship as it is more usually understood (T2); and
   provide support for production rules (P1).
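The three kinds of condition in requirements C2 to C4 can be written in first order logic. The notation is an assumption made for illustration and does not come from the slides: C(x) means "x is a member of the concept" and S(x) means "x is related to space".

```latex
\begin{align*}
\text{Necessary condition (C2):} \quad & \forall x \, \bigl( C(x) \rightarrow S(x) \bigr) \\
\text{Sufficient condition (C3):} \quad & \forall x \, \bigl( S(x) \rightarrow C(x) \bigr) \\
\text{Necessary and sufficient condition (C4):} \quad & \forall x \, \bigl( C(x) \leftrightarrow S(x) \bigr)
\end{align*}
```

The contrapositive of the first line gives the wording used for C2: if an object is not related to space, it is not a member of the concept.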

Smith and Mark (1998) Semantic Characteristics and Requirements for an Ontology Language (multiple listings imply alternatives rather than concurrent requirements)

Semantic Characteristic: Geographic objects are typically complex, involving many parts.
Requirements for an Ontology Language: In order for the ontology language to cater for this characteristic, the language must:
   be able to represent PART-OF taxonomies (T4); and
   allow definition of disjoint, exhaustive or partition relationships in a hierarchy (T5).

Semantic Characteristic: Geographic objects may have interiors and exteriors.
Requirements for an Ontology Language: In order for the ontology language to cater for this characteristic, the language must:
   provide support for relations (R1);
   allow properties to be attached to relations so that the mutual dependence of interiors and exteriors can be represented (R2); and
   provide tools to specify axioms that relate to a concept so that the presence of a hole in a geographic object can be appropriately indicated (A1).

Semantic Characteristic: Geographic objects may have various topological (connectedness) relationships with other geographic objects.
Requirements for an Ontology Language: In order for the ontology language to cater for this characteristic, the language must:
   provide support for binary (R1) and n-ary relations (R3);
   provide support for relation taxonomies (T2);
   allow properties of a relation to be specified (R2);
   allow axioms about a relation to be specified (A3); and
   provide tools for reasoning about relations (for example, of the kind described in Santos and Amaral, 2000) (Inf2).

Semantic Characteristic: The topological and mereological relationships of geographic objects are related to each other.
Requirements for an Ontology Language: In order for the ontology language to cater for this characteristic, the language must:
   provide support for binary (R1) and n-ary relations (R3);
   provide support for relation taxonomies (T2);
   allow definition of disjoint, exhaustive or partition relationships in a hierarchy (T5);
   allow properties of a relation to be specified (R2);
   allow axioms about a relation to be specified (A3);
   allow axioms that apply across the entire ontology to be specified (A2); and
   provide tools for reasoning about relations (Inf4).

Category / Requirement | Ontolingua | OCML | LOOM | SHOE | RDF(S) | DAML+OIL | OWL

Concepts
Defines concepts (C1) | + | + | + | + | + | + | +
Defines necessary conditions (C2) | + | + | + | - | - | + | +
Defines sufficient conditions (C3) | + | + | + | - | - | + | +
Defines necessary and sufficient conditions (C4) | + | + | + | - | - | + | +
Defines concept properties (rigidity, identity, dependency) (C5) | - | - | - | - | - | - | -
Defines attributes for a concept (C6) | + | + | + | + | + | + | +
Defines concept (class) attributes as well as instance attributes (C7) | + | + | + | - | + | P | P
Defines attribute properties (C8) | - | - | + | - | - | P | +
Defines customisable concept properties (C9) | - | - | - | - | - | - | -
Defines customisable attribute properties (C10) | + | - | + | - | - | - | -

Taxonomies
Defines IS-A taxonomies (T1) | + | + | + | + | + | + | +
Defines relation taxonomies (T2) | + | + | + | - | + | + | +
Attaches attributes to IS-A taxonomies (T3) | - | - | - | - | - | - | -
Defines PART-OF taxonomies (T4) | + | - | - | - | - | - | -
Defines disjoint, exhaustive and partition properties of a taxonomy (T5) | + | P | + | - | - | P | P
Allows multiple inheritance (T6) | + | + | + | + | + | + | +

Relations
Defines relations between concepts (R1) | + | + | + | + | + | + | +
Defines properties of a relation (R2) | + | - | P | - | - | + | +
Defines n-ary relations (R3) | + | + | + | + | - | - | -
Defines customisable relation properties (R4) | - | - | - | - | - | - | -
Defines inverse roles (R5) | - | - | + | - | - | + | +

Functions
Defines functions (F1) | + | + | + | - | - | P | P
Defines complex functions with varying conditions (F2) | + | + | + | - | - | - | -

Axioms
Defines axioms of a concept (A1) | + | + | + | P | - | - | -
Defines axioms that apply across the ontology (A2) | + | + | - | P | - | - | -
Defines axioms of a relation (A3) | + | + | + | P | - | - | -

Instances
Allows an instance to be a member of more than one concept (Ins1) | - | - | - | + | - | - | -

Production Rules
Defines and uses production rules (P1) | - | + | + | - | - | - | -

Inference Mechanisms
Reasons about IS-A taxonomies (Inf1) | + | + | + | P | - | + | +
Develops customisable inference mechanisms for domain specific reasoning (Inf2) | - | - | - | P | - | - | -
Reasons about PART-OF taxonomies (Inf3) | - | - | - | - | - | - | -
Reasons about relations (Inf4) | - | + | + | P | - | - | -
Specifies reasoning for customisable properties (Inf5) | - | - | - | - | - | - | -

Other
Adopts modal logics (O1) | - | - | - | - | - | - | -
Defines a new construct that could be used to represent the field view of the world (O2) | - | - | - | - | - | - | -

Summary of Outcomes Ontolingua is the most expressive in regard to requirements for spatial data. Ontolingua, LOOM or OCML are best for the field view (raster). OCML and LOOM support production rules. DAML+OIL and OWL do not provide axioms, which is a serious limitation for spatial data. None of the languages provides for all requirements, so a new language is required.
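Two of these outcomes can be checked against the comparison table with a few lines of code. The sketch below copies only the axiom rows (A1 to A3) and the production rule row (P1) from the table; the names LANGUAGES, SUPPORT, languages_with and languages_without_any are made up for the illustration.

```python
# Column order follows the comparison table.
LANGUAGES = ["Ontolingua", "OCML", "LOOM", "SHOE", "RDF(S)", "DAML+OIL", "OWL"]

# One character per language, copied from the table rows.
SUPPORT = {
    "A1": "+++P---",  # axioms of a concept
    "A2": "++-P---",  # axioms that apply across the ontology
    "A3": "+++P---",  # axioms of a relation
    "P1": "-++----",  # production rules
}


def languages_with(requirement, marks="+"):
    """Languages whose table entry for the requirement is one of `marks`."""
    row = SUPPORT[requirement]
    return [lang for lang, mark in zip(LANGUAGES, row) if mark in marks]


def languages_without_any(requirements):
    """Languages marked '-' for every one of the given requirements."""
    return [
        lang
        for i, lang in enumerate(LANGUAGES)
        if all(SUPPORT[req][i] == "-" for req in requirements)
    ]


print(languages_with("P1"))                       # ['OCML', 'LOOM']
print(languages_without_any(["A1", "A2", "A3"]))  # ['RDF(S)', 'DAML+OIL', 'OWL']
```

On these rows RDF(S) also has no axiom support, alongside DAML+OIL and OWL.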

Conclusions Semantics are an important area of future research to allow: automated combination of multiple sources; intelligent querying; and machine interpretation for advanced functions. Most methods developed to date are limited because they require a representation to be created of the semantics.

Conclusions (cont'd) Possible future areas for semantics research: development of a geo-ontology language; methods for the determination of semantics without additional representations, perhaps based on data content; advanced applications of semantics, e.g. semantically intelligent analysis to: allow automated analysis; assist inexperienced users of GIS; and provide immediate analysis for urgent applications.