MEDNARODNA PODIPLOMSKA ŠOLA JOŽEFA STEFANA JOŽEF STEFAN INTERNATIONAL POSTGRADUATE SCHOOL


ALEXANDRA MORARU
ENRICHMENT OF SENSOR DESCRIPTIONS AND MEASUREMENTS USING SEMANTIC TECHNOLOGIES
MASTER THESIS
LJUBLJANA, JUNE 2011


Master Thesis
Jožef Stefan International Postgraduate School
Ljubljana, Slovenia, September 2011

Evaluation Board:
Doc. Dr. Mihael Mohorčič, Chairman, Department of Communication Systems, Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova 39, 1000 Ljubljana, Slovenia
Assoc. Prof. Dr. Oscar Corcho, Member, Department of Artificial Intelligence, School of Computer Science, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain
Izr. Prof. Dr. Dunja Mladenić, Supervisor, Artificial Intelligence Laboratory, Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova 39, 1000 Ljubljana, Slovenia

Alexandra Moraru
ENRICHMENT OF SENSOR DESCRIPTIONS AND MEASUREMENTS USING SEMANTIC TECHNOLOGIES
Master Thesis
SEMANTIČNO BOGATENJE OPISOV SENZORJEV IN SENZORSKIH MERITEV
Magistrsko delo
Supervisor: Izr. Prof. Dr. Dunja Mladenić
Ljubljana, Slovenia, September 2011


Index
1 Introduction: Motivation; Thesis Contribution; Overview of the Thesis
Aims and Hypothesis: Background Semantic Technologies; Knowledge Representation; Sensor Network Ontologies; Description Languages; RDF and OWL; Reasoning; Linked Data
Materials and Methods: Conceptual Framework; Sources of Sensor Data; Standardized Specifications - SWE; Non-Standardized Specifications; Ontology Collection; Enrichment of Sensor Data
Results: Implementation of the Framework; Datasets; Standardized Dataset; Non-standardized Dataset; Ontologies; OWL Ontologies - W3C Semantic Sensor Network Ontology; Additional Ontologies; Cyc Ontology; Extension; Extension of OWL Ontologies; ResearchCyc Extension; Enrichment of Sensor Descriptions; Using Semantic Concepts from OWL Ontologies; Translation of SensorML Descriptions to RDF; Translation of Database Descriptions to RDF; Using Semantic Concepts from ResearchCyc Ontology; Enrichment of Sensor Measurements; Enrichment of Measurements from Standardized Dataset; Data Mining Tool - OntoGen; Enrichment of Measurements from the Database; Applications; Sensor Search on the Enriched Datasets; Ranking; Data Publishing
Discussion: Querying on the Enriched Description; SPARQL Querying; Cyc Querying; Related Work
Conclusions: Lessons Learned and Future Work
Acknowledgements
References
Appendix A: RDF Descriptions from Standardized Dataset; RDF Descriptions from Non-Standardized Dataset; Cyc Descriptions from Standardized Dataset
Appendix B: Sample of mappings between RDF URIs and Pubby URIs

Abstract

The increased interest in sensing the environment in which we live has led to the deployment of thousands of sensors which can measure and report its status. Besides the difficulty of managing the networks in which such sensors are deployed, handling the information that they capture poses new challenges. Improving the usability and accessibility of the measurements that sensor networks provide is an important step towards raising their impact. This thesis addresses the problem of enriching sensor descriptions and measurements in order to provide richer data, i.e., data containing more meaning. Semantic technologies provide methods for annotating sensor data with semantic concepts that enrich their descriptions in terms of thematic, spatial and temporal properties. We propose a framework for automating the process of semantically enriching sensor descriptions and measurements, with the purpose of improving the usability and accessibility of sensor data. First, the conceptual components of the framework are described; next, their implementation is provided for real-life datasets of sensor data. The results of the implementation are semantic repositories of the original datasets, which enable the development of new applications. The advantages that semantic enrichment brings to sensors are discussed in terms of possible applications that can be developed on the enriched datasets. Through illustrative examples we approach the problems of sensor search and data publishing.

Povzetek (Slovenian abstract, translated)

The growing interest in observing the environment with sensors has led to the deployment of thousands of sensors that take and report measurements. Beyond the difficulty of managing and maintaining such sensor networks, handling the information that the sensors deliver brings new challenges. An important step towards increasing the relevance of sensor networks is improving the usability and accessibility of sensor measurements. The master's thesis addresses the problem of semantically enriching sensor descriptions and sensor measurements in order to improve the semantic quality of the data. Semantic technologies offer methods for annotating sensor data with semantic concepts, which enriches sensor descriptions in terms of the topics covered, the place where the sensors are deployed and the time frames of the measurements. We propose a framework for automating the process of semantically enriching sensor descriptions and sensor measurements, with the purpose of improving the usability and accessibility of sensor data. The thesis first describes the conceptual components of the proposed framework and then describes its implementation and use on real sensor data. The result of the implementation is a set of semantic repositories of the source data, which enable the development of various applications. The advantages of the semantic enrichment of sensors are shown on two example applications that can be developed on top of the enriched data: an application for sensor search and an application for data publishing.

Abbreviations
API = Application Programming Interface
CF = Climate and Forecast
DBMS = Database Management System
DL = Description Logic
LOD = Linked Open Data
MMI = Marine Metadata Interoperability
OGC = Open Geospatial Consortium
OWL = Web Ontology Language
O&M = Observations and Measurements
RDF = Resource Description Framework
RDFS = RDF Schema
SensorML = Sensor Model Language
SOS = Sensor Observation Service
SRSD = Semantic Repository of Sensor Data
SSN = Semantic Sensor Network (ontology)
SSW = Semantic Sensor Web
SWE = Sensor Web Enablement
URI = Uniform Resource Identifier
URN = Uniform Resource Name
XML = Extensible Markup Language


1 Introduction

Sensors are materials or devices which change their (conductive) properties according to a physical stimulus. These sensors can be attached to more complex devices, called sensor nodes, which can have computing and communication capabilities. More and more sensor nodes are embedded into physical objects used in everyday life, ranging from pacemakers and transportation cargo to electrical appliances. Furthermore, communication links can be established between these objects, organized into wired and wireless networks, creating what is called the Internet of Things and transforming the physical world into a type of information system [1]. The Internet of Things is gaining popularity as the Internet develops towards a network of interconnected objects. From mobile phones, to transportation cargo, to electrical appliances and to any type of sensing device, new applications are finding their way into daily life [2]. This development will provide new services and will enable new directions for communication (i.e., things-to-persons or thing-to-thing) [3]. A major role in the development of the Internet of Things is played by web accessible sensor networks and archived sensor data that can be discovered and accessed using standard protocols and application program interfaces [4], defined as the Sensor Web by the Open Geospatial Consortium (OGC). However, obtaining the maximum impact that Sensor Webs can have on the development of the Internet of Things raises challenges not only for the physical infrastructure, but also for data management and processing, as the amount of data that sensors can produce is significant [2]. The traditional approach to handling existing sensor data (i.e., measurements collected by a sensor network) is to store it in a database and process it at a later time.
In the data mining and machine learning communities, technology has been developed to process and build models based on streams of data [5], making it possible to build real-time applications on such streams (e.g., intrusion detection systems based on stream mining). Up to this point, however, systems have been built in such a way that integrating sensor data from two different systems is non-trivial. By describing the meaning of data streams and the context in which these data were collected (i.e., the circumstances in which the sensor measurements were taken, such as environment characteristics, sensor settings, etc.) in a computer-understandable manner, truly large-scale distributed sensing systems can be built. This is where the semantic web protocol stack [6] and semantic technologies can play a key role. Extending the Sensor Web by applying semantic technologies leads to the Semantic Sensor Web (SSW), which enables the integration of different systems. Sheth et al. [7] propose the SSW as a solution to the problem of too much data and not enough knowledge that emerged with the rapid development of sensor networks. In their view, the SSW represents sensor data semantically annotated with spatial, temporal and thematic metadata, thereby facilitating advanced querying and reasoning. Some of the directions adopted for integrating semantic technologies and the Sensor Web are related to linked data (i.e., linked sensor data [8][9][10]) or to semantic annotation and composition of web services [11]. More general directions that can be identified in building the SSW are:
- Automatically annotate and enrich sensor data, by providing semantic metadata about spatiotemporal and thematic properties.
- Publish annotated sensor data using shared vocabularies and standard schemas, in order to facilitate accessibility and enable sensor discovery.
- Apply reasoning mechanisms on semantically enriched sensor data for solving problems such as sensor composition, event detection and network management.
The enrichment of data generally refers to adding information, annotation or additional features to the data by means of computation or by pulling information from external sources (e.g., the web, databases, etc.). Semantic enrichment of sensor data denotes the process of associating semantic tags with the initial sensor descriptions and measurements. These tags represent concepts, properties and relationships from an ontology and are used to describe the metadata associated with sensor data (i.e., measurement capabilities, observed phenomena, spatial properties, etc.). Making sensor data publicly available enables the development of new and useful applications. The methods for publishing sensor data vary from standardized web services, such as the OGC's Sensor Observation Service (SOS), to application-specific methods, such as the ones used by web platforms like Pachube or Sensorpedia. However, such methods require prior knowledge of the infrastructure used, whereas publishing semantically annotated sensor data following the linked data principles would enable better accessibility. Moreover, when integrated with existing knowledge, it would also increase the usability of the published data. Reasoning, in general, is the process of producing new beliefs from a collection of believed propositions.
It is strongly related to the field of logic, and in the context of ontologies the logical formalisms are provided by a family of representation languages known as Description Logic (DL) [12]. Enriching sensor descriptions using ontology terms enables reasoning mechanisms that can be used to infer new knowledge, to further enrich the data or to solve complex problems. Using semantic technologies for enriching sensor descriptions and measurements in scalable and heterogeneous sensor networks is intended as a solution for better interoperability and easier maintenance. Through semantic descriptions it is possible to provide context for sensor networks, which can improve knowledge extraction from sensor data streams and facilitate reasoning capabilities. We propose a framework for semantic enrichment of sensor descriptions and measurements, with the purposes of automating the translation of existing sensor descriptions into semantic descriptions and enabling semantic querying over sensor measurements. The primary focus of the work presented in this thesis is on the first general direction mentioned above, that of annotating and enriching sensor data using semantic technologies. Next, we consider linked data as a method of publishing annotated sensor data, while applying reasoning mechanisms to sensor data is only briefly mentioned as a possible future direction of this work.

1.1 Motivation

Semantic technologies have been identified as one of the key enabling technologies for sensor networks [13], contributing to the understanding and management of sensors and the associated measurements. One advantage of applying semantic technologies to sensor networks is interoperability support, in terms of comparing and merging data from different sensor networks. Consequently, the ability to automatically process several sources of sensor data together could also increase the usability of sensor data. As a result, applying semantic technologies to sensor networks leads to well organized and well understood data, which could in turn support new solutions to complex problems, such as reasoning systems. Ontologies and other semantic technologies can be key enabling technologies for sensor networks because they can improve semantic interoperability and integration, as well as facilitate reasoning, classification and other types of assurance and automation not addressed in the OGC standards. A semantic sensor network will allow the network, its sensors and the resulting data to be organised, installed and managed, queried, understood and controlled through high-level specifications. Considering the large number of existing sensor networks that use different methods for providing their metadata, it is important to enable their integration within the Semantic Sensor Web (SSW), so that they can benefit from the advantages that semantic technologies bring. Our motivation comes from the challenges that appear with the increasing number of sensors deployed in more and more complex and heterogeneous networks. Because large and diverse communities participate in building such networks, problems may appear in searching for and finding sensors published by different participants, as well as in accessing and processing the sensor measurements. One of the causes of these problems is the use of different structures and vocabularies for describing the sensors.
Therefore, enriching the sensor descriptions with semantic concepts for understanding the data (measurements) provided can be considered a first step towards benefiting from the advantages of applying semantic technologies to sensor networks.

1.2 Thesis Contribution

Enrichment of sensor descriptions and measurements bridges the gap between raw sensor data and domain specialists by adding extra features and attributes. Furthermore, semantic technologies increase the usability of sensor data through common vocabularies and domain knowledge. The main contribution of the thesis is a framework for semantic enrichment of sensor data, which provides means of automating the process of semantically describing sensors. The framework components are identified and described in order to support the development of systems able to combine several sources of sensor descriptions and measurements, which are then semantically enriched and consumed (i.e., queried, published). Based on the proposed framework, we provide a possible instantiation of its components, using real-world sensor data. For the enrichment process, existing domain knowledge descriptions and vocabularies are identified and a subset of them is selected for semantically describing sensor data. The enriched datasets are stored in semantic repositories, which can be queried or made publicly available on the web. Finally, the advantages of the enriched representation are discussed through illustrative examples.

1.3 Overview of the Thesis

The rest of the thesis is organized as follows. Chapter 2 presents the goals of the thesis, together with the theoretical background that supports these goals. The work presented in this thesis has three main goals: to define, instantiate and demonstrate the feasibility of a framework for semantic enrichment of sensor descriptions and measurements. The semantic technologies that support the development of the proposed framework concern knowledge representation, description languages and reasoning. Access to the enriched data can be provided by following the linked data principles. The proposed framework is defined in Chapter 3, which specifies its conceptual components: Sensor Descriptions and Measurements, Ontology Collection, Enrichment Components, Semantic Repository of Sensor Data and Data Consumers. The steps of the enrichment process are defined and different methods are mentioned. Possible sources of data are also analyzed and the semantic concepts required in the ontology collection are discussed. Chapter 4 presents the implementation of the proposed framework. For each component of the framework one or more solutions are provided:
- The Sensor Descriptions and Measurements are represented by two different datasets, a standardized and a non-standardized one.
- Two sets of ontologies have been chosen for the Ontology Collection component, one composed of OWL ontologies and the other represented by an upper-level ontology, namely ResearchCyc; the ontologies are extended where required.
- The enrichment components are represented by manually built rules for annotating the original descriptions with semantic concepts, together with data mining tools for extracting new features from sensor measurements.
- Possible applications for the enriched sensor descriptions and measurements are discussed, including sensor search and data publishing.
The advantages that semantic enrichment can bring to sensor data are presented in Chapter 5 through illustrative examples of queries. This chapter also reviews related work and compares it with the work presented in this thesis. Finally, Chapter 6 presents the concluding remarks.

2 Aims and Hypothesis

The purpose of the thesis is to propose, implement and evaluate a framework for semantic enrichment of sensor descriptions and measurements. The main goals of the thesis are the following:
- Define a framework for semantic enrichment of sensor data.
- Instantiate the framework components by implementing a system that comprises all of the major components.
- Demonstrate the feasibility of using semantic technologies for enrichment of real-world sensor data.
The thesis hypothesis is that applying semantic technologies to sensor networks enables the enrichment of sensor descriptions and measurements, which further facilitates the creation of new applications. The rest of this chapter introduces the theoretical background related to the semantic technologies applied in this work.

2.1 Background Semantic Technologies

The requirements for building the SSW refer to knowledge representation, description languages and semantic reasoners. For knowledge representation the focus is on ontologies, and an overview of the existing sensor network ontologies is presented. Two commonly used languages for knowledge representation, related to ontologies, are the Web Ontology Language (OWL) and the Resource Description Framework (RDF). The characteristics of these two description languages are briefly presented and analyzed from the point of view of expressive power. Once the domain knowledge is represented in such description languages, reasoners can be used to benefit from this representation. Since the work presented in this thesis is mostly concerned with the first two requirements, reasoning engines are presented only briefly: a set of existing reasoning engines is identified and categorized based on their resource requirements. The requirements described above are illustrated in Figure 1.
The center of the figure represents a sensor web to which ontologies and description languages are applied in order to create semantically enriched descriptions. On top of these descriptions, reasoning engines could be applied.

Figure 1: Semantic Sensor Web. The image illustrates the semantic technologies applied for building the SSW. (Figure labels: Semantic Sensor Web, Ontologies, Description Languages, Reasoning Engines.)

Knowledge Representation

A fundamental definition of knowledge representation is given by Davis et al. [14] as a surrogate, a substitute for the thing itself, seen as a model. The first step in applying semantic technologies to the Sensor Web is to find a suitable way to represent the relevant knowledge so as to enable interoperability and reasoning mechanisms. One of the advantages that semantic technologies bring to knowledge representation is better scalability and interoperability: adding or changing information for a set of programs that use the same model reduces to changing the external model, while the design of those programs can remain the same, without the need for human involvement [15]. The model used for knowledge representation, from the point of view of the SSW, must be suitable for capturing all the semantics required for the enrichment of sensor data and must also provide a shared vocabulary for publishing such data. One of the categories of knowledge representation appropriate for the required model is ontologies. An ontology is defined by Thomas Gruber as "a formal, explicit specification of a shared conceptualization". In other words, an ontology is an explicit formal specification of how to represent objects, concepts, and other entities that exist in some area of interest and the relations that hold among them [16]. Depending on how general or specific the knowledge represented by an ontology is, ontologies can be classified into four categories (illustrated in Figure 2) [17][18]:
- Foundational ontologies (a.k.a. top or upper-level ontologies) describe general concepts independent of a particular problem or domain (e.g., time, space, types of entities and relations, etc.).
- Core ontologies describe the key domain conceptualization according to the foundational ontologies (e.g., finance, manufacturing, medicine, etc.).
- Domain ontologies are models of specific domains (e.g., loan service, workflows, sensor devices, anatomy, etc.) and of the particular meaning of concepts related to that domain (some classifications do not separate core and domain ontologies).
- Application ontologies specify the concepts used by a particular application, seen as specializations of domain ontologies (e.g., loan service of MyBank, workflow of sensor manufacturing, etc.).
Choosing the right ontology is important for providing the common vocabulary required in sensor discovery. Foundational ontologies do not provide enough semantics for application-specific tasks, while specialized ontologies can introduce different representations for similar concepts. An example regarding the SSW is the representation of the features of the sensed domain, which is application dependent. However, having totally different ontologies for similar applications will limit the accessibility and understanding of sensor data.
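The effect of differing vocabularies on sensor discovery can be sketched in a few lines of Python. The two catalogue layouts, the field names (`type`, `measures`) and the shared concept labels are hypothetical and chosen only for illustration; they are not taken from any of the ontologies discussed in this thesis.

```python
# Two hypothetical sensor catalogues that describe the same kind of sensor
# with different, application-specific terms.
CATALOGUE_A = [{"id": "a1", "type": "temp"}, {"id": "a2", "type": "hum"}]
CATALOGUE_B = [{"id": "b1", "measures": "air_temperature"}]

# Hypothetical mapping from local terms to concepts of a shared vocabulary.
SHARED_CONCEPT = {
    "temp": "Temperature",
    "air_temperature": "Temperature",
    "hum": "Humidity",
}


def search_raw(term):
    """Search both catalogues using the literal term only."""
    hits = [s["id"] for s in CATALOGUE_A if s["type"] == term]
    hits += [s["id"] for s in CATALOGUE_B if s["measures"] == term]
    return hits


def search_shared(concept):
    """Search both catalogues after mapping local terms to shared concepts."""
    hits = [s["id"] for s in CATALOGUE_A
            if SHARED_CONCEPT.get(s["type"]) == concept]
    hits += [s["id"] for s in CATALOGUE_B
             if SHARED_CONCEPT.get(s["measures"]) == concept]
    return hits
```

In this sketch, searching for the literal term `temp` finds only the sensor from the first catalogue, while searching for the shared concept `Temperature` finds the temperature sensors from both.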

Figure 2: Classification of ontologies on levels of generality. Four types of ontologies are identified based on their generality level: foundational, core, domain and application ontologies (adapted from [17]).

Sensor Network Ontologies

The complexity of semantic sensor network technology derives from both the semantic and the sensor network points of view. A detailed survey of semantic specification of sensor networks is provided in [19], where eleven sensor network ontologies are analyzed. In this section we review a set of the most complete ontologies, considering the range of concepts that they define and how they have been used in diverse applications. In order to overcome one of the shortcomings of the OGC's Sensor Web Enablement (SWE), namely the lack of semantics and reasoning, Neuhaus et al. [20] propose the Semantic Sensor Network ontology. This ontology, represented in OWL, provides a language for describing sensors in a way that facilitates reasoning. The sensor description may contain technical aspects (calibration, temporal resolution/sampling frequency, accuracy, what it measures), information about access to the sensor for control and configuration, location, and the meaning of the data. To incorporate all these, the proposed ontology is structured into four clusters (groups) of concepts: domain of sensing, sensor description, physical component and location description, and operational model (functions and processes). The domain of sensing, with Feature as its core concept, is left unspecified, on the grounds that it should be adapted to each particular application.
The sensor description is used to link the domain with the physical component and with the operational model. The SensorGrounding groups the physical, concrete characteristics of the sensor such as shape, size, materials, location, input and output format and others. The process and operational model are used to describe abstract sensors, like composition of physical sensors. In [21] the authors present how reasoning and querying can be applied on this ontology, using a query language and an OWL-API interface that gives access to a Java reasoner. Also an example of how sensor composition can be done is presented.
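As a rough illustration of how such a description could be organized and queried, the sketch below groups a sensor's properties along the four clusters named above and filters sensors by what they measure and how accurately. It is plain Python, not the OWL ontology or the query interface from [21]; every class, field and value is a hypothetical stand-in.

```python
from dataclasses import dataclass


@dataclass
class SensorDescription:
    sensor_id: str
    feature: str      # domain of sensing: the observed feature
    measures: str     # sensor description: what the sensor measures
    accuracy: float   # sensor description: bound on the measurement error
    location: tuple   # physical component and location: (lat, lon)
    process: str      # operational model: how the output is produced


def find_sensors(sensors, measures, max_error):
    """Return ids of sensors measuring `measures` with error <= max_error."""
    return [s.sensor_id for s in sensors
            if s.measures == measures and s.accuracy <= max_error]


# Made-up example descriptions.
SENSORS = [
    SensorDescription("s1", "air", "temperature", 0.5, (46.05, 14.51), "direct reading"),
    SensorDescription("s2", "air", "temperature", 2.0, (46.06, 14.50), "direct reading"),
    SensorDescription("s3", "air", "humidity", 1.0, (46.05, 14.51), "direct reading"),
]
```

A call such as `find_sensors(SENSORS, "temperature", 1.0)` then selects only the temperature sensors whose stated error bound is at most 1.0. An ontology-based description would additionally allow a reasoner to derive facts that are not stated explicitly, which this flat structure cannot do.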

The OntoSensor ontology [22] represents SensorML as an extension to the IEEE SUMO (Suggested Upper Merged Ontology) and references concepts from an ISO standard (schema for geographic information and services). It was meant to map the entire concept hierarchy of SensorML into OWL, but the environment used in developing the ontology imposed some constraints and limitations. The CESN ontology provides concepts about sensors that are similar to the SensorML terminology. It has been used by Calder et al. [23], who propose a semantic solution for reasoning on sensor data in order to detect alarms or to explain various phenomena that may occur in a specific environment. They build a system that integrates the CESN ontology and rule sets within the Jena framework and is capable of reasoning about coastal storm events based on data gathered from a wireless sensor network and other external sources. While the system seems scalable to large wireless sensor networks, the ontology used is not general enough. Concepts similar to those of the CESN ontology can be found in the Marine Metadata Interoperability (MMI) ontology, which is focused on describing oceanography-related sensors and devices but can be applied in other domains too. To summarize, the ontologies developed for modeling sensor networks have a set of common concepts related to the taxonomy of different types of sensors, physical properties of sensor devices, data acquisition and the sensed domain. However, the features of the sensed domain may vary depending on the application where the sensor network is used, and further development of this set of concepts is required. Selecting an existing ontology or developing a new one for an SSW application depends on the purpose of the application and on the community that will be involved.
A standardized ontology is appropriate for applications addressing large communities with the purpose of making data available for wider use, while for applications addressing narrower communities a specific ontology, not necessarily standardized, could satisfy all the requirements.

Description Languages

Description languages are used for representing ontologies. They are called ontology languages and belong to the larger family of formal languages. Ontology languages encode the domain knowledge and the rules used for reasoning on that knowledge. There are three main properties that an ontology language should possess [24]:
- Human-intuitive syntax, also compatible with existing Web standards (such as RDF, OWL).
- Formally specified semantics that provides shared understanding.
- Expressive power.
However, as the goal of knowledge representation is to facilitate reasoning, a trade-off between expressivity and efficiency arises. More expressivity allows a better representation, but at the same time lowers the efficiency of inference over that representation.

RDF and OWL

Semantic technologies can provide a basis for creating vocabularies that are able to describe the concepts related to sensor networks and the relationships between these concepts. Two of the most commonly used languages for knowledge representation that can define such vocabularies are OWL and RDF. Both languages are W3C standards and exist in different versions with varying levels of expressivity. RDF is essentially a graph-based data model whose building block is the statement, a subject-predicate-object triple. One or more RDF triples are used to describe a resource, which is identified by a Uniform Resource Identifier (URI) and represented as the subject of a triple. The object can be a literal value (such as a plain string or a typed value) or the URI of another resource related to the subject. The predicate indicates the relation between subject and object and corresponds to an edge in the graph; the predicate is also identified by a URI. In order to represent and exchange an abstract data model, a concrete syntax is needed. A commonly used syntax for the RDF data model is based on the Extensible Markup Language (XML). The nodes and predicates of an RDF graph are represented in XML as element names, attribute names, element contents and attribute values [25]. However, RDF by itself is just a domain-independent data model; it is complemented by RDF Schema, which helps in defining the terminology of particular domains. RDF Schema supports the definition of classes used to categorize resources, along with the properties of those classes. It also enables the representation of class hierarchies and inheritance, defining the semantics of the subclass concept and of related concepts (properties and sub-properties, domain and range restrictions). Therefore, RDF Schema defines the semantics of the concepts used in an RDF data model, which makes it a primitive language for representing ontologies [26]. The OWL [27] language provides additional vocabulary and formal semantics to describe the meaning of the information, enabling better machine interpretability than XML, RDF or RDFS. An OWL ontology describes data in terms of classes, properties and individuals (or instances). A property defines a relationship between classes of individuals, and the formal semantics are used to derive new logical consequences. Currently there are two specifications of OWL, OWL 1 and OWL 2, which allow these classes and properties to be described with a richer vocabulary, as explained below. As a trade-off between expressivity and efficiency, OWL 1 supports three profiles: Lite, DL and Full.
The first profile is the least expressive and can be used for defining taxonomies and simple constraints such as cardinality (only 0 and 1 values), (in)equality, domain restriction, inverse, transitivity and symmetry. The second profile, OWL 1 DL, increases the expressivity to the maximum limit while maintaining tractability for reasoning systems. Among the features added in OWL 1 DL are full cardinality, negation, disjunction and enumeration. The third profile supports maximum expressivity, but cannot guarantee complete reasoning.

As an extension of OWL 1, OWL 2 [28] brings new features, such as property chains, richer data types, extended annotation, qualified cardinality constructors, and asymmetric, reflexive and disjoint properties, for which new algorithms have been developed. Besides the DL and Full profiles already found in OWL 1, OWL 2 provides three more profiles that can be applied in different scenarios for obtaining the desired efficiency. These profiles are: OWL 2 Existential quantification (EL), OWL 2 Query Language (QL) and OWL 2 Rule Language (RL). The first one is appropriate for ontologies with a large number of classes and properties, for which reasoning problems can be solved in polynomial time. The second sublanguage is useful for ontologies with a large number of instances, where reasoning takes the form of query answering in polynomial time. The last sublanguage tries to maintain a balance between expressivity and efficiency, and can be used with rule-based reasoners.

To summarize, RDF is a data model based on subject-predicate-object triples and uses XML for specifying syntax. RDF Schema introduces semantics to an RDF data model; it describes concepts such as classes, properties of classes and hierarchies of these. However, RDF and RDF Schema support a limited number of semantic primitives.
The advantage of OWL is better expressivity, but sometimes at a higher cost in efficiency and reasoning capabilities. The increase in expressivity across these languages is presented in Figure 3.
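The triple model and the kind of consequence that RDF Schema semantics allows can be sketched in a few lines of Python. This is only an illustrative sketch: the `http://example.org/sensors#` namespace and the class and instance names are hypothetical, and only two RDFS entailment rules (transitivity of subclass and type propagation) are applied, not the full RDFS or OWL semantics.

```python
# Hypothetical namespace and terms, chosen only for illustration.
EX = "http://example.org/sensors#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# An RDF graph is a set of (subject, predicate, object) triples.
graph = {
    (EX + "TemperatureSensor", RDFS_SUBCLASS_OF, EX + "Sensor"),
    (EX + "Sensor", RDFS_SUBCLASS_OF, EX + "Device"),
    (EX + "node1-temp", RDF_TYPE, EX + "TemperatureSensor"),
}


def rdfs_closure(triples):
    """Apply two RDFS entailment rules until no new triple appears.

    - subClassOf is transitive.
    - an instance of a class is also an instance of its superclasses.
    """
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # Only pairs where the first object is the subclass side
                # of a subClassOf statement can produce a new triple.
                if o != s2 or p2 != RDFS_SUBCLASS_OF:
                    continue
                if p == RDFS_SUBCLASS_OF:
                    new.add((s, RDFS_SUBCLASS_OF, o2))
                elif p == RDF_TYPE:
                    new.add((s, RDF_TYPE, o2))
        if new <= inferred:
            return inferred
        inferred |= new


closure = rdfs_closure(graph)
```

The explicit graph never states that `node1-temp` is a `Device`; that triple appears only in the closure, which is the sense in which a schema language adds semantics on top of the plain data model.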

Choosing the right language for a specific application can be very challenging and is highly dependent on the application requirements. However, one observation that can influence the decision on the representation language to be used is about the upward and downward compatibility of these languages (concerning language expressivity). While for OWL sublanguages (Lite, DL and Full) the upward compatibility is assured, for RDF documents some restrictions might be required to make them legal OWL DL documents. Regarding the downward compatibility, this does not exist for the OWL sublanguages; nonetheless, as OWL uses RDF for its syntax, every OWL document is also a legal RDF document [26].

Figure 3: Description languages expressivity. Layered schema of RDF, RDFS and OWL languages; extension of Figure 1.4 from [26]. Layers shown, in decreasing order of expressivity:
- OWL 2 (DL, Full): more expressivity with the new features: property chaining (rules), qualified cardinality restrictions, asymmetric, reflexive and disjoint properties, enhanced annotation capabilities.
- OWL 1 Full: high expressivity (loses tractability, no complete reasoning); a class can be a collection of individuals and also an individual.
- OWL 1 DL: maximal expressiveness while maintaining tractability; negation, disjunction, full cardinality, enumeration.
- OWL 1 Lite: classification hierarchies and simple constraints; (in)equality, cardinality (0/1), domain restriction, inverse, transitivity, symmetry.
- RDFS: defines vocabulary and organizes it in typed hierarchies (taxonomy); (sub)class, type, (sub)properties, domain, range.
- RDF: data interchange; data model: graph; building block: triple (can be linked).
- XML: syntax.

Reasoning

Baader et al. [24] presented Description Logic (DL) for ontology languages, as it can provide both well-defined semantics and powerful reasoning tools. DL models concepts, roles and individuals, and the relationships between them are expressed through axioms.
Based on the axioms stated in DL, new relationships can be inferred using a reasoning engine, which can deduce implicit knowledge from the explicitly represented knowledge [24]. Therefore, a reasoning engine (or semantic reasoner) is a system able to draw conclusions, or to infer logical consequences, by applying logic rules to a set of facts or hypotheses from a knowledge base. For the SSW domain, we classified the existing reasoners in three categories: distributed or large-scale reasoners, normal-scale reasoners and reasoners for resource-constrained devices. In the first category we refer to distributed platforms that are able to process large amounts of data, usually Web data. The existing or under development systems that must

be mentioned are: Marvin [29] and LarKC [30]. While the first one uses a divide-conquer-swap strategy that assures massive scalability and is able to eventually reach completeness, LarKC trades completeness of reasoning for lower computational cost, being intended for massive heterogeneous information. The second category covers reasoners that can run on an ordinary desktop machine and are meant for ontologies that are not very large, for domain-specific problems where complete reasoning is required. They could be used in centralized, relatively small sensor networks. A few examples of the existing reasoners that we considered for this category are: Pellet, RacerPro, FaCT++ and Cyc (the reasoning component). For the last category we consider the reasoners that can run on resource-constrained devices, such as sensor nodes. These types of reasoners are useful in large sensor networks, where a centralized system would no longer perform well. Currently, to our knowledge, there are only some prototype implementations, one of which proposes a method for automatically composing a reasoner tailored to the needs of a particular application [31]. The main idea is to eliminate from the rule sets those rules that are not required, resulting in lower time and memory consumption.

Linked Data

Publishing information on the Web has already changed once with the Web's evolution. In Web 1.0, the data was in a static form and the interaction between the user and the published data was mostly read-only. The evolution to Web 2.0 made the user the central actor in generating information, through blogs or social media sites, resulting in a read-write interaction with the data. Web 3.0, also referred to as the Semantic Web [26], uses semantic description languages such as RDF and OWL to provide a formal description of data and knowledge.
Publishing information for the Semantic Web can be done as an object representation, described in RDF or OWL using structured vocabularies in the form of ontologies, or as a document annotated with formal metadata describing the content of the document using annotation languages such as RDFa (footnote 7). The philosophy behind the Semantic Web is to provide a machine-understandable representation of data and to link this data so that it becomes discoverable and allows reasoning. As a result, properly exposing and linking data is essential for the success of Web 3.0 and, consequently, for the Web of Things (footnote 8).

Footnote 7: The RDF data model can be serialized using different formats. RDFa is one such serialization format for RDF, compatible with (X)HTML.
Footnote 8: The Web of Things is introduced as a refinement of the Internet of Things by integrating smart things not only into the Internet (i.e. the network), but also into the Web (i.e. the application layer).

The principles of linked data have been defined by Tim Berners-Lee in [32] and they are:
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).

4. Include links to other URIs, so that they can discover more things.

These principles have been widely adopted in recent years by the Linked Open Data (LOD) community, and several methods for publishing data following these principles have been documented [33]. Therefore, one can say that publishing the information that describes a real-world object in HTML and RDF/XML representations is straightforward once that information is available. However, in the case of embedded devices, which may have intermittent Internet connectivity and may also be mobile, obtaining meaningful descriptions (beyond a MAC address and possibly an IP address) is non-trivial. For instance, looking at the sensors connected to a sensor publishing platform such as Pachube, it can be seen that the available description is limited to what is manually provided by the publishers. However, if future services are to require information such as the precision of a temperature sensor located on an embedded device with a given URI, this information will need to be automatically collected for the billions of such devices connected to the Internet.
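The four linked data principles can be illustrated with a small, self-contained Python sketch. Everything in it is an assumption made for illustration: the `http://example.org/...` URIs, the `attachedTo` and `observes` predicates and the in-memory `DESCRIPTIONS` table stand in for a real server, and no network request is made. The output format is N-Triples, one of the standard RDF serializations.

```python
from urllib.parse import urlparse

# Hypothetical URIs; nothing here is resolved over the network.
BASE = "http://example.org/sensor/"

# Principles 1 and 2: things are named with HTTP URIs.
# Principle 4: a description links to other URIs (a node and a property).
DESCRIPTIONS = {
    BASE + "42": [
        ("http://www.w3.org/2000/01/rdf-schema#label",
         '"Temperature sensor 42"'),
        (BASE + "vocab#attachedTo", "<http://example.org/node/7>"),
        (BASE + "vocab#observes", "<http://example.org/property/temperature>"),
    ],
}


def is_http_uri(uri):
    """Check that a name is an HTTP(S) URI that could be looked up."""
    parts = urlparse(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def look_up(uri):
    """Principle 3: return useful information, here as N-Triples lines."""
    if not is_http_uri(uri):
        raise ValueError("not an HTTP URI: " + uri)
    lines = []
    for predicate, obj in DESCRIPTIONS.get(uri, []):
        lines.append("<%s> <%s> %s ." % (uri, predicate, obj))
    return "\n".join(lines)


def outgoing_links(uri):
    """URIs that a client could follow to discover more things."""
    return [o[1:-1] for _, o in DESCRIPTIONS.get(uri, []) if o.startswith("<")]
```

Calling `look_up(BASE + "42")` returns three statements about the sensor, and `outgoing_links(BASE + "42")` lists the node and property URIs a client could dereference next. For an embedded device, the hard part discussed above is filling a table like `DESCRIPTIONS` automatically rather than by hand.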

3 Materials and Methods

3.1 Conceptual Framework

The problem that the proposed framework addresses is that of semantic enrichment of standardized and non-standardized sensor descriptions and measurements, with the purpose of enabling sensor discovery for better accessibility and processing of sensor data. The enrichment of data generally refers to adding information, annotation or additional features to the data by means of computation or by pulling information from external sources (e.g. the Web). One example of enrichment by computation is to generate features such as headache likelihood based on the barometric pressure values and their variation. An example of enrichment by pulling data from the Web is adding tweets about the weather generated around the time the values were measured. The enriched data, rather than the original data alone, is then usually processed further.

Figure 4: Conceptual framework. Illustration of the main components constituting the proposed framework.

The conceptual framework, illustrated in Figure 4, is defined by the following components:
- Sensor Descriptions and Measurements
- Ontology Collection
- Enrichment Components
- Semantic Repository of Sensor Data
- Data Consumers

The two components with which the framework starts are the Sensor Descriptions and Measurements and the Ontology Collection. The sensor descriptions contain the metadata defining the sensor characteristics. The process of generating the metadata can follow a manual or an automatic approach.

Manually generating metadata involves engineers aware of the sensor characteristics, who can describe the sensor metadata following a predefined schema (e.g. database schema, XML Schema, etc.). An automatic approach for generating sensor metadata assumes that the sensor nodes have the capability of describing themselves, by sending their characteristics encoded in messages to a server. In this case the physical capabilities of sensor nodes have to be considered, such as memory constraints or power consumption for transmitting metadata messages. The metadata generated by either of the two methods, manual or automatic, are usually stored in databases. The sensor measurements contain numerical values quantifying the changes of sensor properties and can be accessed using traditional database methods or streaming-based approaches.

The Ontology Collection consists of a set of ontologies necessary for describing sensor characteristics and providing context for sensor measurements.

The main process in the framework is run by the Enrichment Components, where the sensor descriptions are enriched with semantic concepts and the sensor measurements are processed to generate new features, which are then enriched with semantic concepts. The main steps of the enrichment process are:
- Analysis of the sensor descriptions and measurements for identifying the associated semantic concepts.
- Selection of the most appropriate ontologies out of the existing ontologies.
- Extension of the selected ontologies with the concepts specific to the domain of application. This mainly implies particularization of the observed properties and features of interest.
- Implementation of enrichment components, which are software programs that parse the sensor descriptions and measurements, extract the required metadata, and translate it, with the associated semantics, into a formalized language for semantic representation.
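The last step above, an enrichment component, can be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation used in this work: the XML layout of the sensor description, the `http://example.org/sensors#` vocabulary and the two rule tables are all hypothetical. It only shows the general shape of parsing a description, extracting metadata and emitting triples through manually built mapping rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical input format and vocabulary, used only for illustration.
SENSOR_XML = """
<sensor id="node7-temp">
  <type>thermometer</type>
  <observes>air temperature</observes>
  <unit>Cel</unit>
</sensor>
"""

EX = "http://example.org/sensors#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Manually built mapping rules: value found in a description -> concept.
TYPE_RULES = {"thermometer": EX + "TemperatureSensor"}
PROPERTY_RULES = {"air temperature": EX + "AirTemperature"}


def enrich(xml_text):
    """Parse one description and return it as a list of RDF-style triples."""
    root = ET.fromstring(xml_text)
    subject = EX + root.get("id")
    triples = []

    sensor_type = root.findtext("type", default="").strip()
    if sensor_type in TYPE_RULES:
        triples.append((subject, RDF_TYPE, TYPE_RULES[sensor_type]))

    observed = root.findtext("observes", default="").strip()
    if observed in PROPERTY_RULES:
        triples.append((subject, EX + "observes", PROPERTY_RULES[observed]))

    unit = root.findtext("unit", default="").strip()
    if unit:
        # The unit is kept as a plain literal in this sketch.
        triples.append((subject, EX + "unitOfMeasurement", unit))
    return triples


triples = enrich(SENSOR_XML)
```

Values that match no rule are simply skipped here; a real component would have to decide whether to report them, since they indicate concepts missing from the selected ontologies.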
The result of the Enrichment Components involved in the framework is a Semantic Repository of Sensor Data (SRSD), which contains the enriched sensor descriptions and measurements. The data from the SRSD can be consumed by different clients, such as query end-points, semantic browsers and inference engines. The query end-points and the semantic browsers provide simple means for searching and browsing through the SRSD, supporting its representation format. The inference engines are represented by different semantic reasoners, able to infer conclusions from the existing facts stored in the SRSD by applying the rules defined by the ontology. A common application in which inference engines are useful is the composition of virtual sensors, but we can also mention anomaly detection in sensor observations and sensor network management.

3.2 Sources of Sensor Data

The increasing interest in sensor networks and the utility of the measurements provided can also be seen in the high number of sources of sensor data. The format in which the sensor data is made available can vary from source to source, depending on the choice of the data provider. Two major groups of data sources can be identified: standardized and non-standardized. The first category refers to sources of sensor data that are made available in a standard format for a specific type of data, specified by a standardizing organization. Accordingly, the second category refers to sources of sensor data that do not belong to the first category.

The proposed framework is intended to be able to use sensor descriptions and measurements in both formats. However, it must be mentioned that the steps of the

enrichment process described at the beginning of this chapter must be followed for each different format of the source of data, in order to identify the semantic concepts behind their descriptions. Therefore standardized formats are preferred, as it is enough to analyze the specification of the format once and then apply it to all sources of sensor data that follow that format.

Standardized Specifications - SWE

Recent standardization efforts resulted in specifications for models that can be used for encoding sensor system descriptions, with the aim of enabling interoperability in the sensor web. Through the Sensor Web Enablement (SWE) initiative, OGC aims at providing a framework of standards that will enable applications and services to access different types of sensors distributed over a network, creating a Sensor Web. One can view the SWE architecture as divided in two parts: the information model and the service model [34] (Figure 5). Two of the specifications in the information model are Sensor Model Language (SensorML) and Observations and Measurements (O&M), which provide general models and XML encodings for describing sensor systems and the associated processes. The O&M standard describes models and specifies XML schemas for representing observation, measurement, feature of interest, observed property and other sensor-related concepts [35]. Both specifications offer support in developing other standards from the service model, like Sensor Observation Service (SOS), Sensor Planning Service (SPS) or Sensor Alert Service (SAS) [4].

Figure 5: SWE Architecture. Building blocks of the SWE set of standards [34]: the information model (SensorML, O&M) and the service model (SOS, SPS, SAS).

While knowledge representation languages provide the means for defining vocabularies that can describe the sensor network domain, standardized specifications of models describing sensor systems can be a starting point in defining these vocabularies.
The conceptual model in SensorML, illustrated in Figure 6, contains four main elements: Process Model, Process Chain, System and Component. All of them inherit the properties of a more general concept, Abstract Process, which is itself derived from Abstract Feature. The inherited properties that any process should describe are: name, description, inputs, outputs and parameters. Component represents all the physical models that cannot be divided, and it can participate in Systems and Process Chains. The scope of Process Model is to provide an atomic description of a set of processes from the non-physical point of view (e.g. calculating the wind chill factor from temperature and wind speed). The Component and System models describe the physical processes, while the Process Model and Process Chain are used to represent the non-physical processes, which can be treated as mathematical models [36]. The general model that SensorML provides is a skeleton for describing sensor systems and processes, but for achieving interoperability in the Sensor Web, SensorML

should be complemented with semantic specification (e.g. ontologies). On the other hand, the SensorML encodings follow the Object-Association-Object pattern, facilitating the association with RDF and the Semantic Web.

Figure 6: SensorML Conceptual Model [36]. UML diagram representing the conceptual model for processes (all components of SensorML are modelled as processes).

Non-Standardized Specifications

Other sources of sensor data may not follow a standardized specification for describing the data. Different organizations can choose their own data structure and concepts for the specification of sensor data. Web platforms such as Pachube, which offers support for storing and sharing sensor data, define their own data structure and concepts. Publishers can send the data from a sensor node for storage, making it available to other users as well. Providing metadata describing the sensor data is important for the usability of the data, and it is structured following the concepts and data structure specified by the platform. When registering a sensor node on Pachube, the publisher can provide a description of the sensed data, which can include information about the location of the sensor (latitude and longitude), the exposure of the sensor node (indoor or outdoor), tags and units of measurement for data streams. The descriptions are in XML format and, besides the publisher data, they also contain automatically generated data, such as the timestamp of the last update, the current status of the sensor node (sending messages or not) or the minimum and maximum values for a specific stream.

In many cases, the structure and concepts for describing sensor data can be provided by the system used for storing the sensor data. This is also the case for database management systems (DBMS). Two basic approaches exist for storing sensor data using a DBMS. One is using distributed storage on the sensor network level (e.g., TinyDB) and the other is

using centralized storage on the middle level (e.g., MySQL, Oracle, etc.). The advantage of the first approach is that the data is retrieved directly from the sensor node, whereas in the second approach larger volumes of data can be handled. No matter which approach is chosen, the database schema design is important if the metadata (sensor description) is to be stored together with the sensor measurements. There has to be a good separation between the descriptions of sensors and their measurements, in order to avoid redundancies.

3.3 Ontology Collection

The Ontology Collection component refers to all the ontologies used for annotating the enriched descriptions of sensors and their measurements. One can choose to build one's own ontology comprising all the concepts needed, or to select existing ontologies that already describe most of the knowledge needed and, if required, extend them with the missing concepts. The first option has the advantage of creating ontologies that are entirely used in the enrichment process, with exactly the concepts needed. However, the ontology building process can be very expensive and requires domain experts. Therefore, selecting already built ontologies, if available, can be a better solution.

Regarding the sensor networks domain, several ontologies have been built in recent years, setting the base concepts to describe the domain. A survey [19] of eleven existing ontologies presents the main concepts identified in these ontologies, listed in Table 1:
- Sensor describes the sensing system (sensor node) and its components.

Table 1: Concepts describing the sensor network domain. There are four base concepts proposed for describing the sensor network domain, each of them identified by several sub-concepts (excerpt from Table 2 in [19]).
Base Concept: Sub-concepts
Sensor: Sensor hierarchy; Identity and manufacturing; Contacting and software; Deployment; Configuration; History; Components; Action and process
Physical: Location; Power supply; Platform; Dimension, weight, etc.; Operating conditions
Observation: Data/Observation; Accuracy; Frequency; Response model; Field of view/sensing
Domain: Units of measurement; Feature/Quality; Sampled medium; Time

- Physical identifies the physical characteristics related to a sensing system, such as location, the platform to which it is attached and operating conditions.
- Observation describes measurements taken by a sensor and their characteristics in terms of accuracy, frequency, etc.
- Domain describes the properties being observed by the sensor.

None of the ontologies analyzed in the survey provides the means to fully describe all of these concepts. However, they can provide a ground to which further extensions can be added to meet the specific domain requirements. A classification in different layers of the ontologies that are used to describe the sensor network domain can be found in [38]. Gray et al. suggest four layers of ontologies:
- upper layer, comprising upper-level ontologies used for the interoperability between other ontologies.
- infrastructure layer, describing the information required for the infrastructure (i.e., sensor network deployment, services provided by the infrastructure, metadata about sensor streams).
- external layer, representing concepts which are not directly related to the sensor domain, such as geographical information.
- domain layer, defining the domain concepts related to a specific scenario where the sensor networks are used (e.g., floods, landslides, oil spills, etc.).

A mapping of these layers of sensor network ontologies to the general classification of ontologies presented earlier can be considered as follows: the upper layer ontologies correspond directly to the foundational ontologies, the infrastructure and external ontologies could be classified as core or domain ontologies, while the ontologies from the domain layer correspond to the application ontologies.

3.4 Enrichment of Sensor Data

The semantic enrichment components include modules that semantically enrich sensor descriptions and measurements.
The modules can use different techniques for enrichment, ranging from simple manually built mapping rules to more complex techniques from the machine learning or data mining domains. Moreover, the methods applied for the enrichment of sensor descriptions can be different from those applied in the case of measurements, as the nature of the data is different: descriptions are generally static, while measurements are dynamic (streams of data).

Regarding the enrichment of sensor descriptions, manually building rules that annotate the initial descriptions with semantic concepts can be a good solution when the descriptions of sensors are structured and follow the same format. This can be the case for deployments managed by organizations, which can decide on a specific format to be followed for describing their sensors. The advantage of manually built rules is their high accuracy, assured by the human involvement, and their applicability to a large number of sensor descriptions (assuming that organizations specialized in sensor deployments would not deploy just one or two sensors). However, there can be cases, such as participatory sensing, which involve different and smaller communities providing their own sensor descriptions. In this case, the efficiency of manually built rules would decrease dramatically and more complex techniques would be required to learn the rules for annotating sensor descriptions.

In the case of the enrichment of sensor measurements, the data is often pre-processed before enrichment. Simple raw measurements might not have any complex semantics attached to them, while processing several sources of sensor data together might

result in more meaningful observations. For instance, from sensors measuring temperature, wind and humidity, more precise information about a natural environment can be obtained after pre-processing (e.g. real feel temperature). An example of mining sensor data for extracting a new feature can be found in one of our previous works [39], where we describe a vertical system integration of a sensor node and a toolkit of machine learning algorithms for predicting the number of persons located in a closed space. Data collected from temperature, humidity, light and atmospheric pressure sensors were labeled with manually collected data (the number of people in the lab, the number of computers running and the position of the window), which constituted the training set for the machine learning algorithms.
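Pre-processing several measurement streams into one more meaningful feature can be sketched as below. The choice of feature is an assumption made for illustration: the sketch uses the standard metric wind chill index (air temperature in degrees Celsius, wind speed in km/h), which is not necessarily the "real feel" computation meant above. The `ex:WindChill` concept name and the layout of the readings are likewise hypothetical.

```python
def wind_chill_celsius(temperature_c, wind_kmh):
    """Wind chill index in degrees Celsius.

    The formula is intended for temperatures at or below 10 deg C and
    wind speeds above 4.8 km/h; outside that range None is returned.
    """
    if temperature_c > 10.0 or wind_kmh < 4.8:
        return None
    v = wind_kmh ** 0.16
    return (13.12 + 0.6215 * temperature_c
            - 11.37 * v + 0.3965 * temperature_c * v)


def enrich_measurements(readings):
    """Combine temperature and wind readings and attach a concept.

    `readings` maps a timestamp to a dict with 'temperature' (deg C)
    and 'wind_speed' (km/h) taken at that time. The concept name used
    for the derived feature is hypothetical.
    """
    enriched = []
    for timestamp in sorted(readings):
        reading = readings[timestamp]
        value = wind_chill_celsius(reading["temperature"],
                                   reading["wind_speed"])
        if value is not None:
            enriched.append({"time": timestamp, "feature": "ex:WindChill",
                             "value": round(value, 1), "unit": "Cel"})
    return enriched


sample = {
    "2011-01-10T06:00:00Z": {"temperature": -10.0, "wind_speed": 20.0},
    "2011-01-10T12:00:00Z": {"temperature": 15.0, "wind_speed": 20.0},
}
result = enrich_measurements(sample)
```

Only the first reading yields a derived observation, because the second lies outside the range in which the index is meaningful. The raw values carry little semantics on their own, while the derived feature can be linked to a domain concept.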


4 Results

As described in the previous chapter, the framework is composed of several conceptual components. This chapter presents a concrete implementation of the framework for real-world datasets and discusses possible applications using the provided implementation.

4.1 Implementation of the Framework

For each of the components of the conceptual framework, several concrete implementations can be used. A schema of the implementation of the framework components is illustrated in Figure 7. All the conceptual components of the framework (Sensor Descriptions and Measurements, Ontology Collection, Enrichment Components, SRSD and Data Consumers) are covered in the implementation.

Figure 7: Instantiation of the conceptual framework. Relations between the implemented components of the framework.

A summary of the implementation of the framework's components is provided in Table 2, while the rest of the chapter presents the details of the implementation.

Table 2. Summary of framework implementation. The table presents the implementation of the framework components together with a short description.

Conceptual Component | Implementation | Short Description
Sensor Descriptions and Measurements | OGC standardized dataset | SensorML and O&M standardized dataset in the area of ocean tides and currents.
Sensor Descriptions and Measurements | MySQL database | Database containing sensor descriptions and measurements from a real test bed deployed in an outdoor environment in different regions of Slovenia.
Ontology Collection | OWL ontologies | SSN ontology plus other ontologies and extensions.
Ontology Collection | ResearchCyc ontology | A few concepts from the general ontology of the ResearchCyc system have been used and also extended with more specific ones.
Enrichment Components | Jena Framework + JDBC and JAXB APIs for access to data | The Jena Framework is used to translate the enriched representations of the datasets into RDF format. The enrichment is based on manually constructed rules.
Enrichment Components | OntoGen | Data mining tool used for enrichment of sensor measurements.
Semantic Repository of Sensor Data | RDF descriptions stored in Sesame's repositories | The RDF descriptions are the output of the enrichment component. They are loaded, together with the ontology collection, into repositories provided by the Sesame framework.
Data Consumers | Sesame's SPARQL endpoint | Used for querying data.
Data Consumers | Pubby | Open-source tool for publishing data according to linked data principles.
Data Consumers | ResearchCyc reasoner | Used for querying data.
Data Consumers | OntoGen UI | Used together with the OntoGen tool, for visualization of sensor measurements and the features extracted for enrichment.

Datasets

The datasets described in this section represent the Sensor Descriptions and Measurements component of the framework. Two different formats were chosen for enrichment, a

standardized dataset and a non-standardized one. Each dataset is described in the following two subsections.

Standardized Dataset

The dataset chosen for experimentation contains descriptions and measurements of sensors in the area of ocean tides and currents, available online (footnote 9). The motivation for choosing this dataset is the large number of standardized sensor descriptions provided. The representation format is SensorML for sensor descriptions and O&M for sensor measurements, facilitating parsing and extraction of relevant metadata. The dataset was downloaded and processed offline.

For the sensor description dataset used, each system is described in a separate document with the following characteristics:
- One document is used to describe one platform.
- Each platform is identified by a Uniform Resource Name (URN) (footnote 10), can also have short or long names as simple strings, is classified in one or more networks (identified by URNs) and its location is defined in latitude and longitude coordinates.
- The components of each platform are defined as systems.
- Each system represents a sensor, is identified by a URN and can have a list of outputs.
- Each output is a property measured by the sensor, and is identified by a Uniform Resource Locator (URL) as an instance from the Marine Metadata Interoperability (MMI) ontology (footnote 11), under the Climate and Forecast (CF) standard names parameter vocabulary, which defines standard names for various types of observed properties.

For the sensor measurements dataset, we have used a set of documents with the following characteristics:
- One document contains all the observations one sensor made for one observed property, for 20 days.
- For each measurement, a timestamp of when the measurement was taken, a numerical value and a unit of measurement are encoded.

The descriptions of the documents refer only to the data we have used, and not to all the information they contain.
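The characteristics listed above amount to a small data structure, which can be sketched with Python dataclasses. This is an illustrative sketch only: the class and field names are hypothetical, they do not reproduce the SensorML or O&M schemas, and the identifiers used in the usage example are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class System:
    """One sensor of a platform, with the properties it outputs."""
    urn: str
    outputs: List[str] = field(default_factory=list)  # property URLs


@dataclass
class Platform:
    """One description document corresponds to one platform."""
    urn: str
    short_name: str
    networks: List[str]  # URNs of the networks the platform belongs to
    latitude: float
    longitude: float
    systems: List[System] = field(default_factory=list)


@dataclass
class Measurement:
    """One observation: when it was taken, its value and its unit."""
    timestamp: str
    value: float
    unit: str


def observed_properties(platform):
    """Distinct property URLs measured by any sensor of the platform."""
    return sorted({url for system in platform.systems
                   for url in system.outputs})


# Invented identifiers, for illustration only.
station = Platform(
    urn="urn:x-example:station:0001",
    short_name="Example Harbor",
    networks=["urn:x-example:network:all"],
    latitude=40.0,
    longitude=-70.0,
    systems=[
        System("urn:x-example:sensor:0001:wt",
               ["http://example.org/cf/sea_water_temperature"]),
        System("urn:x-example:sensor:0001:wl",
               ["http://example.org/cf/water_level"]),
    ],
)
properties = observed_properties(station)
```

Keeping `Measurement` separate from `Platform` and `System` mirrors the split between description documents and measurement documents in the dataset.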
The total number of platforms described is 751, and 14 types of measured properties are identified. In this work, we used a subset of 1900 measurement files (.xml) totalling 8 GB of data on disk. In these files, 1379 sensors, measuring 14 properties, generated over 17 million observations. The binary file resulting from parsing the .xml files occupied 72 MB on disk, while the indexes take 2.3 GB in memory.

Footnote 9: Center for Operational Oceanographic Products and Services.
Footnote 10: URIs can be classified in two categories, URNs and URLs; the first category defines the identity of a thing, while the second category provides methods for finding it.
Footnote 11: http://mmisw.org/orr/#

Non-standardized Dataset

The non-standardized dataset contains real-world sensor data collected from a sensor network that includes sensing devices for monitoring environmental conditions such as temperature, humidity, luminance and pressure.

Data coming from sensors are stored in a centralized MySQL database server, which holds both the meta-data and the sensor measurements. The database schema is closely related to the hardware design, in which a sensor node features a set of sensors (devices sensitive to physical phenomena) generating measurements. The database schema is composed of four tables, as shown in Figure 8. The three upper tables, Sensor Node, Sensor and Sensor Type, store the meta-data describing the physical devices and the phenomena observed. The Sensor Node Table stores information about each sensor node deployed in the network, providing unique IDs, text descriptions and the geographical coordinates of the locations where the sensor nodes are deployed. The Sensor Table is used to uniquely identify the sensing devices attached to the sensor nodes. The description of the sensing devices is provided in the Sensor Type Table, which contains information about the physical device used and the phenomena it observes. The lower table stores the sensor measurements with their timestamp, the numerical value of the measurement and the corresponding sensor ID.

The meta-data and data are separated in this way to avoid overhead in the database. However, each measurement can be linked back to the sensor (through Sensor ID) and to the node (through Sensor Node ID) when the full context needs to be retrieved. GPS coordinates are stored as meta-data in this implementation, as all the sensor nodes used in this work were fixed. However, once sensors are mobile, the GPS modules themselves will be considered sensors and their measurements will be saved in the Measurement Table.

A sensor node can have several sensors attached to it. Currently, each node in this sensor network has six sensors. Multiple sensors on the same node can be of the same type and can measure the same phenomenon (e.g. temperature).
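The four-table layout described above can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module in place of the MySQL server; all table names, column names and sample values are assumptions, since the text names only the tables and their roles.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database mirroring the four tables described in the text.

    Table and column names are hypothetical; the sample rows are invented.
    """
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE sensor_node (
            id INTEGER PRIMARY KEY, description TEXT,
            latitude REAL, longitude REAL);
        CREATE TABLE sensor_type (
            id INTEGER PRIMARY KEY, device TEXT, phenomenon TEXT);
        CREATE TABLE sensor (
            id INTEGER PRIMARY KEY,
            node_id INTEGER REFERENCES sensor_node(id),
            type_id INTEGER REFERENCES sensor_type(id));
        CREATE TABLE measurement (
            id INTEGER PRIMARY KEY,
            sensor_id INTEGER REFERENCES sensor(id),
            ts TEXT, value REAL);
        """
    )
    con.execute("INSERT INTO sensor_node VALUES (1, 'roof node', 46.04, 14.49)")
    con.execute("INSERT INTO sensor_type VALUES (1, 'thermistor', 'temperature')")
    con.execute("INSERT INTO sensor VALUES (1, 1, 1)")
    con.execute("INSERT INTO measurement VALUES (1, 1, '2011-06-01T08:00:00', 21.5)")
    return con


def measurement_context(con, measurement_id):
    """Follow the sensor ID and the sensor node ID to recover a measurement's context."""
    return con.execute(
        """
        SELECT m.ts, m.value, t.phenomenon, n.description, n.latitude, n.longitude
        FROM measurement m
        JOIN sensor s ON s.id = m.sensor_id
        JOIN sensor_type t ON t.id = s.type_id
        JOIN sensor_node n ON n.id = s.node_id
        WHERE m.id = ?
        """,
        (measurement_id,),
    ).fetchone()
```

Keeping the measurement rows narrow (sensor ID, timestamp, value) and joining to the meta-data tables only on demand reflects the separation of data and meta-data described in the text.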
Generally, the number of sensors on one node is not fixed and can vary from node to node. The database is actively used by the sensor network mentioned and frequently receives sensor measurements. Also, when a new sensor node is deployed in the network, its description is automatically added to the database.

Figure 8: Database schema. The figure illustrates the tables of the database schema, together with their main attributes and the relations between tables.

Ontologies

For the Ontology Collection component of the framework, two sets of ontologies have been chosen for the enrichment of the datasets. The first set consists of OWL ontologies. Most of the concepts from this first set are defined by a standardized ontology for describing the sensor network domain, which can be classified in the

infrastructure layer of the classification mentioned in Section 3.3; external ontologies have been added for describing geographic locations, and domain ontologies have been created by extending concepts from the infrastructure layer. The second set of ontologies actually consists of a single general ontology, namely the ResearchCyc ontology. The two sets of ontologies have been used separately for enrichment. The ontologies used for enrichment are summarized in the following subsections.

Figure 9: Overview of SSN ontology modules [13]. The Skeleton module represents the core concepts, as a lightweight minimalistic ontology which can be used directly or integrated in more complex ontologies. The rest of the modules are used to represent particular aspects of sensors and their observations (e.g., how sensors are deployed or attached to platforms, the measuring capabilities of sensors, etc.).

OWL Ontologies - W3C Semantic Sensor Network Ontology

The development of semantic sensor networks in recent years has resulted in a need for standardization of ontologies. Therefore, a W3C incubator group was formed, with the purpose of developing ontologies for sensor networks (referred to as the SSN ontology) and searching for appropriate methods for enhancing available standards with semantic technologies. Several modules of the ontology are identified in [13], among them Skeleton, Deployment, System and Process (all ontology modules are illustrated in Figure 9). Each of the modules can be used for focusing on specific aspects of describing sensor networks, such as sensor properties, systems of sensors, data or sensor measurements, and feature and property characteristics. As the experimental work of this thesis required just a subset of the SSN ontology concepts and relationships, we provide details only on those concepts, as illustrated in Figure 10.
The class System represents a sensing infrastructure, and its subsystems can be represented using the predicate hassubsystem. The instances of the System class correspond to the sensor nodes of a network. The physical location of a sensor node is represented by the class Platform, which can have multiple systems attached. The network to which a system or a platform belongs is represented by the class Deployment, and several predicates can be used to represent this relation (for simplicity

we use only two predicates: indeployment, and its inverse deployedonplatform). The class Sensor is used to represent concrete sensing objects, and the predicate observes indicates the property observed by a sensor (e.g. temperature, humidity). To represent physical sensor devices, the class SensingDevice can be used, which inherits all the properties of the classes Sensor and Device. A property observed by a sensor must be an instance of the class Property. The predicate ispropertyof indicates the relation between a property and the entity sensed, the latter being represented as an instance of the class FeatureOfInterest. As the last two classes described, Property and FeatureOfInterest, are domain dependent, they can be extended for each particular domain. An example of such an extension is illustrated in Figure 11 and explained later, in the discussion of the extension of OWL ontologies.

For representing sensor measurements we have used three classes: Observation, SensorOutput and ObservationValue. An observation represents a situation in which a value of a property of a feature of interest is estimated by a sensor. The result of an observation is a sensor output, which has a specific value.

Figure 10: SSN ontology. The figure illustrates the subset of concepts and relationships used.

Additional Ontologies

For latitude and longitude coordinates the Basic GeoWGS84 Vocabulary has been used. It provides the namespace for representing the coordinates. In addition, for geographical names the GeoNames database has been used, with its RDF representation. Each platform was linked to a nearby location, determined from its geographical coordinates using the findnearbyplacename web service provided by GeoNames. An example illustrating the geographical metadata associated with a platform is given below. Notice that the geo prefix stands for the Basic GeoWGS84 Vocabulary, where location is a property that links a thing with the point where it is located.

<rdf:description rdf:about="urn:x-noaa:def:station:noaa.nos.co-ops:: ">
<geo:location rdf:nodeid="a111"/>
</rdf:description>
<rdf:description rdf:nodeid="a111">
<foaf:based_near rdf:resource="
<geo:long rdf:datatype="
<geo:lat rdf:datatype="
</rdf:description>

For representing time metadata, one class and one property from the W3C time ontology have been used. The class is ProperInterval, used for defining time intervals for sensor measurements. Twelve instances of this class have been defined to express different periods of the day, such as early morning, morning, noon, etc. The property used to link to these individuals is startsorduring, from the same time ontology. The usage of these instances is explained later, in the discussion of the enrichment of measurements from the standardized dataset.

Cyc

Cyc [37] is an artificial intelligence project that aims at building a general ontology and a knowledge base for representing common sense knowledge. The Cyc technology components that are of interest to our work are the knowledge base, the representation language (CycL) and the inference engine. The Cyc knowledge base (also referred to as the Cyc ontology) is a formalized representation of fundamental human knowledge: facts, rules, and heuristics for reasoning about the objects and events of everyday life. Cyc's knowledge is represented in CycL, while its inference engine performs general logical deduction. One advantage of Cyc is its very broad knowledge base, which covers common sense knowledge as well as domain-specific knowledge for a number of domains; this can support the description of the sensing domain for various sensor networks and also provide context for different applications. Another advantage is its specialized inference engine, which performs modular search in the proof space, enabling reasoning at large scale.
The Cyc knowledge base is organized into "microtheories", which provide context for particular domains at different levels of detail or for different time intervals. The microtheory structure allows Cyc to independently maintain knowledge that can be contradictory across particular domains. It also improves the performance of the system by making it possible to control the inference processes. In our work, we used the ResearchCyc ontology, a version of Cyc licensed for research and academia. No major modifications have been made, except for introducing some simple predicates meant for illustration purposes. However, the Cyc knowledge base can be modified and extended to meet the requirements of very specific domains, like the sensor web. Considering the vast amount of knowledge already represented in Cyc and the strong reasoning capabilities that it provides, it was chosen for experimentation with one of the selected datasets. The concepts used for the enrichment of sensor data are illustrated in Figure 12, while details on each of these concepts are given later, in the discussion of the ResearchCyc extension.

Similarly to OWL, Cyc uses classes and individuals to represent the things that we want to describe, but instead of properties it uses predicates for declaring the relations between classes or individuals. The difference is that a predicate can have more than two arguments, and predicates can themselves be arguments of other predicates. However, for describing our data we have used only binary predicates, which

correspond to the OWL properties.

Ontology Extension

As mentioned in Section 3.3, the existing ontologies do not always provide all the concepts required for a specific domain. This is also the case for our collection of ontologies. The following subsections present our extensions of the ontologies.

Extension of OWL Ontologies

Sensor ontologies usually define general concepts for describing sensors, while application-specific concepts such as observed properties or features of interest are left unspecified. Ontology extension is therefore required mainly for these resources. Based on the definitions of the observed properties given in the MMI ontology, the Property and FeatureOfInterest classes have been extended with 7 and 6 subclasses, respectively, as can be observed in Figure 11. For the Property class the following subclasses are defined: Temperature, Height, Speed, Conductivity, Angle, Salinity and Pressure. For the FeatureOfInterest class there are 4 direct subclasses, namely Water, Air, Wind and PlatformMotion, and 2 further subclasses of the Water class: SeaSurface and SeaWater. An example of how these new classes are used is provided below, where the air temperature property is defined as an instance of the class Temperature and the relation ispropertyof associates with it an instance of the class Air.

<rdf:description rdf:about="
<rdf:type rdf:resource="
<ssn:ispropertyof rdf:resource="
</rdf:description>
<rdf:description rdf:about="
<rdf:type rdf:resource="
</rdf:description>

Figure 11: Extension of the SSN ontology for Property and FeatureOfInterest classes. Concepts added for defining properties and features of interest.

ResearchCyc Extension

After an analysis of the concepts related to the sensor domain which already exist in

ResearchCyc and a comparison with the SSN ontology, it was considered necessary to introduce three new classes and three predicates in order to be able to represent sensor descriptions. As can be observed in Figure 12, the new classes introduced are SensorPlatform, SensorNetwork and ObservedProperty. They correspond to the following classes of the W3C SSN ontology: ssn:platform, ssn:deployments and ssn:property. The small differences in the names of the classes are justified by the fact that the ResearchCyc ontology contains a more detailed hierarchy of classes. For instance, the class Platform from ResearchCyc refers to "a large, flat, construction artifact which is usually elevated", which is too general a term compared to a sensor platform. Figure 12 shows the concepts that already exist in the ResearchCyc ontology and to which the new concepts have been linked (the isa predicate can be considered equivalent to the rdf:type property).

Figure 12: Sensor related concepts from ResearchCyc. The newly introduced concepts are represented in relation with the already existing concepts from the ResearchCyc ontology. Labels appearing in the figure: Platform, Network, TemporalStuffType, Sensor, hassensor, onplatform, isa, SensorPlatform, SensorNetwork, ObservedProperty, Sensor-Device, deployedinsensornetwork, objectfoundinlocation, detects, Place, longitude, latitude, AngularDistance, Color. Legend: new concepts, already existing concepts.

Enrichment of Sensor Descriptions

The semantic enrichment component can be implemented using vocabularies and ontologies related to the sensor network domain. The enrichment of sensor descriptions from the two chosen datasets consists of generating enriched representations of the original descriptions using semantic concepts. This is done with manually created rules that extract the information from the datasets and attach the corresponding semantic concepts from the collection of ontologies.
For the standardized dataset the enrichment is done with both collections of ontologies, separately. For the non-standardized dataset only the OWL ontologies are used.

Using Semantic Concepts from OWL Ontologies

The result of using OWL ontologies for the enrichment of sensor descriptions is a set of RDF descriptions of the original datasets. The RDF descriptions are stored in an RDF repository representing the SRSD component of the framework. The ontology concepts provide the schema behind the RDF representation of sensor descriptions. As mentioned earlier, the central ontology used is the SSN ontology, to which additional external ontologies are added and some extensions are made. Note that in the rest of this section, if it is not explicitly specified to which ontology a concept belongs, then it belongs to the SSN ontology.

For each of the datasets, manually created mapping rules are used for the enrichment of sensor descriptions. The rules are implemented by software programs that parse the original descriptions and enrich them with the ontology concepts. The two following subsections describe the enrichment of each of the datasets and the software technology used for it.

Translation of SensorML Descriptions to RDF

The language used to describe the sensors in the standardized dataset is SensorML, part of the SWE. The mappings applied to the standardized dataset chosen for experimentation are as follows:
Platforms described in SensorML represent instances of the class Platform.
The networks to which these platforms belong are represented as instances of the class Deployment.
The platforms' components are represented as instances of the SensingDevice class.
The properties observed by the sensing devices are defined as instances of the subclasses extending the Property class (illustrated in Figure 11), and they are related to the sensed domain by using the relation ispropertyof and the instances of the subclasses extending the FeatureOfInterest class.
The geographical locations of the platforms are given by latitude and longitude coordinates, which are represented using the lat and long relations from the GeoWGS84 vocabulary. The findnearbyplacename web service provided by GeoNames is then called in order to find the name of the populated place closest to the platform.

For extracting the information from SensorML documents, the Java Architecture for XML Binding (JAXB) API was used for unmarshalling the documents into Java objects. The JAXB bindings for the SensorML schema are provided by a library developed in the OGC Schemas project. Once the SensorML documents were represented as Java objects, the Jena libraries were used to build an RDF repository consisting of triples of sensor descriptions, based on the rules described above.
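The mapping rules above can be sketched in a few lines. The thesis implements them in Java with JAXB and Jena; the Python below is only an illustrative stand-in that assumes the SensorML document has already been parsed into a plain dictionary. The dictionary layout, the function name, the `ex:`-style class names supplied by the caller and the blank-node label are all hypothetical, and the prefixed names follow the spelling of the SSN and WGS84 vocabularies.

```python
RDF_TYPE = "rdf:type"


def platform_to_triples(platform, bnode="_:b0"):
    """Turn one parsed platform description (a plain dict) into RDF-style triples.

    The input layout is an assumption made for this sketch, not the SensorML schema.
    """
    p = platform["urn"]
    triples = [(p, RDF_TYPE, "ssn:Platform")]

    # Networks the platform belongs to become Deployment instances.
    for net in platform["networks"]:
        triples.append((net, RDF_TYPE, "ssn:Deployment"))
        triples.append((p, "ssn:inDeployment", net))

    # The location is not named in the dataset, so a blank node carries the coordinates.
    triples.append((p, "geo:location", bnode))
    triples.append((bnode, "geo:lat", platform["lat"]))
    triples.append((bnode, "geo:long", platform["long"]))

    # Each component is a SensingDevice observing one or more properties.
    for sensor in platform["sensors"]:
        s = sensor["urn"]
        triples.append((s, RDF_TYPE, "ssn:SensingDevice"))
        triples.append((s, "ssn:onPlatform", p))
        for out in sensor["outputs"]:
            triples.append((s, "ssn:observes", out["url"]))
            triples.append((out["url"], RDF_TYPE, out["property_class"]))
            triples.append((out["url"], "ssn:isPropertyOf", out["feature"]))
    return triples
```

A caller would pass one dictionary per platform and a distinct blank-node label per platform, then hand the resulting triples to whatever RDF store is in use.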
The URIs needed to describe the RDF resources in our sensor descriptions are taken from the URNs and URLs in the dataset where these are provided; otherwise they are created specially for our representation. Each RDF resource representing a platform, deployment or sensing device is therefore identified by its URN from the original dataset. Examples of such URNs are provided below, and a detailed RDF description of them can be found in Appendix A:
platform URN: urn:x-noaa:def:station:noaa.nos.co-ops::
deployment URN: urn:x-noaa:def:network:noaa.nos.co-ops::waterlevelactive
sensing device URN: urn:x-noaa:def:sensor:noaa.nos.co-ops:: :F1

Further, the properties observed by the sensing devices are identified by the URL from the vocabulary describing them, while for representing the features of interest we generate URLs. Examples of these URLs and the pattern for generating the features of interest are provided below:
property URL:
feature of interest URL pattern: prefix/featureofinterestsubclassname_on_station_stationnumber

feature of interest URL:

The platform locations are not uniquely identified in the dataset, so we represent them as blank nodes (footnote 17) in order to link each platform to its geographical coordinates. However, blank nodes can be avoided if we consider the platforms as spatial things, as we will see for the database dataset.

Translation of Database Descriptions to RDF

The mapping of database content to semantic concepts follows the schema provided by the SSN ontology rather than the database structure, as illustrated in Figure 13 and explained below. The information about the sensor nodes deployed in the network and the sensing devices attached to them is stored in separate tables in the database. Each row of a table describes an instance, which is uniquely identified by the primary key of the table. These instances were therefore mapped directly to the System class and to subclasses of the SensingDevice class, respectively. However, for the instances of the Platform class, which represent the physical objects to which the systems are attached, there is no unique identifier in the database. We therefore used the sensor node keys to identify the platforms, under the assumption that only one sensor node is attached to each platform. As this is not the case for all real-life deployments, another option would be to use the geographical coordinates. The geographical coordinates describe the platforms directly, as the platforms are also defined as instances of SpatialThing from the GeoWGS84 vocabulary. Another piece of information stored in the database is the type of the sensing device placed on a sensor node. We extracted the types of sensing devices from the Sensor Type table of the database and mapped them as subclasses of the SensingDevice class of the SSN ontology; these subclasses were then used for specifying the type of individuals.

Figure 13: Enrichment of database of sensor descriptions and measurements.
The arrows point to the concepts from the SSN ontology which were used for enrichment.

17 Blank nodes are used in RDF to represent resources which conceptually have no names. In RDF serializations, blank nodes can be given names, but these names are unique only within the context of the particular RDF document and can never be referred to outside of the current document [40] (page 84).

The properties observed by a type of sensor are also specified in the Sensor Type table. We have decided to automatically create these properties as individuals of the

Property class. This gives the flexibility to introduce new properties observed by the sensors, as the database does not use a specified standard for defining these properties.

For accessing the content of the database, the Java Database Connectivity (JDBC) API was used in a software program that implemented the mapping described above. As with the enrichment of the previous dataset, the Jena libraries were used to build an RDF repository.

In contrast to the standardized dataset, where some of the resources were already identified by URIs, in the case of the database description we had to generate all the URIs for the RDF representation. The URIs are therefore composed of base + identifier, where the base could be for example: The identifiers are further divided into identifiers for instances and identifiers for concepts (such as subclasses), which add one of the strings resource/ or vocab/ after the base. The resource identifiers then continue with the type of resource and the IDs from the database, while the concepts are identified by their given names. Examples of URIs of instances are given below:
system URI:
sensing device URI:
platform URI:

Using Semantic Concepts from ResearchCyc Ontology

The enrichment of sensor descriptions using the ResearchCyc ontology has been done for the standardized sensor descriptions only. The ResearchCyc ontology was extended so as to have concepts similar to those in the SSN ontology. Therefore, similar mapping rules were used to insert a subset of the sensor descriptions into the ResearchCyc knowledge base, where they were later used for querying:
Platforms described in SensorML represent instances of the class SensorPlatform.
The networks to which these platforms belong are represented as instances of the class SensorNetwork.
The platforms' components are represented as instances of the Sensor-Device class.
The properties observed by the sensing devices are defined as instances of the ObservedProperty class, and they are related to the sensed domain using the predicate detects.
The latitude and longitude coordinates of platforms are represented using the latitude and longitude predicates. All the coordinate values are represented as AngularDistances.

Enrichment of Sensor Measurements

Enrichment of sensor measurements is done for stored data, held either in a relational database or in XML documents. This has the advantage of giving access to all the archived data and makes it possible to change the models applied for processing such data at any time. However, a streaming-based approach is also possible, in which case several implementation aspects would probably have to be reconsidered. Besides the fact that simply accessing the data would require the implementation of appropriate mechanisms (e.g. push or pull methods), another aspect to be reconsidered is the representation of the time intervals of measurements (e.g. sliding window protocol). The enrichment of sensor measurements has been done using concepts from the OWL ontologies. The following two subsections report on the enrichment of the two datasets.

Enrichment of Measurements from Standardized Dataset (footnote 18)

The dataset contains sensor measurements for a period of 20 days and no real-time data. We therefore decided on computation-based enrichment, and the initial numerical values were not included in the SRSD. A data mining tool was used to process the sensor measurements in order to extract knowledge from the raw measurements. Instead of further processing the enriched data with our data mining tool, we annotated it according to the collection of OWL ontologies and exported it in RDF format. This is useful for our framework because it lets the user take advantage of the knowledge extracted from the raw measurements.

The sensor measurements are numeric values annotated in O&M according to the Integrated Ocean Observing System vocabulary. There are over 17 million measurements; therefore, one can say that the data is suitable for intensive numeric processing rather than semantic reasoning. We generated several features from the sensor measurements and focused on two of them:
the wind and sea conditions for sailors according to the Beaufort scale [41]
migraines caused by atmospheric pressure according to pressure values published in medical studies [42]

For the wind and sea conditions case, nominal values have been calculated from the wind speed measurements. A total of 26 nominal values, such as Calm, Flat, Fresh Breeze, etc., have been defined as individuals of the ObservationValue class, as illustrated in Figure 14. For the migraines case, based on the relationship reported in [42] between migraines and the morning barometric pressure measurements, as well as a rise in barometric pressure over the preceding 24 hours, we have defined three individuals of ObservationValue that indicate the risk of getting a headache: NoHeadache, Headache and HighHeadache (Figure 14).
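Deriving a nominal wind value from a numeric wind speed, as described above, can be sketched as a threshold lookup. The thesis does not list the thresholds or the units it used, so the sketch assumes speeds in knots and uses the commonly cited Beaufort lower bounds and wind descriptions; only Calm and Fresh Breeze are names confirmed by the text, and the function names are hypothetical. The sea-condition values (such as Flat) would be a second lookup table of the same shape.

```python
from bisect import bisect_right

# Lower bounds, in knots, of Beaufort forces 1 to 12 (commonly cited values;
# an assumption, since the source does not state the thresholds it used).
_LOWER_BOUNDS_KNOTS = [1, 4, 7, 11, 17, 22, 28, 34, 41, 48, 56, 64]

# Conventional wind descriptions for forces 0 to 12.
_WIND_NAMES = [
    "Calm", "Light Air", "Light Breeze", "Gentle Breeze", "Moderate Breeze",
    "Fresh Breeze", "Strong Breeze", "Near Gale", "Gale", "Strong Gale",
    "Storm", "Violent Storm", "Hurricane",
]


def beaufort_force(speed_knots):
    """Return the Beaufort force (0 to 12) for a wind speed given in knots."""
    if speed_knots < 0:
        raise ValueError("wind speed cannot be negative")
    # Count how many lower bounds the speed has reached.
    return bisect_right(_LOWER_BOUNDS_KNOTS, speed_knots)


def wind_condition(speed_knots):
    """Return the nominal wind value that would annotate an observation."""
    return _WIND_NAMES[beaufort_force(speed_knots)]
```

The returned string stands in for the ObservationValue individual attached to the sensor output; mapping it to an actual URI is left out of the sketch.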
In some sense, the headache-related exports can also be seen as virtual sensors, as there are no real-world sensors which would output such values. Also, for more intuitive querying, we export the time of the day according to related concepts we have defined (e.g. EarlyMorning, LateEvening) instead of fine-grained timestamps. We introduced 12 time intervals, as illustrated in Figure 15. The time interval for an enriched measurement is therefore described using the startsorduring property for the date and the time of the day when the measurement was taken.

For describing the wind and sea conditions according to the Beaufort scale, measurements from 264 sensors observing wind properties were processed. For each sensor, 67 observations (each with 2 results) describing wind and sea conditions were computed. For the headache-related observations, 220 sensors measuring atmospheric pressure characteristics were used and 18 observations were generated for each sensor. As a result, instead of feeding 17 million measurements into the SRSD, we reduced the dimensionality to about observations which also have the advantage of being more intuitive, in addition to using less memory.

18 The work has been done in collaboration with Carolina Fortuna

Figure 14: Individuals of ObservationValue class used for representing sensor measurements. Image obtained using the ontology editor tool Protégé 4.1.

Figure 15: Time intervals defined for different times of the day. Image obtained using the ontology editor tool Protégé 4.1.

For the sensor measurements generated from the original O&M dataset, each measurement was considered an instance of Observation, linked to the sensor that made the observation and defined by time metadata. An observation can have one (headache) or

two (wind and sea conditions) results, represented as instances of the SensorOutput class. Each sensor output instance has a single observation value. Examples of RDF descriptions can be found in Appendix A.

Data Mining Tool - OntoGen

The OntoGen tool that we used is based on [43] and is conceptually composed of an in-house database implementation tightly connected with data mining and machine learning algorithms. The extension of the tool that we are using here also integrates with our custom server, which is able to serve XML and RDF/OWL. For the purpose of this work we only use the database, the indexing, the server, the RDF/OWL export module and the user interface to illustrate OLAP-style queries.

The database schema implemented is illustrated in Figure 16. By SensorNode we refer to a collection of collocated sensors. By Sensor we refer to an entity which observes or measures a particular phenomenon. By SensorType we refer to aspects related to the Sensor: it specifies what exactly that sensor is sensing, its URN, the unit of measurement, etc. These three tables store the meta-data for each measurement. The actual measurement values and their associated timestamps are stored in the SensorMeasurement table. Measurements are linked to the sensors that performed them, and the sensors are in turn linked to meta-data about them: the node they are attached to and the type.

Figure 16: Data mining on sensor measurements. Database schema implemented in OntoGen tool.

Enrichment of Measurements from the Database

For the sensor measurements from the database dataset we chose a different approach to the enrichment process than for the previous dataset. We have chosen to represent the measurements just by their value, without extracting any other features, in order to illustrate the possibility of representing real-time measurements. Raw measurements are therefore annotated with concepts from the SSN ontology.
For each sensing device, its last observation is represented as an instance of Observation, to which the timestamp is added. The observed value is represented using instances of two other classes, SensorOutput and ObservationValue (Figure 13). The numerical value is attached to the instance of ObservationValue. The last observed value of a sensing device is updated regularly, by querying the database and replacing the timestamps and measurement values in the enriched representation. For the historical measurements, the average over the last time intervals is calculated in a separate view of the database and then mapped to the RDF representation. In our implementation the update of the last measurements is done every 15 minutes, while the average of historical

measurements is calculated for every day.

4.2 Applications

Once the enriched representation of sensor descriptions and measurements is available, different applications can be built on top of it. In the following subsections we describe possible applications and their implementation, while in Chapter 5 we will illustrate their usage.

Sensor Search on the Enriched Datasets

Sensor search is one of the first applications that would benefit from the enrichment of sensor descriptions and their measurements. The search can refer to finding specific sensors from which one could be interested in gathering data, or to searching directly through sensor measurements for explicit values or different events. We do not claim that enrichment improves search performance, as we have not conducted such tests; we are interested rather in being able to formulate queries that retrieve the results we need. To do this, we have used a semantic repository in which we store the enriched dataset and through which we search.

The tool that implements the SRSD component of the framework is itself a framework for processing RDF data, namely Sesame. It can store, parse, query and perform inference on RDF data. It also offers an API for accessing it from Java environments. This API has been used for storing and updating the enriched representations of the datasets in Sesame's repositories. For each of the datasets two separate repositories have been created from their RDF representation. The Sesame version 2.4 server and workbench applications have been deployed on a Tomcat server, version 6, installed on a local machine. Sesame's full support for the SPARQL [37] query language has enabled us to formulate and run different queries on our datasets. Through these queries we try to illustrate some of the advantages of the semantic enrichment of sensor data.
The SPARQL end-point used to process the queries represents a Data Consumer component in the framework. These queries will be discussed in Chapter 5.

Ranking

Searching through thousands of sensors can be difficult if one cannot be sufficiently specific. The large volumes of data produced by such sensor networks require special techniques not only for their management and processing but also for their discovery. We proposed and implemented a system for sensor search based on matching the user's keywords against information extracted from standardized sensor descriptions. The work has been reported in [44]; here we provide a summary.

The goal of the search is to retrieve and rank a list of sensors based on the user's request. The user provides a keyword query, a geographic location (given by latitude and longitude coordinates) and a distance (interpreted as a radius around the location). We first restrict the set of sensors to those within the given location and range. In order to rank the results of the keyword search, we have implemented a query-dependent version of the PageRank algorithm. The dataset chosen for the sensor ranking experiments is the standardized dataset described earlier.
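The location and range restriction mentioned above can be sketched as follows. This is only a minimal illustration, assuming that each sensor carries a latitude and a longitude and that the radius is given in kilometres; the function names, the record layout and the sample coordinates are hypothetical and are not taken from the implementation reported in [44].

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sensors_in_range(sensors, lat, lon, radius_km):
    """Keep only the sensors located within radius_km of (lat, lon)."""
    return [s for s in sensors
            if haversine_km(lat, lon, s["lat"], s["lon"]) <= radius_km]


# Hypothetical sensor records; identifiers and coordinates are illustrative.
sensors = [
    {"id": "sensor-a", "lat": 61.2, "lon": -149.9},
    {"id": "sensor-b", "lat": 36.9, "lon": -76.0},
]
nearby = sensors_in_range(sensors, 61.0, -150.0, 100.0)
```

Only the sensors that survive this filter would then be passed on to the ranking step.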

The input of the PageRank ranking algorithm is a directed graph, and its output is a score for each of the nodes. PageRank is based on the random walk model: a large number of users walk the graph, choosing at each step either to walk to a neighbour of the current node or to jump to a random node in the graph. The number of users expected to be at a given node at a given moment gives the score of that node. Personalized PageRank is a version of PageRank adapted to be query dependent.

The semantics of the sensor descriptions come into play in the transition matrix used by the ranking algorithm. The sensing devices are considered as nodes in the directed graph used by the algorithm, and the transition matrix is a square weight matrix that contains, for each pair of sensors, the weight of their link. For every two sensors i and j, we have assigned the value 5 if i and j measure the same thing, 4 if they are on the same platform, 1 if they are in the same deployment and 0 otherwise. The details of the implementation of the ranking application can be found in [44].

Data Publishing

Making sensor data publicly available enables the development of new and useful applications. The methods for publishing sensor data vary from standardized web services, such as OGC's SOS, to application-specific methods, such as the ones used by Pachube or Sensorpedia. However, such methods require prior knowledge of the infrastructure used, whereas publishing the data following the linked data principles would enable better accessibility. Moreover, support for integration with existing knowledge would also increase the usability of the published data. The requirements for applying linked data principles to sensor data are discussed by Sequeda and Corcho in [45], where they analyze aspects such as time and space representation of streaming data.
They propose a URI-based mechanism and identify a set of requirements for publishing Linked Stream Data. A subset of these requirements is:
1. Sensors should be identified by URIs.
2. Stream data emitted by sensors should be identified by URIs.
3. The information returned by a sensor URI should be its metadata.
4. The information returned by a stream data URI should be the observations of the sensor.
In addition, the representation of time and space in URIs is discussed and several examples are given. Janowicz et al. [46] describe a URI schema for some of the SWE standards, focusing on three main components: features of interest, sensors and observed properties. Problems such as object identity, granularity and allowed processing steps are discussed, and the view is centered on the continuous sensor measurements rather than on entities (e.g. sensor, observation).

We applied the linked data principles for publishing the non-standardized dataset in its RDF representation. The URIs from the RDF descriptions are dereferenced through a linked data frontend for SPARQL end-points, implemented by the Pubby web application. Pubby is an open source tool that provides Linked Data interfaces to SPARQL end-points by rewriting URIs found in the RDF dataset into the Pubby server's namespace and showing a simple HTML interface for each resource. Pubby has been deployed on a Tomcat server on a local machine and connected to Sesame's SPARQL end-point. In order to translate the RDF dataset's URIs into dereferenceable URIs handled by Pubby, some mappings had to be configured. They can be found in Appendix B. Our first results on publishing sensor data on the web have been reported in [47] and [48].
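As a supplement to the ranking application described in the Ranking subsection, the following sketch shows how the link weights 5, 4, 1 and 0 could feed a query-dependent PageRank computed by power iteration. It is not the implementation from [44]. Several points are assumptions of this sketch: the highest applicable weight is used when more than one condition holds, the query enters through a preference (jump) vector with at least one positive entry, the damping factor is 0.85, and all record fields and sample sensors are hypothetical.

```python
def link_weight(a, b):
    """Weight of the link between two sensors, using the values from the text.

    Assumption: when several conditions hold, the highest weight is used.
    """
    if a["observes"] == b["observes"]:
        return 5.0
    if a["platform"] == b["platform"]:
        return 4.0
    if a["deployment"] == b["deployment"]:
        return 1.0
    return 0.0


def personalized_pagerank(sensors, preference, damping=0.85, iterations=50):
    """Power iteration with a query-dependent jump vector.

    preference[i] > 0 marks sensors that match the query; the vector must
    contain at least one positive entry.
    """
    n = len(sensors)
    weights = [[0.0 if i == j else link_weight(sensors[i], sensors[j])
                for j in range(n)] for i in range(n)]
    total = sum(preference)
    jump = [p / total for p in preference]
    scores = list(jump)
    for _ in range(iterations):
        new = [(1.0 - damping) * jump[j] for j in range(n)]
        for i in range(n):
            row_sum = sum(weights[i])
            for j in range(n):
                # A sensor without links hands its score to the jump vector.
                share = weights[i][j] / row_sum if row_sum > 0 else jump[j]
                new[j] += damping * scores[i] * share
        scores = new
    return scores


# Hypothetical sensors; only sensor 0 matches the keyword query.
sensors = [
    {"observes": "water_temperature", "platform": "p1", "deployment": "d1"},
    {"observes": "water_temperature", "platform": "p2", "deployment": "d1"},
    {"observes": "wind_speed", "platform": "p3", "deployment": "d2"},
]
scores = personalized_pagerank(sensors, preference=[1.0, 0.0, 0.0])
```

In this toy example the second sensor receives a non-zero score although it does not match the query itself, because it measures the same thing as the matching sensor; the unrelated third sensor receives none.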

5 Discussion

In the previous chapters of this thesis the framework components have been described, together with our choices for implementing them. Here we discuss the utility of the enriched representation from the perspective of formulating queries for solving different search problems. Furthermore, we describe the related work and provide a comparative analysis.

5.1 Querying on the Enriched Description

SPARQL Querying

SPARQL is the W3C recommendation for an RDF query language. A simple SPARQL query can be considered as an RDF triple in which each of the subject, predicate and object may be a variable. A protocol service that supports querying a knowledge base according to the SPARQL specifications is called a SPARQL endpoint. Sesame is an open source RDF framework that offers a SPARQL endpoint and supports RDF Schema inference. Before running queries with Sesame, the RDF triples and the ontologies used were loaded into one repository.

To illustrate the advantages of searching using semantic concepts instead of a keyword-based search, we show the results of queries for the following scenario: someone is interested in all the properties observed by a sensor which are related to water.

Table 3: Keyword versus semantic based search. The table shows two SPARQL queries and their results.

Keyword based search
Query:
    SELECT DISTINCT ?obs ?label
    WHERE { ?obs rdf:type ssn:property .
            ?obs rdfs:label ?label .
            FILTER regex(?label, "water", "i") }
Results:
    Obs                                    Label
    < _surface_height_above_sea_level>     "Water Level Predictions"
    < _surface_height_above_sea_level>     "Water Level"
    < _water_temperature>                  "Water Temperature"

Semantic based search
Query:
    SELECT DISTINCT ?obs ?label
    WHERE { ?obs rdf:type ssn:property .
            ?obs ssn:ispropertyof ?prop .
            ?prop rdf:type myprop:water .
            ?obs rdfs:label ?label . }
Results:
    Obs                                    Label
    < _sea_water_velocity>                 "Current Direction"
    < ectrical_conductivity>               "Conductivity "
    < height_above_sea_level>              "Water Level Predictions"
    < height_above_sea_level>              "Water Level"
    < salinity>                            "Salinity"
    < peed>                                "Current Speed"
    < mperature>                           "Water Temperature"
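The difference between the two queries of Table 3 can be illustrated with a toy sketch over a handful of in-memory triples. This is a simplified illustration rather than the Sesame-based setup used in this work: the triples, prefixes and function names are hypothetical, and plain Python filtering stands in for SPARQL evaluation.

```python
# Toy triples (subject, predicate, object); all identifiers are hypothetical.
triples = [
    ("obs:water_temperature", "rdfs:label", "Water Temperature"),
    ("obs:water_temperature", "isPropertyOf", "feature:sea"),
    ("obs:salinity", "rdfs:label", "Salinity"),
    ("obs:salinity", "isPropertyOf", "feature:sea"),
    ("obs:air_pressure", "rdfs:label", "Barometric Pressure"),
    ("obs:air_pressure", "isPropertyOf", "feature:atmosphere"),
    ("feature:sea", "rdf:type", "class:Water"),
    ("feature:atmosphere", "rdf:type", "class:Air"),
]


def objects(subject, predicate):
    """All objects of the triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def keyword_search(keyword):
    """Labels that contain the keyword, ignoring case (like the regex filter)."""
    needle = keyword.lower()
    return sorted(o for s, p, o in triples
                  if p == "rdfs:label" and needle in o.lower())


def semantic_search(feature_class):
    """Labels of properties whose feature of interest has the given type."""
    hits = []
    for s, p, feature in triples:
        if p == "isPropertyOf" and feature_class in objects(feature, "rdf:type"):
            hits.extend(objects(s, "rdfs:label"))
    return sorted(hits)


by_keyword = keyword_search("water")
by_concept = semantic_search("class:Water")
```

Here the keyword search misses the salinity property because its label does not contain the word "water", while the concept-based search finds it through the type of its feature of interest.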

First, a keyword search is run, based on the labels of the properties. Second, the search is performed considering the Property and FeatureOfInterest concepts. The SPARQL queries for this search and their results are presented in Table 3. It can be noticed that the semantic query gives more complete results than the keyword search. This example shows just one of the improvements brought by semantic enrichment of data; keywords are, however, often used in semantic queries as well.

The effect of semantic enrichment of sensor descriptions is illustrated by the following set of questions related to the geographic position of the sensor platforms. The first type of query, using latitude and longitude coordinates, can be: Which are the platforms with sensors attached that can observe properties labeled with water temperature, located between 61 and 62 latitude and between -150 and -149 longitude? For the second type of query, the geographical name of a place found close to the platform is used instead of the latitude and longitude coordinates. For the coordinates from the above query, one location in that area is Anchorage, a city in the State of Alaska, USA, and the query is formulated as: Which are the platforms with sensors attached that can observe properties labeled with water temperature, located near Anchorage?

Table 4: SPARQL query related to geographical position of sensor platforms. The query uses latitude and longitude coordinates to locate the sensor platform.

Query:
    SELECT DISTINCT ?platform ?lat ?lon
    WHERE { ?platform geo:location ?loc .
            ?platform ssn:attachedsystem ?sens .
            ?sens ssn:observes ?obs .
            ?obs rdfs:label ?label .
            ?obs rdf:type ?obstype .
            ?loc geo:lat ?lat .
            ?loc geo:long ?lon .
            FILTER (?lat<62 && ?lat>61 && ?lon<-149 && ?lon>-150 && regex(?label, "water temperature", "i")) }
Results:
    Platform                                              Lat    Lon
    <urn:x-noaa:def:station:noaa.nos.co-OPS:: >
    <urn:x-noaa:def:station:noaa.nos.co-OPS::COI0303>
    <urn:x-noaa:def:station:noaa.nos.co-OPS::COI0302>
    <urn:x-noaa:def:station:noaa.nos.co-OPS::COI0301>
    <urn:x-noaa:def:station:noaa.nos.co-OPS::COI0206>

Table 5: SPARQL query related to geographical position of sensor platforms. The query uses place names to locate the sensor platform.

Query:
    SELECT DISTINCT ?platform ?name
    WHERE { ?platform geo:location ?loc .
            ?loc foaf:based_near ?city .
            ?city gn:name ?name .
            FILTER regex(?name, "anchorage", "i") }
Results:
    Platform                                              Lat    Lon
    <urn:x-noaa:def:station:noaa.nos.co-OPS:: >
    <urn:x-noaa:def:station:noaa.nos.co-OPS::COI0303>

The SPARQL queries and the results are illustrated in Table 4 and Table 5. We can immediately notice that the second query is more intuitive to a human user compared to the one using latitude and longitude values. However, for the first type of query the results are more exact, returning all the sensors in the specified perimeter, whereas for the second type the results depend only on the geographical names used.

To show the usefulness of measurement processing and semantic enrichment we have selected two queries, one for the properties from the Beaufort scale and the other for the headache-related properties.

Where have any afternoon hurricanes been registered?

As can be observed from the results presented in Table 6, the query finds the exact locations where hurricanes were detected in the afternoon. The two locations have been placed in larger regions, based on the parentfeature property from the GeoNames ontology. This shows the usefulness of semantic enrichment in abstracting the exact numerical values of the measurements to canonical ones, such as hurricane force.

Table 6: SPARQL query on sensor measurements. Query related to Beaufort scale properties.

Query:
    SELECT DISTINCT ?location ?region
    WHERE { ?platform ssn:attachedsystem ?sensor .
            ?platform geo:location ?loc .
            ?loc foaf:based_near ?place .
            ?place gn:name ?location .
            ?place gn:parentfeature ?parent .
            ?parent gn:name ?region .
            ?sensor ssn:madeobservation ?obs .
            ?obs ssn:observationresult ?res .
            ?obs time-entry:startsorduring tmint:afternoon .
            ?res ssn:hasvalue obsval:hurricaneforce . }
Results:
    Location        Region
    Harbor Beach    Huron County, United States
    Edge Water      City of Norfolk, Virginia, United States

Which were the dates with risk of headache in early morning in Cape Henry?

The results of this query, together with its SPARQL representation, can be found in Table 7. Similar to the previous query, it presents the advantages of semantic enrichment from the point of view of data abstraction.

Table 7: SPARQL query on sensor measurements. Query related to the headache properties.

Query:
    SELECT DISTINCT ?sensor ?date ?location
    WHERE { ?platform ssn:attachedsystem ?sensor .
            ?platform geo:location ?loc .
            ?loc foaf:based_near ?place .
            ?place gn:name ?location .
            ?sensor ssn:madeobservation ?obs .
            ?obs ssn:observationresult ?res .
            ?obs time-entry:startsorduring tmint:earlymorning .
            ?obs time-entry:startsorduring ?date .
            ?res ssn:hasvalue obsval:headache .
            FILTER regex(?location, "cape Henry", "i") }
Results:
    Sensor                                             Date                  Location
    <urn:x-noaa:def:sensor:noaa.nos.co-ops:: :f1>      tmint:earlymorning    "Cape Henry Village "
    <urn:x-noaa:def:sensor:noaa.nos.co-ops:: :f1>      " "                   "Cape Henry Village "
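The abstraction from numerical measurements to canonical values that the two queries above rely on can be sketched for the Beaufort scale as follows. The thresholds are the commonly cited lower bounds in metres per second; whether the enrichment in this work used exactly these bounds and category names is an assumption of the sketch, and the function name is hypothetical.

```python
from bisect import bisect_right

# Lower bounds in m/s for Beaufort numbers 0..12 (commonly cited values).
BEAUFORT_LOWER_BOUNDS = [0.0, 0.3, 1.6, 3.4, 5.5, 8.0, 10.8,
                         13.9, 17.2, 20.8, 24.5, 28.5, 32.7]
BEAUFORT_NAMES = [
    "Calm", "Light air", "Light breeze", "Gentle breeze", "Moderate breeze",
    "Fresh breeze", "Strong breeze", "Near gale", "Gale", "Strong gale",
    "Storm", "Violent storm", "Hurricane force",
]


def beaufort_category(wind_speed_ms):
    """Map a wind speed in m/s to its (Beaufort number, category name)."""
    if wind_speed_ms < 0:
        raise ValueError("wind speed must be non-negative")
    # Index of the last lower bound that does not exceed the measured speed.
    number = bisect_right(BEAUFORT_LOWER_BOUNDS, wind_speed_ms) - 1
    return number, BEAUFORT_NAMES[number]


strong = beaufort_category(35.0)
moderate = beaufort_category(10.0)
```

Once each measurement carries such a category, a query can ask for a named condition such as hurricane force instead of a numerical range.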

Cyc Querying

The Cyc Query tool is a user interface that allows querying the knowledge base using the CycL language. CycL has been used for formulating queries on top of the Cyc knowledge base, as it is an intuitive language that can be easily understood. A detailed description of CycL is beyond our scope and can be found in [50]. It is worth mentioning that Cyc concepts are written with the #$ prefix. For the Cyc knowledge base containing the sensor descriptions, a query containing the name of a geographical location has been run. In Cyc's representation it is possible to specify the distance between the geographical location and the sensor platform, which gives the user more expressivity in formulating queries. For instance, one of the previous queries involving geographical locations can be extended as follows: Which are the platforms with sensors attached that can observe properties labeled with water temperature, located at a distance of at most 400 kilometers from Anchorage city? The query formulated in CycL and the results are shown in Table 8.

Table 8: CycL query using geographical names.

Query:
    (#$and
      (#$detects ?SENSOR #$Sea_water_temperature)
      (#$hassensor ?PLATFORM ?SENSOR)
      (#$objectfoundinlocation ?PLATFORM ?LOC)
      (#$distancebetween #$CityOfAnchorageAK ?LOC (#$Kilometer ?DIST))
      (#$lessthan ?DIST 400))
Results:

5.2 Related Work

Sheth et al. [7] propose the Semantic Sensor Web (SSW) as a solution for the problem of too much data and not enough knowledge that appeared with the rapid development of sensor networks. In their view, the SSW represents sensor data semantically annotated with spatial, temporal and thematic metadata, thereby facilitating advanced querying and reasoning. RDFa is adopted as the annotation language for two demonstrative applications that also use several SWE standards. Moreover, rule-based reasoning is applied for determining specific weather conditions, such as freezing or blizzard.
The idea of semantic annotation is taken further by Wei and Barnaghi [51], who use LOD resources for annotation; this brings access to knowledge already represented and eliminates the risk of creating redundant data. Similarly, in our work we also use LOD resources for spatial properties and try to reuse the knowledge already represented as much as possible. However, in some situations the extension of ontologies is required. A recent trend for making sensor descriptions and measurements available on the Web is to publish them on the LOD cloud. The advantages and challenges of Linked Sensor Data are discussed by Keßler and Janowicz [10] as a solution for better sensor data accessibility without introducing very high complexity. The paper stresses the importance of finding the appropriate links between different datasets from LOD and proposes a semi-automatic way of generating them. Patni et al. [8] were the first to publish a large dataset of sensor descriptions and measurements, by first representing it in the O&M standard and then converting it to RDF. The linked sensor data uses a sensor ontology schema based on the concepts from

Enrichment of Sensor Descriptions and Measurements Using Semantic Technologies. Student: Alexandra Moraru Mentor: Prof. Dr.

Enrichment of Sensor Descriptions and Measurements Using Semantic Technologies. Student: Alexandra Moraru Mentor: Prof. Dr. Enrichment of Sensor Descriptions and Measurements Using Semantic Technologies Student: Alexandra Moraru Mentor: Prof. Dr. Dunja Mladenić Environmental Monitoring automation Traffic Monitoring integration

More information

A System for Publishing Sensor Data on the Semantic Web

A System for Publishing Sensor Data on the Semantic Web Journal of Computing and Information Technology - CIT 19, 2011, 4, 239 245 doi:10.2498/cit.1002030 239 A System for Publishing Sensor Data on the Semantic Web Alexandra Moraru 1, Carolina Fortuna 2 and

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION Most of today s Web content is intended for the use of humans rather than machines. While searching documents on the Web using computers, human interpretation is required before

More information

OWL a glimpse. OWL a glimpse (2) requirements for ontology languages. requirements for ontology languages

OWL a glimpse. OWL a glimpse (2) requirements for ontology languages. requirements for ontology languages OWL a glimpse OWL Web Ontology Language describes classes, properties and relations among conceptual objects lecture 7: owl - introduction of#27# ece#720,#winter# 12# 2# of#27# OWL a glimpse (2) requirements

More information

Semantic Web Fundamentals

Semantic Web Fundamentals Semantic Web Fundamentals Web Technologies (706.704) 3SSt VU WS 2017/18 Vedran Sabol with acknowledgements to P. Höfler, V. Pammer, W. Kienreich ISDS, TU Graz December 11 th 2017 Overview What is Semantic

More information

Helmi Ben Hmida Hannover University, Germany

Helmi Ben Hmida Hannover University, Germany Helmi Ben Hmida Hannover University, Germany 1 Summarizing the Problem: Computers don t understand Meaning My mouse is broken. I need a new one 2 The Semantic Web Vision the idea of having data on the

More information

Semantic agents for location-aware service provisioning in mobile networks

Semantic agents for location-aware service provisioning in mobile networks Semantic agents for location-aware service provisioning in mobile networks Alisa Devlić University of Zagreb visiting doctoral student at Wireless@KTH September 9 th 2005. 1 Agenda Research motivation

More information

WHY WE NEED AN XML STANDARD FOR REPRESENTING BUSINESS RULES. Introduction. Production rules. Christian de Sainte Marie ILOG

WHY WE NEED AN XML STANDARD FOR REPRESENTING BUSINESS RULES. Introduction. Production rules. Christian de Sainte Marie ILOG WHY WE NEED AN XML STANDARD FOR REPRESENTING BUSINESS RULES Christian de Sainte Marie ILOG Introduction We are interested in the topic of communicating policy decisions to other parties, and, more generally,

More information

Reducing Consumer Uncertainty

Reducing Consumer Uncertainty Spatial Analytics Reducing Consumer Uncertainty Towards an Ontology for Geospatial User-centric Metadata Introduction Cooperative Research Centre for Spatial Information (CRCSI) in Australia Communicate

More information

Semantic Web Fundamentals

Semantic Web Fundamentals Semantic Web Fundamentals Web Technologies (706.704) 3SSt VU WS 2018/19 with acknowledgements to P. Höfler, V. Pammer, W. Kienreich ISDS, TU Graz January 7 th 2019 Overview What is Semantic Web? Technology

More information

Google indexed 3,3 billion of pages. Google s index contains 8,1 billion of websites

Google indexed 3,3 billion of pages. Google s index contains 8,1 billion of websites Access IT Training 2003 Google indexed 3,3 billion of pages http://searchenginewatch.com/3071371 2005 Google s index contains 8,1 billion of websites http://blog.searchenginewatch.com/050517-075657 Estimated

More information

Knowledge Representations. How else can we represent knowledge in addition to formal logic?

Knowledge Representations. How else can we represent knowledge in addition to formal logic? Knowledge Representations How else can we represent knowledge in addition to formal logic? 1 Common Knowledge Representations Formal Logic Production Rules Semantic Nets Schemata and Frames 2 Production

More information

Lecture Telecooperation. D. Fensel Leopold-Franzens- Universität Innsbruck

Lecture Telecooperation. D. Fensel Leopold-Franzens- Universität Innsbruck Lecture Telecooperation D. Fensel Leopold-Franzens- Universität Innsbruck First Lecture: Introduction: Semantic Web & Ontology Introduction Semantic Web and Ontology Part I Introduction into the subject

More information

New Approach to Graph Databases

New Approach to Graph Databases Paper PP05 New Approach to Graph Databases Anna Berg, Capish, Malmö, Sweden Henrik Drews, Capish, Malmö, Sweden Catharina Dahlbo, Capish, Malmö, Sweden ABSTRACT Graph databases have, during the past few

More information

Proposal for Implementing Linked Open Data on Libraries Catalogue

Proposal for Implementing Linked Open Data on Libraries Catalogue Submitted on: 16.07.2018 Proposal for Implementing Linked Open Data on Libraries Catalogue Esraa Elsayed Abdelaziz Computer Science, Arab Academy for Science and Technology, Alexandria, Egypt. E-mail address:

More information

Semantics. Matthew J. Graham CACR. Methods of Computational Science Caltech, 2011 May 10. matthew graham

Semantics. Matthew J. Graham CACR. Methods of Computational Science Caltech, 2011 May 10. matthew graham Semantics Matthew J. Graham CACR Methods of Computational Science Caltech, 2011 May 10 semantic web The future of the Internet (Web 3.0) Decentralized platform for distributed knowledge A web of databases

More information

Sensor Data Management

Sensor Data Management Wright State University CORE Scholar Kno.e.sis Publications The Ohio Center of Excellence in Knowledge- Enabled Computing (Kno.e.sis) 8-14-2007 Sensor Data Management Cory Andrew Henson Wright State University

More information

The Semantic Sensor Network Ontology A Generic Language to Describe Sensor Assets

The Semantic Sensor Network Ontology A Generic Language to Describe Sensor Assets Ben Ridge Road Weather Station, South Esk River Catchment, Tasmania The Semantic Sensor Network Ontology A Generic Language to Describe Sensor Assets Holger Neuhaus Michael Compton Commonwealth Scientific

More information

Bridging the Gap between Semantic Web and Networked Sensors: A Position Paper

Bridging the Gap between Semantic Web and Networked Sensors: A Position Paper Bridging the Gap between Semantic Web and Networked Sensors: A Position Paper Xiang Su and Jukka Riekki Intelligent Systems Group and Infotech Oulu, FIN-90014, University of Oulu, Finland {Xiang.Su,Jukka.Riekki}@ee.oulu.fi

More information

An Evaluation of Geo-Ontology Representation Languages for Supporting Web Retrieval of Geographical Information

An Evaluation of Geo-Ontology Representation Languages for Supporting Web Retrieval of Geographical Information An Evaluation of Geo-Ontology Representation Languages for Supporting Web Retrieval of Geographical Information P. Smart, A.I. Abdelmoty and C.B. Jones School of Computer Science, Cardiff University, Cardiff,

More information

Contents. G52IWS: The Semantic Web. The Semantic Web. Semantic web elements. Semantic Web technologies. Semantic Web Services

Contents. G52IWS: The Semantic Web. The Semantic Web. Semantic web elements. Semantic Web technologies. Semantic Web Services Contents G52IWS: The Semantic Web Chris Greenhalgh 2007-11-10 Introduction to the Semantic Web Semantic Web technologies Overview RDF OWL Semantic Web Services Concluding comments 1 See Developing Semantic

More information

Semantic Web Test

Semantic Web Test Semantic Web Test 24.01.2017 Group 1 No. A B C D 1 X X X 2 X X 3 X X 4 X X 5 X X 6 X X X X 7 X X 8 X X 9 X X X 10 X X X 11 X 12 X X X 13 X X 14 X X 15 X X 16 X X 17 X 18 X X 19 X 20 X X 1. Which statements

More information

Semantically enhancing SensorML with controlled vocabularies in the marine domain

Semantically enhancing SensorML with controlled vocabularies in the marine domain Semantically enhancing SensorML with controlled vocabularies in the marine domain KOKKINAKI ALEXANDRA, BUCK JUSTIN, DARROCH LOUISE, JIRKA SIMON AND THE MARINE PROFILES FOR OGC SENSOR WEB ENABLEMENT STANDARDS

More information

Reducing Consumer Uncertainty Towards a Vocabulary for User-centric Geospatial Metadata

Reducing Consumer Uncertainty Towards a Vocabulary for User-centric Geospatial Metadata Meeting Host Supporting Partner Meeting Sponsors Reducing Consumer Uncertainty Towards a Vocabulary for User-centric Geospatial Metadata 105th OGC Technical Committee Palmerston North, New Zealand Dr.

More information

A Review for Semantic Sensor Web Research and Applications

A Review for Semantic Sensor Web Research and Applications , pp.31-36 http://dx.doi.org/10.14257/astl.2014.48.06 A Review for Semantic Sensor Web Research and Applications Chaoqun Ji, Jin Liu, Xiaofeng Wang College of Information Engineering, Shanghai Maritime

More information

OWL 2 Update. Christine Golbreich

OWL 2 Update. Christine Golbreich OWL 2 Update Christine Golbreich 1 OWL 2 W3C OWL working group is developing OWL 2 see http://www.w3.org/2007/owl/wiki/ Extends OWL with a small but useful set of features Fully backwards

More information

SEXTANT 1. Purpose of the Application

SEXTANT 1. Purpose of the Application SEXTANT 1. Purpose of the Application Sextant has been used in the domains of Earth Observation and Environment by presenting its browsing and visualization capabilities using a number of link geospatial

More information

Novel System Architectures for Semantic Based Sensor Networks Integraion

Novel System Architectures for Semantic Based Sensor Networks Integraion Novel System Architectures for Semantic Based Sensor Networks Integraion Z O R A N B A B O V I C, Z B A B O V I C @ E T F. R S V E L J K O M I L U T N O V I C, V M @ E T F. R S T H E S C H O O L O F T

More information

The Semantic Web Revisited. Nigel Shadbolt Tim Berners-Lee Wendy Hall

The Semantic Web Revisited. Nigel Shadbolt Tim Berners-Lee Wendy Hall The Semantic Web Revisited Nigel Shadbolt Tim Berners-Lee Wendy Hall Today sweb It is designed for human consumption Information retrieval is mainly supported by keyword-based search engines Some problems

More information

Extracting knowledge from Ontology using Jena for Semantic Web

Extracting knowledge from Ontology using Jena for Semantic Web Extracting knowledge from Ontology using Jena for Semantic Web Ayesha Ameen I.T Department Deccan College of Engineering and Technology Hyderabad A.P, India ameenayesha@gmail.com Khaleel Ur Rahman Khan

More information

Semantic Web Programming

Semantic Web Programming *) Semantic Web Programming John Hebeler Matthew Fisher Ryan Blace Andrew Perez-Lopez WILEY Wiley Publishing, Inc. Contents Foreword Introduction xxiii xxv Part One Introducing Semantic Web Programming

More information

H1 Spring C. A service-oriented architecture is frequently deployed in practice without a service registry

H1 Spring C. A service-oriented architecture is frequently deployed in practice without a service registry 1. (12 points) Identify all of the following statements that are true about the basics of services. A. Screen scraping may not be effective for large desktops but works perfectly on mobile phones, because

More information

Orchestrating Music Queries via the Semantic Web

Orchestrating Music Queries via the Semantic Web Orchestrating Music Queries via the Semantic Web Milos Vukicevic, John Galletly American University in Bulgaria Blagoevgrad 2700 Bulgaria +359 73 888 466 milossmi@gmail.com, jgalletly@aubg.bg Abstract

More information

OWL 2 The Next Generation. Ian Horrocks Information Systems Group Oxford University Computing Laboratory

OWL 2 The Next Generation. Ian Horrocks Information Systems Group Oxford University Computing Laboratory OWL 2 The Next Generation Ian Horrocks Information Systems Group Oxford University Computing Laboratory What is an Ontology? What is an Ontology? A model of (some aspect

More information

Opus: University of Bath Online Publication Store

Opus: University of Bath Online Publication Store Patel, M. (2004) Semantic Interoperability in Digital Library Systems. In: WP5 Forum Workshop: Semantic Interoperability in Digital Library Systems, DELOS Network of Excellence in Digital Libraries, 2004-09-16-2004-09-16,

More information

Semantic Web: vision and reality

Semantic Web: vision and reality Semantic Web: vision and reality Mile Jovanov, Marjan Gusev Institute of Informatics, FNSM, Gazi Baba b.b., 1000 Skopje {mile, marjan}@ii.edu.mk Abstract. Semantic Web is set of technologies currently

More information

Main topics: Presenter: Introduction to OWL Protégé, an ontology editor OWL 2 Semantic reasoner Summary TDT OWL

Main topics: Presenter: Introduction to OWL Protégé, an ontology editor OWL 2 Semantic reasoner Summary TDT OWL 1 TDT4215 Web Intelligence Main topics: Introduction to Web Ontology Language (OWL) Presenter: Stein L. Tomassen 2 Outline Introduction to OWL Protégé, an ontology editor OWL 2 Semantic reasoner Summary

More information

Standardization of Ontologies

Standardization of Ontologies Standardization of Ontologies Kore Nordmann TU Dortmund March 17, 2009 Outline History Related technologies Ontology development General history HTML UNTANGLE HTML 2.0 XML rec. XHTML RDF(S)

More information

a paradigm for the Introduction to Semantic Web Semantic Web Angelica Lo Duca IIT-CNR Linked Open Data:

a paradigm for the Introduction to Semantic Web Semantic Web Angelica Lo Duca IIT-CNR Linked Open Data: Introduction to Semantic Web Angelica Lo Duca IIT-CNR angelica.loduca@iit.cnr.it Linked Open Data: a paradigm for the Semantic Web Course Outline Introduction to SW Give a structure to data (RDF Data Model)

More information

is easing the creation of new ontologies by promoting the reuse of existing ones and automating, as much as possible, the entire ontology

is easing the creation of new ontologies by promoting the reuse of existing ones and automating, as much as possible, the entire ontology Preface The idea of improving software quality through reuse is not new. After all, if software works and is needed, just reuse it. What is new and evolving is the idea of relative validation through testing

More information

JENA: A Java API for Ontology Management
