Keeping Track of the Semantic Web: Personalized Event Notification

Annika Hinze and Reuben Evans
University of Waikato, New Zealand
{hinze, rjee1}@cs.waikato.ac.nz

Abstract. The semantic web will not be a static collection of formats, data and meta-data but highly dynamic in each aspect. This paper proposes a personalized event notification system for semantic web documents (ENS-SW). The system can intelligently detect and filter changes in semantic web documents by exploiting the semantic structure of those documents. In our prototype, we combine the functionalities of user profiles and distributed authoring systems. Typically, both approaches would lack the ability to handle semantic web documents. This paper introduces the design and implementation of our event notification system for semantic web documents that handles the XML representation of RDF. We analyzed our prototype regarding accuracy and efficiency in change detection. Our system supports sophisticated change detection including partial deletion, awareness of document restructuring, and approximate filter matches.

1 Introduction

In this project, we address the problem of alerting users of changes in semantic web documents. Some work on change detection in the semantic web (SW) has already been done; most of the projects focus on ontologies (see Qin and Atluri, 2004). Here, we focus on changes in documents containing data or metadata. A system for detecting, filtering and notifying about events is called an event notification system (ENS). We identified two possible approaches to our problem: either to extend a SW system with ENS functionality or to add SW support to an existing ENS. We focused on the latter approach, extending a proven event notification system and combining it with a well-accepted distributed authoring system to handle semantic web documents. Typically, both types of systems lack the ability to handle semantic web documents.
In this paper, we introduce the concept, implementation, and evaluation of the proposed system, ENS-SW. The details and challenges of our project are discussed after an introduction to background knowledge about Semantic Web technologies. Section 2 provides a description of the document formats RDF and RDFS, which are used in semantic web models. We identified several challenges for detecting changes or updates in these models. In Section 3, the detailed focus of our project is defined. In Section 4, we discuss the conceptual design of our system. Section 5 describes the implemented prototype. Section 6 presents the evaluation to test the performance of the system under significant load. Section 7 outlines our conclusions from this work and identifies areas for future work.

R. Meersman, Z. Tari et al. (Eds.): OTM 2006, LNCS 4275, pp. 661-678, 2006. Springer-Verlag Berlin Heidelberg 2006

2 Background and Project Focus

This section describes the context of the study and introduces the main concepts.

2.1 Brief Introduction to Semantic Web and RDF/S

The semantic web is an extension of the current Internet (Berners-Lee et al., 2001). Information contained in semantic web documents is enhanced by semantic annotations. Agents and services will give access to these data (metadata and knowledge). Three complementary components, the Resource Description Framework (RDF), RDF Schema (RDFS) and ontologies, are the implementation-level methods of representing metadata and its related knowledge representation within the semantic web.

RDF and RDF Schema

In order to have unique semantic annotations for representing items in the semantic web, it is not sufficient, or desirable, to have a single definition for all the semantic concepts needed. The Resource Description Framework (RDF) is a general-purpose language for representing information in the Web (Beckett, 2004). RDF has a triple structure that combines three resources: subject, property, and object. Several RDF triples can be combined to form RDF networks; a network can be defined in one or several documents. These RDF networks can be represented in a number of ways, for example as XML, as a graph, or in N3 notation. The example in Figure 1 shows a simple RDF relationship of a course to its lecture notes expressed in each of the three formats.

As a graph: [diagram]

Using N3 notation:

    :Course a rdf:Class.
    :Notes a rdf:Class.
    :has a rdf:Property.
    :Course :has :Notes.

As XML:

    <rdf:RDF>
      <rdf:Property rdf:about="#has"/>
      <rdf:Description rdf:about="#Course">
        <has rdf:resource="#Notes"/>
      </rdf:Description>
      <rdf:Description rdf:about="#Notes"/>
    </rdf:RDF>

Fig. 1. The three major ways of representing RDF structures

RDF provides the structure for the metadata, but is not sufficient to concisely describe complex semantic relationships. RDF Schema (RDFS) deals with knowledge representation in RDF. RDFS (Brickley and Guha, 2004) adds to RDF the ability to specify classes and more restrictive relationships between the resources described in the document. For example, we could use RDFS to extend our example from Figure 1, so that we can record not only that lecture notes relate to a course but also that lecture notes are documents. An example of an RDF network using RDF Schema is shown in Figure 2. By converting our existing type descriptions to RDFS classes we turn our example into a schema, which, in turn, can be used to define instances of type Course and Notes.

As a graph: [diagram]

As XML:

    <rdf:RDF>
      <rdf:Property rdf:about="#has"/>
      <rdfs:Class rdf:about="#Course">
        <has rdf:resource="#Notes"/>
      </rdfs:Class>
      <rdfs:Class rdf:about="#Notes">
        <rdfs:subClassOf rdf:resource="#Documents"/>
      </rdfs:Class>
      <rdfs:Class rdf:about="#Documents"/>
    </rdf:RDF>

Fig. 2. Extension of the example in Figure 1 to use RDFS

Ontologies

One of the key requirements of the semantic web is the common definition of terms and concepts among agents. An ontology bridges the gap (Heflin, 2004) by describing both what an identifier means and how that identifier relates to other identifiers. An ontology can further provide rules for reasoning about relationships between resources that are not explicitly linked by RDF triples but which are implied by those triples. RDF Schema defines basic ontological modelling primitives on top of RDF, e.g., the concept of a subclass, and domain and range for properties. Other semantic web languages with richer modelling primitives, such as disjoints and rules, can be constructed by extending RDF Schema. Examples are DAML+OIL (Connolly et al., 2001) and OWL (Patel-Schneider et al., 2004). Here, we focus on documents using RDF and RDF Schema. Our system could be easily adapted for similar SW languages.

2.2 Problem Description

We will use the example of a semantic network for teaching-related documents to illustrate the problem addressed in this paper.
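As a rough picture of what such a teaching network contains, the sketch below holds a handful of triples as plain Python tuples and derives the class memberships that are implied, but not stated, through rdfs:subClassOf. It is an illustration only: the resource names loosely follow Figure 4, and the helper functions superclasses and types_of are hypothetical names introduced here, not part of RDF tooling or of ENS-SW.

```python
# Illustrative triples (subject, property, object); names loosely follow Figure 4.
TRIPLES = {
    ("Lecturer", "rdfs:subClassOf", "Person"),
    ("LectureNotes", "rdfs:subClassOf", "Document"),
    ("Annika", "rdf:type", "Lecturer"),
    ("Annika", "fName", "Annika"),
    ("Annika", "lName", "Hinze"),
}


def superclasses(cls, triples):
    """Return every class reachable from cls by following rdfs:subClassOf."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for subject, prop, obj in triples:
            if subject == current and prop == "rdfs:subClassOf" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def types_of(resource, triples):
    """Return the stated types of a resource plus the types implied by subclassing."""
    direct = {o for s, p, o in triples if s == resource and p == "rdf:type"}
    implied = set()
    for cls in direct:
        implied |= superclasses(cls, triples)
    return direct | implied


# Annika is stated to be a Lecturer; that she is a Person is only implied.
annika_types = types_of("Annika", TRIPLES)
```

A change to a single schema triple, such as the subclass link between Lecturer and Person, therefore alters what is implied about every instance of that class, which is one reason why changes to the schema are harder to track than changes to individual data values.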
Figure 3 shows an RDF/RDFS network representing the data instances in the lower part of the schema (RDF) and the conceptual relationships between lecturers, courses and lecture notes in the upper part of the schema (RDFS in dashed lines). The network given here can be seen as a sub-section of the network that could describe all the courses and lecturers at the university. A user may search the XML representation of this network (e.g., by querying the XML document shown in Figure 4) to retrieve the data.

Fig. 3. A diagram of an RDF/RDFS network

    <rdf:RDF xmlns="http://isdb.waikato.ac.nz/nonexistant/schemadoc#"
             xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
             xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdfs:Class rdf:about="#Person" />
      <rdfs:Class rdf:about="#Lecturer">
        <rdfs:subClassOf rdf:resource="#Person" />
      </rdfs:Class>
      <rdfs:Class rdf:about="#Course" />
      <rdfs:Class rdf:about="#Document" />
      <rdfs:Class rdf:about="#LectureNotes">
        <rdfs:subClassOf rdf:resource="#Document" />
      </rdfs:Class>
      <rdf:Property rdf:about="#fName" />
      <rdf:Property rdf:about="#lName" />
      <rdf:Property rdf:about="#name" />
      <Lecturer rdf:about="#Annika">
        <fName>Annika</fName>
        <lName>Hinze</lName>
      </Lecturer>
    </rdf:RDF>

Fig. 4. A portion of Figure 3 expressed in RDF's XML representation

This is satisfactory as long as the semantic data and the underlying schema remain unchanged. Difficulties arise if another user changes the information (e.g., about a lecturer of a particular course) or changes the structure (e.g., lecture notes are no longer tied to a course in general but to a specific semester). The user may want to know that their retrieved result may have become invalid. Consequently, search is an insufficient mechanism to deal with changes in the data or data structure. This problem becomes more pronounced if the schema is used as a global standard by universities around the world to categorise their lecture material. Then, the task of updating the network (with all implications) becomes complex. In addition, external users could use the data as a component in their semantic web documents. From this example scenario, the following questions arise:

1. How will users find out when the data they are using has changed?
2. How does the owner of a network know who needs to be notified about a change in their network?
3. How does a user find out when a part of a network to which they refer in their network is changed?
4. How does a user identify what needs to be changed in their network in response to a notification they have received about changes in a network to which they refer?

There are complex issues behind each of these questions. In this paper, we address the first two questions as a starting point for the larger problem outlined above. Our aim is to detect changes in semantic data and process such changes so as to notify the users or agents that have expressed an interest in these changes. Consequently, the goal is to design and implement an event notification system for semantic web documents that exploits information about the semantic structure of the documents for filtering.

2.3 Principles of Event Notification

Users can define their interests in certain events, such as a change in a document, in profiles. An event notification system matches observed events to profiles in a process called filtering. In the context of the semantic web, events are changes to the RDF files (e.g., new, changed, deleted). Typically, events have to be sent to the event notification system (ENS) by the producer (see Figure 5). They are filtered individually against an index of user profiles. Profiles are similar to continuous queries; they are created by users of the ENS to specify their area of interest. An ENS indexes profile queries, not documents. When the system receives a message about an event, it filters the message against the stored profiles and, where a match is found, notifies the owner of that profile about the event.

Fig. 5. User interaction with an event notification system

For our problem, new or changed RDF documents have to be submitted to the system to be matched against the profiles and then to be sent out to interested users. Figure 6 shows the internal components of the system: the RDF documents are processed by the observer, which isolates the events within the documents and passes them to the filter. The filter compares the event messages to the stored profiles. Whenever it detects a match, it passes the event and the profile to the notifier.

Fig. 6. Internal components of an event notification system
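The filtering principle described in this section can be sketched in a few lines. The code below is a minimal illustration, assuming that an event is reduced to a resource name and an action and that a profile is a predicate over events. The class and method names (Event, Profile, EventNotificationSystem, subscribe, publish) are hypothetical and do not describe the actual ENS-SW interfaces.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    resource: str  # resource the change concerns, e.g. "#COMP582"
    action: str    # "new", "changed" or "deleted"


@dataclass
class Profile:
    owner: str                          # user to notify on a match
    matches: Callable[[Event], bool]    # continuous query over events


class EventNotificationSystem:
    """Stores profiles (not documents) and filters each incoming event against them."""

    def __init__(self):
        self.profiles = []
        self.sent = []  # stands in for the notifier's outgoing messages

    def subscribe(self, profile):
        self.profiles.append(profile)

    def publish(self, event):
        # Filter: compare the event with every stored profile.
        for profile in self.profiles:
            if profile.matches(event):
                self.notify(profile.owner, event)

    def notify(self, owner, event):
        # Notifier: here the notification is only recorded.
        self.sent.append((owner, event))


ens = EventNotificationSystem()
ens.subscribe(Profile("alice", lambda e: e.resource == "#COMP582"))
ens.subscribe(Profile("bob", lambda e: e.action == "deleted"))

ens.publish(Event("#COMP582", "changed"))   # matches alice's profile only
ens.publish(Event("#COMP319", "new"))       # matches no profile
```

The linear scan over all profiles is the simplest possible filter. An index over the profiles exists precisely to avoid visiting every profile for every event.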

3 Proposed Solution

Typically, ENSs support neither event observation in documents nor features specific to the semantic web. Our approach combines an event notification system with a distributed authoring system and adapts both components for application in a semantic web context. The system needs to support a profile language that can express RDF constructs. One may extend XML or RDF query languages, such as XML-QL (Deutsch et al., 1999) or RQL (Karvounarakis et al., 2002), to cover events. As these components need to be tightly integrated, the system will include the profile store and notifier as well as the filter.

The observer component has requirements that differ from those of a standard ENS. In Figure 6, we see the events coming in from the producers. However, the producers in this situation provide edited documents in which very few of the triples have changed. They do not provide explicit event messages but whole documents. Accordingly, the observer component needs to be an active observer that compares new versions of semantic web documents that it receives against previous states of those documents. It will forward information about the actual changes, additions, and deletions that occur between one document version and the next. We therefore chose to use a distributed authoring system that supports change detection and WebDAV (Whitehead and Goland, 1999). WebDAV is an extension of the HTTP protocol that allows the producers to easily supply their documents to the system.

4 Conceptual Design

Addressing research questions 1 and 2, we focus here on the design of a local system, which can later be extended as a broker component in a distributed system.
Our ENS-SW system consists of five main components, as seen in Figure 7: a repository for storing the RDF data, an observer to isolate changes in the RDF data, a store and index for profiles, a filter to identify matches between those changes and the collection of profiles, and a notifier to send out the notifications for profiles that have been matched. We now briefly describe each of the components.

Fig. 7. Concept Architecture of ENS-SW

4.1 Repository

The repository can be designed either as an active repository that includes some or all of the observer functionality, or as a passive repository that must be monitored and polled by an observer. We will use an active repository, for which we identify three key requirements:

1. The repository needs a means to represent change. Various forms can be envisioned, such as changes that are committed to a database in a transaction, or versioning through the use of diff files as in CVS. This will not be sufficient to detect all details of the events. Instead, it will act as the trigger mechanism that alerts the event detector that some changes may have occurred. The trigger would also give an indication of where the potential changes have occurred.
2. The repository is required to represent data in the chosen format. For our design, we decided to consider the XML representations of RDF/S documents (see Figure 4). XML filter languages are sufficiently expressive, and prior approaches using RDF triple representations have shown the limitations of those languages. Consequently, the repository needs to support storage of XML data. It should also be able to indicate to the change detector which documents in XML format might have changed. This requirement is fundamental due to our design decision to work exclusively with the XML representation of RDF.
3. The final requirement for the repository is that it should be accessible via the Internet so that it can be included in larger semantic networks. This would also allow for easier system deployment. We identified WebDAV as the protocol that the system should support for document updating; WebDAV provides a powerful yet simple way for agents to access the repository.

4.2 Observer

The observer has complex functionality; we therefore decided to design it as a separate component. However, the observer needs to be tightly coupled with the repository. When presented with a file in which some form of change may have occurred, the observer has to (1) detect the change and (2) determine which effect the change in the XML document has on the structure of the RDF network that is described by the XML.
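Step (2) can be pictured as a comparison of triple sets rather than of text. The following is a minimal sketch under the assumption that both versions of a document have already been parsed into (subject, property, object) triples; parsing RDF/XML is outside its scope. The names Triple and diff_triples are hypothetical and not part of ENS-SW. A removed and an added triple that share subject and property are paired and reported as one change of the object value.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    prop: str
    obj: str


def diff_triples(old, new):
    """Classify the differences between two sets of triples."""
    removed = set(old) - set(new)
    added = set(new) - set(old)
    modified = []
    # Pair a removed triple with an added one on the same subject and property:
    # this is treated as a changed object rather than a delete plus an insert.
    for gone in sorted(removed):
        partner = next(
            (a for a in sorted(added)
             if a.subject == gone.subject and a.prop == gone.prop),
            None,
        )
        if partner is not None:
            modified.append((gone, partner))
            added.discard(partner)
    removed -= {gone for gone, _ in modified}
    return {"inserted": added, "deleted": removed, "modified": modified}


before = {
    Triple("Annika", "Teaches", "COMP582"),
    Triple("Annika", "fName", "Annika"),
}
after = {
    Triple("Annika", "fName", "Annika"),      # listed in a different order, unchanged
    Triple("Annika", "Teaches", "COMP319"),
}
changes = diff_triples(before, after)
```

Because the comparison works on sets, reordering or reformatting the serialized document produces no event at all; only a difference in the triples themselves does.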
This problem is not trivial, as there are a number of different changes that can occur in a file that would not necessarily change the underlying RDF network. The reasons for this lie, amongst others, in the wide variety of ways in which a particular RDF network can be represented in XML. The syntax is verbose, allows considerable redundancy and does not take into account the order of attributes. This means that there may be significant changes to a file without any change to the RDF network. In addition, simple syntactical changes, such as blanks, should be discarded. These constraints make it difficult to use many of the existing XML diff algorithms. The observer component must be able to detect the following types of events:

  Modification: change to existing triples
  Insertion: creation of new triples
  Deletion: removal of existing triples

Modification events are the simplest ones; they take only two forms: class substitution or literal value change. Class substitution occurs when the object of a triple is changed to some other resource within the RDF structure. Literal value changes involve the substitution of one metadata literal for another, for example, changing the name property of the resource COMP582 (see Figure 3) from "Topics in Information Systems" to "Event-Based Systems".

Insertions and deletions both follow the structure shown in Figure 8. For deletions, we can see the limitations of relying on simple diff programs to obtain the event information: the available information is insufficient to detect the deletion pattern. In deletions, there are eight different event patterns. Multiple change events, each with a different pattern, could be caused by a single actual deletion within the RDF document, some of which may or may not be apparent from the event. Changes to the data may be inconsistent. Thus, deciding when to notify of a deletion involves more processing than just considering the individual resources that have been removed from the network. Consider the XML fragment in Figure 9, which defines four triples, on lines six, seven, nine, and ten. Using this as the base XML fragment, each of the eight deletion patterns is illustrated in Figure 10.

Fig. 8. The eight patterns of change

Insertions follow the same patterns as deletions but in reverse. Detecting the number of new triples is not a straightforward operation but involves considerable processing of the surrounding XML. This problem could be reduced if the document editor checked that all references were updated when changing a document. However, this is not always feasible because of the large multi-file semantic networks that are used to describe and assist with real-world problems.

     1: <?xml version="1.0"?>
     2: <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     3:          xmlns:dc="http://purl.org/dc/elements/1.1/"
     4:          xmlns="http://www.example.com/stuff#"
     5:          xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
     6:   <rdfs:Class rdf:about="subject">
     7:     <property rdf:resource="object"/>
     8:   </rdfs:Class>
     9:   <rdfs:Class rdf:about="object" />
    10:   <rdf:Property rdf:about="http://www.example.com/stuff#property" />
    11: </rdf:RDF>

Fig. 9. Example RDF document

4.3 Profile Store/Index

For the profile store and profile index, two critical factors exist: efficiency and ease of filtering.
Both are served by the use of a tree storage structure, which minimises the number of profile nodes that need to be visited by the filter for each event and simultaneously reduces the number of nodes that need to be remembered for each profile.

    Deletion type   Removed lines   Remaining lines   Triples removed
    a               7, 10           6, 8, 9           2
    b               6-8, 10         9                 3
    c               7, 9, 10        6, 8              3
    d               all             none              4
    e               7, 9            6, 8, 10          2
    f               6-8             9, 10             2
    g               6-9             10                2
    h               7               6, 8, 9, 10       1

Fig. 10. Deletion patterns in XML (line numbers refer to Figure 9)

4.4 Filter

The primary action of the filter is the matching of event messages to stored profiles. Ideally, the filter would use the changed RDF triples identified by the event observer/detector and map those to the registered profiles. The filter would need to handle XML namespaces and attributes. RDF syntax encodes almost all significant features in these two forms; element tags and text nodes are rarely used. In addition, it is important that the profile language supports profiles that distinguish particular change patterns, as outlined before. Another question is the scope of the detected changes: when matching an event to a given set of profiles, should a profile that registers an interest in change in a given resource be notified on any change to a triple involving that resource, or only when the resource itself is changed?

4.5 Notifier

The notifier is the least complex component of the system. It sends notifications to users that own matched profiles.
It should support a variety of means for notification to cater for different types of users: sending emails, RSS feeds, or making the notification available through a web site. Sending an email is good for occasional notifications to human users; a website or RSS feed would involve storing the notification on disk and retrieving it later when the feed was requested or the user logged into the site. Automatic messages via a given interface could cater for automatic/agent clients. 5 Implementation of ENS-SW This section describes the implementation details of our XML-based ENS for the Semantic Web called ENS-SW. The repository and the observer are based on the authoring system SVN, which provides the necessary data storage functions and basic

670 A. Hinze and R. Evans change detection. Subversion (SVN) is a relative of CVS (Collins-Sussman et al., 2004). It supports distributed authoring of documents. The detailed event detection is performed by a Delta Adapter component that we developed. It receives documents from the SVN and performs an XML diff to locate the relevant changes. It analyses the XML differences in the document before and after the changes were made. For the profile store and filter components, we extended the XML-based ApproXFilter (Michel and Hinze, 2005) to process RDF data. Our system exploits knowledge of the structure of the RDF Schema to create useful notifications to interested users. The ENS-SW can be accessed using the WebDAV protocol. The system consists of three components as shown in Figure 11: a repository (SVN), an observer that is tightly coupled to the repository (Delta Adapter), and an event filter (ApproxFilter). Whenever SVN receives a new or changed document it activates the adapter program by the use of a hook script. This program obtains a list of all changes to the repository since the latest revision. In this list, the adapter identifies the RDF files that changed in this revision and obtains both the version before and after the revision (see example in Figure 12). Fig. 11. Components of the ENS-SW system and their interactions The Adapter accesses copies of all changed RDF files (before and after the change) and uses DeltaXML libraries (La Fontaine, 2001) for the comparison. DeltaXML performs an unordered Tree-based diff. It records for each sub tree the insertions, changes, deletions or if the node and all its children remain the unchanged. The resulting mark-up that DeltaXML in the XML file can be seen in Figure 13. Although Delta XML identifies the changes in the XML file, it is not sufficient for identifying RDF events. Triples are interconnected; a change on one part of the network may affect other resources. 
RDF Before Editing

    <rdf:RDF xmlns:rdf="..." xmlns:rdfs="...">
      <rdf:Property rdf:about="#Teaches"/>
      <Course rdf:about="#COMP582"/>
      <Lecturer rdf:about="#Annika">
        <fName>Annika</fName>
        <lName>Hinze</lName>
        <Teaches rdf:resource="#COMP582"/>
      </Lecturer>
    </rdf:RDF>

RDF After Editing

    <rdf:RDF xmlns:rdf="..." xmlns:rdfs="...">
      <rdf:Property rdf:about="#Teaches"/>
      <Course rdf:about="#COMP582"/>
      <Lecturer rdf:about="#Annika">
        <fName>Annika</fName>
        <lName>Hinze</lName>
        <Teaches rdf:resource="#COMP319"/>
      </Lecturer>
    </rdf:RDF>

Fig. 12. Portion of an RDF file in XML and the changes made to it

    <rdf:RDF xmlns:deltaxml="..." xmlns:rdf="..." deltaxml:delta="wfmodify">
      <rdf:Property deltaxml:delta="unchanged" rdf:about="#Teaches"/>
      <Course deltaxml:delta="unchanged" rdf:about="#COMP582"/>
      <Lecturer deltaxml:delta="wfmodify" rdf:about="#Annika">
        <fName deltaxml:delta="unchanged">Annika</fName>
        <lName deltaxml:delta="unchanged">Hinze</lName>
        <Teaches deltaxml:delta="wfmodify"
                 deltaxml:oldattributes='rdf:resource="#COMP582"'
                 deltaxml:newattributes='rdf:resource="#COMP319"'/>
      </Lecturer>
    </rdf:RDF>

Fig. 13. The results of running DeltaXML on the files in Figure 12

In RDF, unlike in XML, a class and its properties may be kept separate from one another. Profiles interested in changes in a property that is used with a class in this manner will be interested in events that occur on the class as well as directly on the property.

To correctly identify the events, the results from DeltaXML are passed through an XML style sheet transformation (XSLT). For example, it transforms the DeltaXML result from Figure 13 into the event message seen in Figure 14. It removes superfluous DeltaXML information, e.g., about unchanged elements.

    <Annika>
      <action>modify</action>
      <Teaches>COMP582
        <action>delete</action>
      </Teaches>
      <Teaches>COMP319
        <action>add</action>
      </Teaches>
    </Annika>

Fig. 14. Message after XSLT transformation

The DeltaXML libraries only work with changed documents. For the addition and deletion of entire documents, the system simulates a change by creating a temporary, empty document. This enables us to treat these events in the same manner as changed documents and thus allows for filtering.

    Profile                                               Interpretation
    1) Annika[Teaches[ * ]] and Teaches[action[ add ]]    Annika teaches another course
    2) Course[action[ modify ]]                           A course is changed
    3) Lecturer[action[ add ]]                            Another lecturer is added to the network
    4) Lecturer[fName[ * ]]                               Any event that affects a lecturer's first name

Fig. 15. Some profiles that users might register on the data in Figure 12

The next step is to filter the event message using an XML filter algorithm. We employ the XML filter algorithm ApproXFilter (Michel and Hinze, 2005). We adapted ApproXFilter to match profiles regarding triple changes in the makeup of the RDF document (for example profiles, see Figure 15). These profiles specify a set of elements that the user is interested in and the types of changes that will trigger a notification. Elements are separated by square brackets to enable the user creating a profile to specify the relationships that the elements must have to one another.

Our implementation supports three points of user interaction: Document authors submit documents to the SVN server either by using WebDAV in an Apache server, the dedicated SVN server, or directly to the repository. New profiles enter the filter via a connection to port 8088; the system inserts them into its profile tree. Direct insertion via the profile directory is also possible. Finally, notifications are sent via email.

6 Evaluation

In this section, we report the results of our performance evaluation. We designed two main sets of tests: the first evaluated the efficiency and accuracy of the Delta Adapter program; the second evaluated the performance of the filter for different numbers and types of profiles. The separate parts of the system were tested independently of each other since they are distinct programs that co-operate.

6.1 Observer (DeltaAdapter) Testing

These tests determine the efficiency of the event detection process in the observer. Since deletions and insertions use the same operations, we only test triple deletions.

The variables for the observer tests are shown in Figure 16. O_T and N_T represent the size of the input document before and after changes, respectively. They influence the delta generation, as each document is created as a tree and then compared node by node. C denotes the number of triples that have changed between O_T and N_T. The more changes, the more time is consumed by the tree comparisons. C_S is large when the number of changes is large compared to the document size. Processing a change with large C_S should be very quick. Large C_I adds noise to the filtering process.
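The observer variables can be made concrete with a small worked example. The following is a minimal sketch, not the authors' test harness: the function name `observer_variables`, the tuple representation of triples, and the reading of C as the number of triples present in exactly one of the two versions are all assumptions made for illustration. C_I is not computed, because non-triple changes (such as extra carriage returns) are not visible at the level of triple sets.

```python
def observer_variables(old_triples, new_triples):
    """Compute O_T, N_T, C and C_S for two versions of a document.

    Triples are (subject, predicate, object) tuples. C is taken here as the
    number of triples present in exactly one version (an assumption).
    """
    old, new = set(old_triples), set(new_triples)
    o_t, n_t = len(old), len(new)
    c = len(old ^ new)  # symmetric difference: removed plus added triples
    c_s = c / o_t if o_t else None  # undefined for an empty original document
    return {"O_T": o_t, "N_T": n_t, "C": c, "C_S": c_s}


# Hypothetical triples loosely modelled on the lecturer example in Figure 12.
BEFORE = [
    ("Annika", "type", "Lecturer"),
    ("Annika", "fName", "Annika"),
    ("Annika", "lName", "Hinze"),
    ("Annika", "Teaches", "COMP582"),
]
ONE_REMOVED = BEFORE[:-1]  # the "one triple removed" case: N_T = O_T - 1, C = 1
ALL_REMOVED = []           # the "all triples removed" case: N_T = 0, C = O_T

print(observer_variables(BEFORE, ONE_REMOVED))
print(observer_variables(BEFORE, ALL_REMOVED))
```

Under these assumptions, removing one of four triples gives C = 1 and C_S = 0.25, while removing all of them gives C = O_T and C_S = 1.0.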
    Symbol   Description
    O_T      Number of triples in original document
    N_T      Number of triples in changed document
    C        Number of changed triples
    C_S      Changes relative to size of document (C/O_T)
    C_I      Number of changes that do not represent a triple change

Fig. 16. Variables for observer tests

Observer test 1: Triple change test

Goal: Test the performance of the observer. We use a specified number of changes in RDF documents of increasing size.

Hypothesis: Regardless of the number of changes, it is expected that there will be a linear increase in the time taken to process a document as the number of triples increases.

Figure 17 shows the time to process a document for increasing numbers of triples (one triple removed: N_T = O_T - 1, C = 1, C_I = 0; all triples removed: N_T = 0, C = O_T, C_I = 0). We see the linear dependency on the number of triples in the document. This behaviour is independent of the number of triples removed.

[Plot "Observer test one": time (ms) against number of triples in document (O_T), 5000 to 100000; series: All Triples Removed, 1 Triple removed.]
Fig. 17. Triple change test

The jump between 10000 and 20000 triples is investigated further in Figure 18, which shows the average time per triple in milliseconds. This graph clearly shows the heavy influence the size of the document has on the speed of event detection. The reason is that the system has to build DOM trees in main memory for both documents and then compare them to find the minimum difference. The graph also shows the influence of the two different costs associated with the change detection process: below 20,000 triples, the initialisation of the structures used to compare the XML files is the main influence. The second factor is the cost of comparing the DOM trees; it dominates above 30,000 triples, where the fixed initialisation costs are divided among many more triples.

[Plot "Observer test one (average per triple)": time (ms) per triple against number of triples in document (O_T), 5000 to 100000; series: All Triples Removed, 1 Triple removed.]
Fig. 18. Triple change test (averaged)

Observer test 2: Non change test

Goal: Test the effect of a change that does not change the meaning of the RDF on the performance of the Delta Adapter.

Hypothesis: Regardless of the number of changes, it is expected that there will be a linear increase in the time it takes to process a document with increasing numbers of non-triple changes.

We observe that an increasing number of non-triple changes causes an increase in the processing time. It is important to note that O_T and N_T were identical in this test (parameters: N_T = 10,000, O_T = 10,000, C = 0, C_S = 0). All the changes involved adding extra carriage returns or swapping the order of child nodes, neither of which makes any difference to RDF. There were no effective RDF changes; the system had to match the different portions of the DOM tree to confirm that nothing had changed in the model. Comparing Figure 19 with the respective data in Figure 17 (10,000 triples), one finds that five non-structural changes to the RDF file result in a doubling of the observer time.

[Plot "Observer test two": time (ms) against number of non-triple changes (C_I), 5 to 20.]
Fig. 19. Non triple change test

6.2 Filter Testing

The second series of tests are efficiency tests for the filter. Because the system performs approximate matching, the number of profiles that match an event is not a factor in the processing time. The filter does not stop unless all profiles and the whole document are tested. Non-matches create costs for each profile. The match cost also does not influence the performance. Only those profiles with a match cost below a given threshold will be notified.

The variables for the filter tests are described in Figure 20. E affects the filtering, as each member of the triple needs to be checked against the profile tree. P and P_C are expected to be major influences on the performance. P_D indicates the increases in the size of the filter tree. P_C is important since every conjunct adds between one and three extra nodes to the tree for the profile. This increases the number of nodes to be evaluated. S represents equivalent words. The size of the profile tree increases linearly with the size of the synonym net. The parameters for the tests are E = 2, P = 10K to 100K, P_D = 0, S = 0, P_C = 0, 1, 2.

    Symbol   Description
    E        Number of triples in event message
    P        Number of profiles
    P_D      Degree of independence in profiles
    P_C      Average number of conjuncts in profile
    S        Number of terms in the synonym net

Fig. 20. Variables for filter tests

Filter test 1: Event parsing test

Goal: Test the effect of an increasing number of distinct profiles and number of conjuncts on the event parsing stage of the filtering process.

Hypothesis: We expect that an increase in the number of distinct profiles should have a linear effect on the processing time. Increasing the number of conjuncts should not affect the runtime of the event parsing.

The results are given in Figure 21. We observe a linear influence of the number of profiles on the performance, due to the increase in the size of the profile tree caused by the increase in the number of distinct profiles. The influence of the number of conjuncts is not statistically significant.

[Plot "Filter test one": time (ms) against number of profiles (P), 10000 to 100000; series: No Conjuncts, One In Literal, One joining profiles, Both Conjuncts.]
Fig. 21. Event parsing test

Filter test 2: Profile matching test

Goal: Test the effect of increasing the number of distinct profiles and the number of conjuncts on the profile matching.

Hypothesis: An increase in the number of profiles has a linear effect on the runtime of the filtering process. Increasing the number of conjuncts should have a linear effect on the runtime due to the extension of the filter process.

[Plot "Filter test two": time (ms) against number of profiles (P), 10000 to 100000; series: No Conjuncts, One In Literal, One joining profiles, Both Conjuncts.]
Fig. 22. Profile matching test

We observe that the number of profiles has a linear influence on the performance. This relationship is not as pronounced as we expected, because of the considerable initial overhead. The effect of adding conjuncts is interesting: when adding a conjunct to the end of the profile, we achieved the expected linear increase in profile match time (see Figure 22). However, when the conjunct was joining two distinct triples, the profile matching was as quick as if there had been only one. We believe that this is caused by the profile cost calculation: once the filter has checked a profile, it evaluates the cost of that profile to see if it exceeds the cost threshold. When a profile consists of multiple independent parts connected by a conjunct, the filter can stop processing that profile. This would account for conjuncts joining two independent profiles taking the same time as single profiles with no conjuncts. Profiles incorporating both types of conjuncts behaved exactly as the first type with only one conjunct.

7 Conclusions and Future Work

We identified four research questions to be addressed to keep track of changing semantic web models. In this paper, we focussed on two closely related questions from that list: how to detect changes in semantic web data, and how to identify users who need to be notified about a change in the SW network. We presented a common solution for both challenges. Our ENS-SW system allows the user to register a profile indicating an interest in a portion of the RDF network stored in the repository. Whenever a change occurs on the network, it is filtered against all the profiles stored in the ENS, and when a match is found, the corresponding users are notified. Our evaluation has shown that the system performs as expected. The key points of our system are as follows:

(1) Easy Deletion Detection: Detection of deletions is often ignored in ENS due to its complexity. Often, a mirror would need to be used for the data repository. Since we store the data in the form of deltas, deletion detection becomes a trivial task.
(2) Using Subversion and diff: To the best of our knowledge, versioning software or XML diffs have not been used to detect changes in XML data. Typically, only new XML files can be processed by filters.
(3) Using the ApproXFilter algorithm: The matching algorithm supports approximative filtering. A user can state how close a match has to be to count as a match. This approach removes the need for a user to register several similar profiles.

For future research, we plan to address the challenge of distribution. The current system supports connection to any number of observers that each connect to one filter only. Support for XML namespaces and attributes is also planned, both on the observer side and for the filtering. Finally, two further questions identified early in the paper remain open for future research. The fourth question, that of identifying what needs to be changed in an RDF network in response to a notification, is a particularly interesting and significant problem. We aim to automatically adapt the network in response to notifications received from other filters in order to automatically synchronise distributed semantic web networks.
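Key points (1) and (2) rest on the fact that a delta such as the one in Figure 13 already records both the old and the new value of a changed property, so deletions and additions can be read off directly. The following is a minimal sketch of that idea in Python rather than XSLT, and it is not the authors' transformation: the namespace URIs are placeholders (the real ones are elided in Figures 12 and 13), the function names are hypothetical, and the delta is a simplified document modelled on Figure 13 rather than genuine DeltaXML output.

```python
import re
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumptions; the real ones are elided in the figures).
DELTA_NS = "urn:example:delta"
RDF_NS = "urn:example:rdf"

# Simplified delta modelled on Figure 13.
DELTA_DOC = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:d="{DELTA_NS}" d:delta="wfmodify">
  <Course d:delta="unchanged" rdf:about="#COMP582"/>
  <Lecturer d:delta="wfmodify" rdf:about="#Annika">
    <fName d:delta="unchanged">Annika</fName>
    <Teaches d:delta="wfmodify"
             d:oldattributes='rdf:resource="#COMP582"'
             d:newattributes='rdf:resource="#COMP319"'/>
  </Lecturer>
</rdf:RDF>
"""


def qname(namespace, local):
    """Build the '{namespace}local' form that ElementTree uses for names."""
    return f"{{{namespace}}}{local}"


def resource_id(attr_text):
    """Extract 'COMP582' from a string such as 'rdf:resource="#COMP582"'."""
    match = re.search(r'"#([^"]+)"', attr_text or "")
    return match.group(1) if match else None


def delta_to_events(delta_xml):
    """Return (subject, property, value, action) tuples for changed properties."""
    root = ET.fromstring(delta_xml)
    events = []
    for node in root:
        if node.get(qname(DELTA_NS, "delta")) == "unchanged":
            continue  # unchanged elements are superfluous for notification
        subject = node.get(qname(RDF_NS, "about"), "").lstrip("#")
        for prop in node:
            if prop.get(qname(DELTA_NS, "delta")) == "unchanged":
                continue
            old = resource_id(prop.get(qname(DELTA_NS, "oldattributes")))
            new = resource_id(prop.get(qname(DELTA_NS, "newattributes")))
            if old is not None:
                events.append((subject, prop.tag, old, "delete"))
            if new is not None:
                events.append((subject, prop.tag, new, "add"))
    return events


for event in delta_to_events(DELTA_DOC):
    print(event)
```

For the sample delta, the sketch reports a deletion of the old Teaches value and an addition of the new one, which mirrors the delete and add actions in the event message of Figure 14. No mirror of the repository is consulted at any point; the deleted value comes from the delta itself.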

References

1. Beckett, D. (2004) RDF/XML Syntax Specification, W3C Recommendation, 10 February 2004. Available at http://www.w3.org/tr/2004/rec-rdf-syntax-grammar-20040210 (23 March 2005).
2. Berners-Lee, T., Hendler, J. and Lassila, O. (2001) The Semantic Web. Scientific American 284(5):34-43; May.
3. Berners-Lee, T., Fielding, R.T. and Masinter, L. (2005) Uniform Resource Identifier (URI): Generic Syntax. Available at http://www.gbiv.com/protocols/uri/rfc/rfc3986.html (23 March 2005).
4. Brickley, D. and Guha, R.V. (2004) RDF Vocabulary Description Language 1.0: RDF Schema, W3C Recommendation, 10 February 2004. Available at http://www.w3.org/tr/2004/rec-rdf-schema-20040210 (23 March 2005).
5. Broekstra, J., Kampman, A. and van Harmelen, F. (2001) Sesame: An Architecture for Storing and Querying RDF Data and Schema Information. In Semantics for the WWW, edited by D. Fensel, J. Hendler, H. Lieberman and W. Wahlster. MIT Press, Boston, Massachusetts.
6. Collins-Sussman, B., Fitzpatrick, B.W. and Pilato, C.M. (2004) Version Control with Subversion. O'Reilly, Cambridge, Massachusetts.
7. Connolly, D., van Harmelen, F., Horrocks, I., McGuinness, D.L. and Patel-Schneider, P.F. (2001) DAML+OIL (March 2001) Reference Description, W3C Note, 18 December 2001. Available at http://www.w3.org/tr/2001/note-daml+oil-reference-20011218 (24 March 2005).
8. Deutsch, A., Fernandez, M., Florescu, D., Levy, A. and Suciu, D. (1999) XML-QL: A Query Language for XML. Proc. of the Int. World Wide Web Conference (WWW), Toronto.
9. Gibbins, N., Harris, S. and Shadbolt, N. (2004) Agent-based Semantic Web Services. Web Semantics: Science, Services and Agents on the World Wide Web 1(2):141-154.
10. Heflin, J. (2004) OWL Web Ontology Language Use Cases and Requirements, W3C Recommendation, 10 February 2004. Available at http://www.w3.org/tr/2004/recwebont-req-20040210 (24 March 2005).
11. Hinze, A. (2003) A-mediAS: An Adaptive Event Notification System. Proc. 2nd International Workshop on Distributed Event-based Systems, San Diego, USA.
12. Karvounarakis, G., Alexaki, S., Christophides, V., Plexousakis, D. and Scholl, M. (2002) RQL: A Declarative Query Language for RDF. In WWW, pages 592-603.
13. Klyne, G. and Carroll, J.J. (2004) Resource Description Framework (RDF): Concepts and Abstract Syntax, W3C Recommendation, 10 February 2004. Available at http://www.w3.org/tr/2004/rec-rdf-concepts-20040210 (24 March 2005).
14. Kozuka, T. (2004) The Adaptive Semantic Web. COMP 591 Dissertation, Department of Computer Science, University of Waikato, Hamilton, New Zealand.
15. La Fontaine, R. (2001) A Delta Format for XML: Identifying Changes in XML Files and Representing the Changes in XML. Proc. XML Europe 2001, Berlin, Germany.
16. Michel, Y. and Hinze, A. (2005) ApproxFilter - An Approximative, XML-based Event Filter. Technical Report 06/2005, Department of Computer Science, University of Waikato, Hamilton, New Zealand.
17. Patel-Schneider, P.F., Hayes, P. and Horrocks, I. (2004) OWL Web Ontology Language Semantics and Abstract Syntax, W3C Recommendation, 10 February 2004. Available at http://www.w3.org/tr/2004/rec-owl-semantics-20040210 (24 March 2005).
18. Qin, L. and Atluri, V. (2004) Ontology-guided Change Detection to the Semantic Web Data. Proc. 23rd International Conference on Conceptual Modeling (ER2004), Shanghai, China, pp. 624-638.
19. Silva Filho, R.S., de Souza, C.R.B. and Redmiles, D.F. (2003) The Design of a Configurable, Extensible and Dynamic Notification Service. Proc. 2nd International Workshop on Distributed Event-based Systems, San Diego, USA.
20. Whitehead, J. and Goland, Y.Y. (1999) WebDAV: A Network Protocol for Remote Collaborative Authoring on the Web. In Proc. of the European Computer Supported Cooperative Work Conference (ECSCW 99). Available at http://www.webdav.org.