SPARQL-Based Applications for RDF-Encoded Sensor Data
Mikko Rinne, Seppo Törmä, Esko Nuutila
http://cse.aalto.fi/instans/
5th International Workshop on Semantic Sensor Networks, 12.11.2012
Department of Computer Science and Engineering, Distributed Systems Group
Smart Cities Need Interoperability
Smart environments of the future interconnect billions of sensors
  Platforms from multiple vendors
  Operated by different companies, public authorities or individuals
Highly distributed, loosely coupled solutions based on common standards are required
  Challenge to proprietary platforms
Semantic web standards RDF, SPARQL and OWL offer a good base for interoperability
  Could they be used to process sensor data?
Three-Layer Sensor Network Model
Motivation for the middle layer
  Abstraction
  Interoperability
  Energy efficiency: optimisation of sensor access
[Figure: the three layers, Applications / Middle Layer / Sensor platforms]
An Event = Anything that happens or is contemplated as happening *)
Complex event: summarizes, represents, or denotes a set of other events *)
[Figure: example events "Seppo came in", "Mikko came in", "Esko came in", "It is 9 a.m.", "Seppo, Mikko and Esko are in." and "Meeting started in time", labeled as (Simple) Event, Composite Event, Synthesized Event and Complex Event]
*) Luckham, D., Schulte, R.: Event processing glossary version 2.0 (Jul 2011)
Heterogeneous Event Representations
Variable event structures in an open environment
  Different sensors may support different parameters
  Queries can match the data of interest and disregard the rest
Semantic web standard RDF has flexible support for heterogeneous event structures
Alternative approaches typically cover data stream processing on individual time-annotated triples
[Figure: Example Location Update. Event :e1 (rdf:type event:Event) has event:agent :p3, event:time pointing to a tl:Instant with tl:at 2011-10-03T08:17:11, and event:place with geo:lat 60.158776, geo:long 24.881490 and geo:alt]
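The point that a query can match the data of interest and disregard the rest can be illustrated with a minimal Python sketch. It is not INSTANS code: events are flattened into plain (subject, predicate, object) tuples without the nested time and place nodes of the figure, and the helper name `match`, the second event `:e2` and its altitude value are assumptions made for illustration.

```python
# Hypothetical sketch: an event is a set of (subject, predicate, object) triples.
# A pattern names only the predicates of interest; all other triples are ignored.

def match(event_triples, subject, wanted_predicates):
    """Return {predicate: object} for `subject` if every wanted predicate
    is present, otherwise None. Predicates not listed are disregarded."""
    found = {p: o for s, p, o in event_triples if s == subject}
    if all(p in found for p in wanted_predicates):
        return {p: found[p] for p in wanted_predicates}
    return None

# Location update loosely modelled on the slide's example (flattened).
location_update = {
    (":e1", "rdf:type", "event:Event"),
    (":e1", "event:agent", ":p3"),
    (":e1", "tl:at", "2011-10-03T08:17:11"),
    (":e1", "geo:lat", 60.158776),
    (":e1", "geo:long", 24.881490),
}

# A different sensor: no timestamp, but an extra altitude (illustrative value).
other_update = {
    (":e2", "rdf:type", "event:Event"),
    (":e2", "geo:lat", 60.2),
    (":e2", "geo:long", 24.9),
    (":e2", "geo:alt", 12.0),
}

# The same position pattern matches both event shapes.
print(match(location_update, ":e1", ["geo:lat", "geo:long"]))
print(match(other_update, ":e2", ["geo:lat", "geo:long"]))
```

A pattern that asks for `tl:at` would match the first event only, which is the behaviour expected from a SPARQL graph pattern without OPTIONAL.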
Solution Components
1. Method: Multiple collaborating SPARQL queries and update rules processing heterogeneous events expressed in RDF
2. Implementation (INSTANS*): Incremental continuous query engine based on the Rete algorithm
*) Incremental engine for STANding Sparql
Event Processing Based on SPARQL
SPARQL is tailor-made to query RDF data
SPARQL 1.1 Update supports INSERT operations, enabling
  Memory
  Communication between SPARQL queries
  Stepwise processing of data
Applications can be constructed entirely of SPARQL queries
[Figure: Query 1 conditionally INSERTs <triple>; bindings are held in Rete; Query 2 uses <triple> as input]
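How one query's INSERT can serve as another query's input can be sketched in Python as two rules over a shared set of triples. This is only an illustration under stated assumptions: the vocabulary (`:temperature`, `:status`, `:alert`) and the threshold are hypothetical, and the sketch re-evaluates the rules to a fixpoint rather than matching incrementally as a Rete network would.

```python
# Hypothetical sketch: two "queries" written as Python rules over a triple set.

def query1(graph):
    """Conditional INSERT: flag subjects whose temperature exceeds 30."""
    return {(s, ":status", ":high") for s, p, o in graph
            if p == ":temperature" and o > 30}

def query2(graph):
    """Uses the triple inserted by query1 as its input."""
    return {(s, ":alert", ":sent") for s, p, o in graph
            if p == ":status" and o == ":high"}

def run(graph, rules):
    """Apply the rules repeatedly until no rule adds a new triple."""
    graph = set(graph)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = rule(graph) - graph
            if new:
                graph |= new
                changed = True
    return graph

readings = {(":s1", ":temperature", 35), (":s2", ":temperature", 20)}
result = run(readings, [query2, query1])  # rule order does not matter
print((":s1", ":alert", ":sent") in result)
```

The inserted `:status` triple acts as memory and as the communication channel between the two rules, which is the role the slide assigns to SPARQL 1.1 Update.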
Continuous Processing vs. Window Repetition
Continuous processing of SPARQL queries has benefits over window repetition
  Instantaneous availability of results
  No duplicate detections due to overlapping windows
  No missing detections on window borders
  No repeated processing over the same data
Window lengths are typically based either on time or on the number of triples
  Based on the assumption that each triple marks a standalone event
  Heterogeneous event formats are needed to support all types of sensor input
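The duplicate and missed detections listed above can be reproduced with a toy Python sketch. The pattern (an "A" event directly followed by a "B" event), the stream and the function names are assumptions chosen for illustration; neither function reflects how C-SPARQL or INSTANS is implemented.

```python
# Hypothetical sketch: detect "A" directly followed by "B" in an event stream.

def continuous(stream):
    """Process each event exactly once, keeping only the previous event."""
    hits, prev = [], None
    for i, ev in enumerate(stream):
        if prev == "A" and ev == "B":
            hits.append(i)  # index of the "B" that completes the pattern
        prev = ev
    return hits

def windowed(stream, length, step):
    """Re-run the pattern over every window, each window on its own."""
    hits = []
    for start in range(0, len(stream) - length + 1, step):
        window = stream[start:start + length]
        for j in range(1, len(window)):
            if window[j - 1] == "A" and window[j] == "B":
                hits.append(start + j)
    return hits

stream = ["A", "B", "x", "A", "B", "x", "x", "x"]

print(continuous(stream))      # [1, 4]
print(windowed(stream, 4, 4))  # [1]          second pair straddles a window border
print(windowed(stream, 4, 1))  # [1, 4, 4, 4] overlapping windows report duplicates
```

With non-overlapping windows the pair at positions 3 and 4 is split across two windows and lost; with overlapping windows it is found, but three times.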
INSTANS Event Processing Platform
Based on the Rete algorithm
Performs continuous evaluation of incoming RDF data against multiple SPARQL queries
Intermediate results are stored in a β-node network
When all the conditions of a query are matched, the result is instantly available
Event Processing Network
  1. Event Producer (e.g. INSTANS)
  2. Event Channel (RDF)
  3. Event Processing Agent (INSTANS)
  2. Event Channel (RDF)
  4. Event Consumer (e.g. INSTANS)
INSTANS Phases
1. Compilation
  Queries are parsed into abstract syntax trees
  Syntax trees are translated into a Rete network with shared structure
  The Rete network is translated into a set of Lisp functions
2. Execution
  An α-matcher receives commands to add and remove triples and calls the add or remove methods of the corresponding α-nodes
  The Rete network propagates changes through a β-network
  Rules whose conditions are fully satisfied are executed, causing add and remove triple commands to be fed into output connectors and/or in a feedback loop to the α-matcher
Processing of Timed Events
The asynchronous nature of INSTANS means that all input is processed when it arrives
Synthetic events at specific points in time
  Detection of a missing event
  Compilation of a report
Timed events are built into INSTANS with the help of a special timer graph and a set of special predicates
INSERT { GRAPH <http://example.org/timergraph> { ?event <tp:timer_sec> ?timevalue } }
WHERE { ?event <:seconds> ?timevalue }
Start a five-second timer:
<:5sec_pulse> <:seconds> "5"^^<xsd:integer>
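The idea of a timer graph can be sketched in Python as a priority queue of due times. This is an assumption-laden illustration, not the INSTANS mechanism: the class name `TimerGraph`, the explicit `now` argument (simulated time) and the choice to return expired events to the caller are all hypothetical, since the slide does not show what happens when a timer expires.

```python
import heapq

class TimerGraph:
    """Hypothetical stand-in for a timer graph: stores (due_time, event)."""

    def __init__(self):
        self._heap = []

    def insert(self, event, seconds, now):
        """Corresponds loosely to inserting ?event <tp:timer_sec> ?timevalue."""
        heapq.heappush(self._heap, (now + seconds, event))

    def advance(self, now):
        """Return the events whose timers have expired by time `now`."""
        fired = []
        while self._heap and self._heap[0][0] <= now:
            _, event = heapq.heappop(self._heap)
            fired.append(event)
        return fired

timers = TimerGraph()
timers.insert(":5sec_pulse", 5, now=0)
print(timers.advance(now=4))  # []
print(timers.advance(now=5))  # [':5sec_pulse']
```

An expired timer could then be fed back into the engine as an ordinary input event, which would allow a query to detect that an expected event is still missing.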
Rete Example
Query:
INSERT { ?request <tp:timer_min> ?relativetime }
WHERE { ?request a <ffd:assigneddelivery> ; <ffd:assignbid> ?bid . ?bid <ffd:committedpickuptime> ?relativetime }
α-nodes:
  α1: a <ffd:assigneddelivery>
  α2: <ffd:assignbid>
  α3: <ffd:committedpickuptime>
Process flow:
1. Each condition corresponds to an α-node. α1 matches the sample input <:req1> a <ffd:assigneddelivery>.
2. <:req1> propagates to β2 and is stored there.
3. α2 matches <:req1> <ffd:assignbid> <:bid1>. Input from β2 matches ?request in Y2.
4. <:req1> and <:bid1> propagate as far as β3.
5. α3 matches the input <:bid1> <ffd:committedpickuptime> "15"^^<xsd:integer>.
6. In Y3, <:req1> and <:bid1> are joined with "15"^^<xsd:integer>.
7. A new triple <:req1> <tp:timer_min> "15"^^<xsd:integer> is inserted into the main graph.
[Figure: Rete network with α-nodes α1 to α3, β-nodes β1 to β3, nodes Y1 to Y3 and the insert1 node, annotated with the variable bindings (?request, ?bid, ?relativetime) at each numbered step]
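The process flow of this example can be approximated by a hand-written incremental matcher in Python. It is a sketch under stated assumptions, not a general Rete implementation and not INSTANS code: the class name `TimerRule` is hypothetical, the three memories are hard-wired to the three conditions of the example query, and triple removal is not handled.

```python
class TimerRule:
    """Hypothetical incremental matcher for the three-condition example query."""

    def __init__(self):
        self.requests = set()  # subjects matched by "a ffd:assigneddelivery"
        self.bids = {}         # request -> set of bids (ffd:assignbid)
        self.times = {}        # bid -> set of times (ffd:committedpickuptime)
        self.inserted = set()  # triples produced by the INSERT template

    def add(self, s, p, o):
        """Feed one triple; join it only against stored partial matches."""
        if p == "a" and o == "ffd:assigneddelivery":
            self.requests.add(s)
            for bid in self.bids.get(s, ()):
                for t in self.times.get(bid, ()):
                    self._fire(s, t)
        elif p == "ffd:assignbid":
            self.bids.setdefault(s, set()).add(o)
            if s in self.requests:
                for t in self.times.get(o, ()):
                    self._fire(s, t)
        elif p == "ffd:committedpickuptime":
            self.times.setdefault(s, set()).add(o)
            for request, bids in self.bids.items():
                if s in bids and request in self.requests:
                    self._fire(request, o)

    def _fire(self, request, relativetime):
        self.inserted.add((request, "tp:timer_min", relativetime))

rule = TimerRule()
rule.add(":req1", "a", "ffd:assigneddelivery")   # steps 1-2: stored, no result yet
rule.add(":req1", "ffd:assignbid", ":bid1")      # steps 3-4: partial match stored
rule.add(":bid1", "ffd:committedpickuptime", 15) # steps 5-7: join completes
print(rule.inserted)
```

Each incoming triple is compared only with the stored partial matches, so earlier input is never reprocessed, and the result is available as soon as the last condition is satisfied, whatever the arrival order.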
Close Friends Example Service
Mobile clients emit location updates
Service produces a "nearby" notification if two friends come geographically close to each other
[Figure: 1. Static input (RDF Store), Configuration; 2. Event Producer (RDF Stream) and 5. Event Consumer, Mobile Client; 3. Event Channel; 4. Event Processing Agent Network]
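The "geographically close" condition can be illustrated with a short Python sketch. The slides do not state how distance is computed or which threshold is used, so the haversine formula, the 200 m threshold, the function names and the sample coordinates below are all assumptions; in the presented service the logic is expressed in SPARQL queries rather than in Python.

```python
import math

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula, spherical Earth)."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def nearby_pairs(friends, positions, threshold_m=200.0):
    """Return the friend pairs whose latest known positions are close."""
    hits = []
    for a, b in friends:
        if a in positions and b in positions:
            if distance_m(*positions[a], *positions[b]) <= threshold_m:
                hits.append((a, b))
    return hits

# Static input: who is friends with whom (hypothetical configuration).
friends = [("mikko", "seppo"), ("mikko", "esko")]
# Latest location update per client as (latitude, longitude).
positions = {
    "mikko": (60.1588, 24.8815),
    "seppo": (60.1589, 24.8816),
    "esko": (60.2000, 24.9000),
}
print(nearby_pairs(friends, positions))
```

Here only the pair ("mikko", "seppo") is reported, since the third client is several kilometres away.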
Simulated Input on OpenStreetMap
Notification Delay Results
5 simulated friends
C-SPARQL query processing delay varied from 12 ms to 253 ms for 5 to 60 events, respectively
Window repetition rate is the dominant component of the notification delay
With an inter-arrival time of 1 event per second, C-SPARQL notification delay was measured at 1.34 to 25.90 seconds
INSTANS: 12 ms, independent of window length
[Figure: Notification Delay [s] (0 to 30) versus C-SPARQL Window Length (5 s to 60 s), for C-SPARQL and INSTANS]
Logistics Management Example
The Fast Flowers Delivery (FFD) application is defined in "Event Processing in Action" by Etzion and Niblett (foreword by Luckham)
  Flower stores request delivery service for flower orders
  Independent drivers bid for the assignment based on availability
  Location and driver ranking
Demo implementations on six different platforms are available at the book website, but none of them is coded in SPARQL
Timed events are heavily used
  Each phase of the flower delivery needs to be monitored for time
Capability to synthesize new unique events is needed
  SPARQL BIND
Planned as a next-level verification of the method and system
Related Work
Commercial event processing platforms (BusinessEvents, StreamBase, Esper, Aleri, Apama etc.) are based on proprietary programming methods
SparkWave applies SPARQL queries to RDF data using extended Rete
  Focuses on inference and fast data stream processing of individual triples
  No support for connected queries
C-SPARQL and CQELS support SPARQL, RDF and stream processing, but are based on window repetition and don't support connected queries
Jena and Sesame support SPARQL Update, but not multiple queries
INSTANS is believed to be a unique combination of continuous, standards-based processing of event streams using multiple connected queries
Summary
Sensor event processing based on RDF-encoded heterogeneous events
  Event format can evolve independently of the event processing application
  Built-in support for disjoint vocabularies
SPARQL Query + Update
  Application can be built entirely out of collaborating SPARQL queries
  Access to linked open data, future possibilities for inference
  No proprietary extensions needed so far
  Promise of good interoperability in multi-vendor, multi-actor environments
Continuous incremental matching using the Rete algorithm
  No repeating windows (processing repetition, duplicate matches, missed detections on window borders)
Application areas in smart spaces, context-aware mobile systems, internet-of-things, the real-time web etc.
Conclusions
Collaborative SPARQL queries are a promising method for event processing using semantic web technologies
  Applicable to any of the three layers in the sensor computing model
  Both INSTANS and the Rete network within it are fully distributable
A platform capable of continuous event-driven evaluation of parallel SPARQL queries supporting SPARQL 1.1 Update (INSERT) is needed
  Early testing indicates good performance for INSTANS