Processing Heterogeneous RDF Event Streams with Standing SPARQL Update

Mikko Rinne, Haris Abdullah, Seppo Törmä, Esko Nuutila
http://cse.aalto.fi/instans/
11.9.2012
Department of Computer Science and Engineering, Distributed Systems Group

Smart Cities Need Interoperability
- Smart environments of the future interconnect billions of sensors
  - Platforms from multiple vendors
  - Operated by different companies, public authorities or individuals
- Highly distributed, loosely coupled solutions based on common standards are required
  - Challenge to proprietary platforms
- Semantic web standards RDF, SPARQL and OWL offer a good base for interoperability
  - How would they work for event processing?

Solution Components
1. Method: Multiple collaborating SPARQL queries and update rules processing heterogeneous events expressed in RDF
2. Implementation (INSTANS): Incremental continuous query engine based on the Rete algorithm

An Event = Anything that happens or is contemplated as happening *)
- Simple events: "Seppo came in", "Mikko came in", "Esko came in", "It is 9 a.m."
- Derived events (labelled Composite, Synthesized and Complex in the figure): "Seppo, Mikko and Esko are in", "Meeting started in time"
- Complex event = Summarizes, represents, or denotes a set of other events *)
*) Luckham, D., Schulte, R.: Event processing glossary version 2.0 (Jul 2011)

Heterogeneous Representations
- Variable event structures in an open environment
  - Different sensors may support different parameters
  - Queries can match the data of interest and disregard the rest
- Semantic web standard RDF has flexible support for heterogeneous event structures
- Alternative approaches typically cover data stream processing on individual time-annotated triples
[Figure: Example Location Update. Event :e1 (rdf:type event:Event) has event:agent :p3; event:time pointing to a node of rdf:type tl:Instant with tl:at 2011-10-03T08:17:11; and event:place pointing to a node with geo:lat 60.158776, geo:long 24.881490 and geo:alt (value not recoverable).]
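To illustrate "match the data of interest and disregard the rest", here is a minimal Python sketch of basic graph pattern matching over events with different structures. It is not part of INSTANS. The second event `:e2`, its `ex:battery` property and the flattening of the location onto the event node are assumptions made for brevity.

```python
# Events as (subject, predicate, object) triples. ":e2" and "ex:battery" are
# hypothetical; lat/long are attached directly to the event as a simplification.
TRIPLES = [
    (":e1", "rdf:type", "event:Event"),
    (":e1", "event:agent", ":p3"),
    (":e1", "geo:lat", 60.158776),
    (":e1", "geo:long", 24.881490),
    (":e2", "rdf:type", "event:Event"),
    (":e2", "event:agent", ":p4"),
    (":e2", "geo:lat", 60.17),
    (":e2", "geo:long", 24.94),
    (":e2", "ex:battery", 0.8),  # extra parameter the query never mentions
]


def match(patterns, triples):
    """Return every variable binding that satisfies all triple patterns."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in triples:
                candidate = dict(binding)
                ok = True
                for term, value in zip(pattern, triple):
                    if isinstance(term, str) and term.startswith("?"):
                        # Variable: bind it, or check the existing binding.
                        if candidate.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        # Constant: must equal the triple's term.
                        ok = False
                        break
                if ok:
                    extended.append(candidate)
        solutions = extended
    return solutions


QUERY = [
    ("?e", "rdf:type", "event:Event"),
    ("?e", "event:agent", "?who"),
    ("?e", "geo:lat", "?lat"),
]
solutions = match(QUERY, TRIPLES)
for s in solutions:
    print(s["?e"], s["?who"], s["?lat"])
```

Both events match although only one carries a battery reading; the query simply never asks for it.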

SPARQL Query + Update
- SPARQL is tailor-made to query RDF data
- SPARQL 1.1 Update supports INSERT operations, enabling
  - Memory
  - Communication between SPARQL queries
  - Stepwise processing of data
- Applications can be constructed entirely of SPARQL queries

Close Friends Example Service
- Mobile clients emit location updates
- Service produces a nearby notification if two friends come geographically close to each other
[Figure: processing network of the service. Labels: 1. Static input (RDF Store), Configuration; 2. Producer (RDF Stream) and 5. Consumer, Mobile Client; 3. Channel; 4. Processing Agent; Network]

Approach 1: Single Query

    CONSTRUCT { ?person1 :nearby ?person2 }
    WHERE {
      # Part 1: Bind event data for pairs of persons who know each other
      GRAPH <http://externalgraphstore.org/socialnetwork> {
        ?person1 foaf:knows ?person2 }
      <bind events for p1+p2>
      # Part 2: Remove events, if a newer event can be found
      # (finds the latest location for each person)
      FILTER NOT EXISTS {
        ?event3 rdf:type event:Event ;
                event:agent ?person1 ;
                event:time [tl:at ?dttm3] .
        ?event4 rdf:type event:Event ;
                event:agent ?person2 ;
                event:time [tl:at ?dttm4] .
        FILTER ((?dttm1 < ?dttm3) || (?dttm2 < ?dttm4))
      }
      # Part 3: Check if the latest registrations were close in space and time
      FILTER ( (abs(?lat2-?lat1)<0.01) && (abs(?long2-?long1)<0.01) &&
               (abs(hours(?dttm2)*60+minutes(?dttm2)-hours(?dttm1)*60-minutes(?dttm1))<10) )
    }

- Finds friends whose latest registrations are close in space and time
- Doesn't do anything for buffer management or re-execution of the query
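The Part 3 proximity filter can be read as the following Python predicate. This is a sketch for illustration only; the function and parameter names are assumptions, and the thresholds (0.01 degrees, 10 minutes) are taken from the query. Like the SPARQL expression, it compares minutes of the day only, so it ignores the date and does not wrap around midnight.

```python
from datetime import datetime


def minutes_of_day(t):
    """Equivalent of hours(?dttm)*60 + minutes(?dttm) in the query."""
    return t.hour * 60 + t.minute


def is_nearby(lat1, long1, t1, lat2, long2, t2, max_deg=0.01, max_minutes=10):
    """True if two registrations are close in space and in time of day."""
    return (abs(lat2 - lat1) < max_deg
            and abs(long2 - long1) < max_deg
            and abs(minutes_of_day(t2) - minutes_of_day(t1)) < max_minutes)


# Two registrations three minutes and a few thousandths of a degree apart.
close = is_nearby(60.1587, 24.8814, datetime(2011, 10, 3, 8, 17),
                  60.1600, 24.8850, datetime(2011, 10, 3, 8, 20))
# Same place, but 23:58 versus 00:02 the next day: rejected by this filter.
across_midnight = is_nearby(60.1587, 24.8814, datetime(2011, 10, 3, 23, 58),
                            60.1587, 24.8814, datetime(2011, 10, 4, 0, 2))
print(close, across_midnight)
```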

Approach 2: Window-Based Streaming SPARQL

    REGISTER QUERY CloseFriends COMPUTED EVERY 2m AS
    SELECT ?person1 ?person2
    FROM STREAM <http://myexample.org/personlocationupdates> [RANGE 10m STEP 2m]
    FROM <http://streams.org/socialnetwork.rdf>
    WHERE {
      # Part 1: Bind event data for all friends
      ?person1 foaf:knows ?person2 .
      <bind events for p1+p2>
      FILTER ( ( ((?lat2-?lat1)*(?lat2-?lat1)) < 0.01*0.01) )
      FILTER ( ( ((?long2-?long1)*(?long2-?long1)) < 0.01*0.01) )
    }
    ORDER BY ?dttm1 ?dttm2

- [RANGE 10m STEP 2m] sets the window range and repetition rate
- C-SPARQL environment handles windowing, removal of old events and repetition of the query
- Duplicate removal has to be handled by external means
- Notification delay and duplicates lead to compromises
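The trade-off between notification delay and duplicates can be sketched with simple arithmetic in Python. This is not C-SPARQL code. The helper names are hypothetical, and the sketch assumes that evaluations happen at multiples of STEP and that the window at evaluation time t covers [t - RANGE, t].

```python
import math


def next_evaluation(arrival_s, step_s):
    """Time of the first periodic evaluation at or after the arrival."""
    return math.ceil(arrival_s / step_s) * step_s


def notification_delay(arrival_s, step_s, processing_s=0.0):
    """Waiting time until the next evaluation, plus query processing time."""
    return next_evaluation(arrival_s, step_s) - arrival_s + processing_s


def windows_containing(arrival_s, range_s, step_s, horizon_s):
    """Evaluation times up to the horizon whose window contains the event."""
    return [t for t in range(step_s, horizon_s + 1, step_s)
            if t - range_s <= arrival_s <= t]


RANGE_S, STEP_S = 600, 120  # RANGE 10m STEP 2m from the query
delay = notification_delay(130, STEP_S)
repeats = windows_containing(130, RANGE_S, STEP_S, horizon_s=1200)
print(delay, repeats)
```

Under these assumptions an event arriving 130 s into the stream waits 110 s for the next evaluation and then falls inside five consecutive windows. A shorter STEP reduces the delay but increases the number of windows that report the same match.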

Approach 3: Collaborative SPARQL Update Rules
- Query 1: Maintain only the latest registration in the workspace
- Query 2: Insert a nearby detection marker
- Query 3: Emit notifications
- Query 4: Delete nearby status
- No duplicate detections
- Buffer management handled by SPARQL

Rete Algorithm in INSTANS
- Translation of SPARQL queries into an incremental processor
- Each input triple propagates according to the queries and resulting states are saved within the structure
- When a complete query is matched, results are immediately available

This sample query selects events between 10 and 11 o'clock:

    SELECT ?event WHERE {
      ?event a event:Event ;
             event:time ?time .
      ?time tl:at ?daytime .
      FILTER ( hours(?daytime) = 10 )
    }

Process flow:
1. Each condition corresponds to an α-node. α1 matches with sample input :e1 a event:Event.
2. :e1 propagates to β2 and is stored there.
3. α2 matches with :e1 event:time _:b1, where _:b1 is a blank node. Input from β2 matches with ?event in join node Y2.
4. :e1 and _:b1 propagate until β3.
5. α3 matches with input _:b1 tl:at "2011-10-03T10:05:00"^^xsd:dateTime.
6. In join node Y3 _:b1 is equal in both incoming branches and can be eliminated.
7. :e1 and "2011-10-03T10:05:00"^^xsd:dateTime reach filter1. The condition hour = 10 is true.
8. :e1 is selected as a result.

[Figure: Rete network for the query. α1 (?event a event:Event), α2 (?event event:time ?time) and α3 (?time tl:at ?daytime) feed join nodes Y1, Y2 and Y3 through memories β1, β2 and β3; Y3 drops _:b1 and passes (?event, ?daytime) to filter1 and select1. Sample token: :e1, 10:05.]
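The incremental idea behind the process flow can be sketched in Python as a hand-built matcher for the sample query. This is a simplification in the spirit of Rete, not the INSTANS implementation: it keeps one memory per condition and joins each new triple against the stored state, whereas a full Rete network also stores partial joins in β memories. The class name is hypothetical, and each triple is assumed to be fed only once.

```python
from datetime import datetime


class HourQueryNetwork:
    """Incremental matcher for:
    ?event a event:Event ; event:time ?time . ?time tl:at ?daytime .
    FILTER ( hours(?daytime) = hour )
    """

    def __init__(self, hour):
        self.hour = hour
        self.events = set()      # memory for condition 1: ?event
        self.event_time = set()  # memory for condition 2: (?event, ?time)
        self.time_at = set()     # memory for condition 3: (?time, ?daytime)
        self.selected = []       # output of the select node, in arrival order

    def add(self, s, p, o):
        """Feed one triple; return the events newly selected because of it."""
        if p == "rdf:type" and o == "event:Event":
            self.events.add(s)
            joined = [(e, d) for (e, t) in self.event_time if e == s
                      for (t2, d) in self.time_at if t2 == t]
        elif p == "event:time":
            self.event_time.add((s, o))
            joined = [(s, d) for (t2, d) in self.time_at
                      if t2 == o and s in self.events]
        elif p == "tl:at":
            self.time_at.add((s, o))
            joined = [(e, o) for (e, t) in self.event_time
                      if t == s and e in self.events]
        else:
            return []  # no condition matches; the triple is disregarded
        # The time node has served its purpose in the join and is dropped;
        # only (?event, ?daytime) reaches the filter.
        new = [e for (e, d) in joined if d.hour == self.hour]
        self.selected.extend(new)
        return new


net = HourQueryNetwork(hour=10)
print(net.add(":e1", "rdf:type", "event:Event"))
print(net.add(":e1", "event:time", "_:b1"))
print(net.add("_:b1", "tl:at", datetime(2011, 10, 3, 10, 5, 0)))
```

The result becomes available on the call that completes the match, regardless of the order in which the three triples arrive.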

Comparison of Approaches

                                Single Query            C-SPARQL                 INSTANS
Correctness of notifications    Yes                     Yes if windows overlap   Yes
Duplication elimination         Only within one query   Only inside window       Yes
Timeliness of notifications     Query triggered         Periodically triggered   Event triggered
Scalability wrt #events         No                      Yes                      Yes

Notification Delay Results
- 5 simulated friends moving on a map
- C-SPARQL query processing delay varied 12-253 ms for 5-60 events, respectively
- Window repetition rate is the dominant component of the notification delay
- With 1 event per second inter-arrival time, C-SPARQL notification delay measured at 1.34 to 25.90 seconds
- INSTANS: 12 ms independent of window length
[Figure: Notification delay [s] (axis 0.0 to 30.0) against C-SPARQL window length (5 s, 10 s, 20 s, 30 s, 40 s, 50 s, 60 s) for C-SPARQL and INSTANS.]

Summary
- Event processing based on RDF-encoded heterogeneous events
  - Event format can evolve independently of event processing application
  - Built-in support for disjoint vocabularies
- SPARQL Query + Update
  - Application can be built entirely out of collaborating SPARQL queries
  - Access to linked open data, future possibilities for inference
  - No proprietary extensions needed so far
  - Promise of good interoperability in multi-vendor multi-actor environments
- Continuous incremental matching using the Rete algorithm
  - No repeating windows (processing repetition, duplicate matches, missed detections on window borders)
- Application areas in smart spaces, context-aware mobile systems, internet-of-things, the real-time web etc.
*) ACM Special Interest Group for Applied Computing

Conclusions
- Collaborative SPARQL queries are a promising method for event processing using semantic web technologies
- A platform capable of continuous event-driven evaluation of parallel SPARQL queries supporting SPARQL 1.1 Update (INSERT) is needed
- INSTANS outperforms the comparison approaches
  - Single SPARQL query lacks buffer management and repetition
  - Window-based streaming SPARQL suffers from conflicting requirements in setting operation parameters
  - INSTANS gives correct notifications without duplicates in a fraction of the notification time of Streaming SPARQL

Background Material

Queries in Approach 3

Query 1) Window query:

    DELETE { <bind event to variables> }
    WHERE {
      <bind event to variables>
      FILTER EXISTS {
        ?event2 event:agent ?person ;
                event:time [tl:at ?dttm2] .
        FILTER (?dttm < ?dttm2)
      }
    }

Query 2) Nearby detection:

    INSERT { ?person1 :nearby ?person2 }
    WHERE {
      ?person1 foaf:knows ?person2 .
      <bind events for p1+p2>
      # Check proximity in space and time
      FILTER ( (abs(?lat2-?lat1)<0.01) && (abs(?long2-?long1)<0.01) &&
               (abs(hours(?dttm2)*60+minutes(?dttm2)
                    -hours(?dttm1)*60-minutes(?dttm1))<10) )
      # Don't insert, if the relation already exists
      FILTER NOT EXISTS { ?person1 :nearby ?person2 }
    }

Query 3) Notification:

    SELECT ?person1 ?person2
    WHERE { ?person1 :nearby ?person2 }

Query 4) Removal of "nearby" status:

    DELETE { ?person1 :nearby ?person2 }
    WHERE {
      ?person1 foaf:knows ?person2 .
      <bind events for p1+p2>
      FILTER ( (abs(?lat2-?lat1)>0.02) || (abs(?long2-?long1)>0.02) )
      FILTER EXISTS { ?person1 :nearby ?person2 }
    }
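How the four queries collaborate through shared state can be sketched as a small Python simulation. It is an illustration under stated assumptions, not INSTANS: the workspace is a plain dictionary, timestamps are reduced to minutes of the day, and the names `alice` and `bob` and the function `process` are hypothetical.

```python
def process(event, state):
    """Apply the four rules to one location event; return new notifications."""
    person, lat, lon, minute = event
    latest, nearby, knows = state["latest"], state["nearby"], state["knows"]

    # Query 1: keep only the newest registration per person.
    old = latest.get(person)
    if old is None or old[2] < minute:
        latest[person] = (lat, lon, minute)

    notifications = []
    for p1, p2 in sorted(knows):
        if p1 not in latest or p2 not in latest:
            continue
        (lat1, lon1, m1), (lat2, lon2, m2) = latest[p1], latest[p2]
        dlat, dlon = abs(lat2 - lat1), abs(lon2 - lon1)
        if (dlat < 0.01 and dlon < 0.01 and abs(m2 - m1) < 10
                and (p1, p2) not in nearby):
            nearby.add((p1, p2))            # Query 2: insert :nearby marker
            notifications.append((p1, p2))  # Query 3: emit notification
        elif (dlat > 0.02 or dlon > 0.02) and (p1, p2) in nearby:
            nearby.discard((p1, p2))        # Query 4: delete :nearby marker
    return notifications


state = {"latest": {}, "nearby": set(), "knows": {("alice", "bob")}}
log = [
    process(("alice", 60.1600, 24.8800, 500), state),
    process(("bob", 60.1650, 24.8850, 503), state),  # close: notify once
    process(("bob", 60.1655, 24.8850, 504), state),  # still close: no duplicate
    process(("bob", 60.2000, 24.8850, 510), state),  # far: marker removed
]
print(log)
```

The gap between the 0.01 insertion threshold and the 0.02 removal threshold in the source queries acts as hysteresis, so small movements around the boundary do not produce repeated notifications.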