Optimising a Semantic IoT Data Hub

Similar documents
Enrichment of Sensor Descriptions and Measurements Using Semantic Technologies. Student: Alexandra Moraru Mentor: Prof. Dr.

Web of Things Architecture and Use Cases. Soumya Kanti Datta, Christian Bonnet Mobile Communications Department

SEPA SPARQL Event Processing Architecture

Querying Data with Transact SQL

The Emerging Data Lake IT Strategy

W3C WoT call CONTEXT INFORMATION MANAGEMENT - NGSI-LD API AS BRIDGE TO SEMANTIC WEB Contact: Lindsay Frost at

Using Linked Data Concepts to Blend and Analyze Geospatial and Statistical Data Creating a Semantic Data Platform

From Raw Sensor Data to Semantic Web Triples Information Flow in Semantic Sensor Networks

GeoSPARQL Support and Other Cool Features in Oracle 12c Spatial and Graph Linked Data Seminar Culture, Base Registries & Visualisations

Semantic Web Information Management

Publishing Statistical Data and Geospatial Data as Linked Data Creating a Semantic Data Platform

Orchestrating Music Queries via the Semantic Web

Open And Linked Data Oracle proposition Subtitle

White Paper. EVERY THING CONNECTED How Web Object Technology Is Putting Every Physical Thing On The Web

COMPUTER AND INFORMATION SCIENCE JENA DB. Group Abhishek Kumar Harshvardhan Singh Abhisek Mohanty Suhas Tumkur Chandrashekhara

Querying the Semantic Web

Dynamic Semantics for the Internet of Things. Payam Barnaghi Institute for Communication Systems (ICS) University of Surrey Guildford, United Kingdom

FINANCIAL REGULATORY REPORTING ACROSS AN EVOLVING SCHEMA

Developments in data communications

onem2m ADAPTED FOR SMART CITIES ETSI IOT M2M WORKSHOP 2016 NOV 15 TH 2016

Semantic Web Fundamentals

BSC Smart Cities Initiative

DBpedia Data Processing and Integration Tasks in UnifiedViews

Ontology-Based Data Access via Ontop

Event Stores (I) [Source: DB-Engines.com, accessed on August 28, 2016]

[AVNICF-MCSASQL2012]: NICF - Microsoft Certified Solutions Associate (MCSA): SQL Server 2012

Proposal for Implementing Linked Open Data on Libraries Catalogue

Semantic Web Fundamentals

Webinar Annotate data in the EUDAT CDI

Towards Open Innovation with Open Data Service Platform

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper.

4) DAVE CLARKE. OASIS: Constructing knowledgebases around high resolution images using ontologies and Linked Data

Contents. G52IWS: The Semantic Web. The Semantic Web. Semantic web elements. Semantic Web technologies. Semantic Web Services

Km4City Smart City API: an integrated support for mobility services

The onem2m standard Horizontal Service Layer

Semantic Technologies for the Internet of Things: Challenges and Opportunities

> Semantic Web Use Cases and Case Studies

Querying Data with Transact SQL Microsoft Official Curriculum (MOC 20761)

Enterprise Data Catalog for Microsoft Azure Tutorial

Multi-agent and Semantic Web Systems: Linked Open Data

ITARC Stockholm Olle Olsson World Wide Web Consortium (W3C) Swedish Institute of Computer Science (SICS)

ITARC Stockholm Olle Olsson World Wide Web Consortium (W3C) Swedish Institute of Computer Science (SICS)

Querying Microsoft SQL Server 2014

COMP9321 Web Application Engineering

A SMART PORT CITY IN THE INTERNET OF EVERYTHING (IOE) ERA VERNON THAVER, CTO, CISCO SYSTEMS SOUTH AFRICA

OLAP over Federated RDF Sources

Third generation of Data Virtualization

COMP9321 Web Application Engineering

M3 Framework: User s guide & tutorial

Cisco Smart+Connected Communities

Querying Data with Transact-SQL

Unlocking the full potential of location-based services: Linked Data driven Web APIs

Enterprise Information Integration using Semantic Web Technologies:

Linked Data: What Now? Maine Library Association 2017

The onprom Toolchain for Extracting Business Process Logs using Ontology-based Data Access

CONTEXT INFORMATION MANAGEMENT

Graph Databases. Guilherme Fetter Damasio. University of Ontario Institute of Technology and IBM Centre for Advanced Studies IBM Corporation

Course Outline. Querying Data with Transact-SQL Course 20761B: 5 days Instructor Led

Assisting IoT Projects and Developers in Designing Interoperable Semantic Web of Things Applications

Semantic Queries and Mediation in a RESTful Architecture

Querying Data with Transact-SQL

Part XII. Mapping XML to Databases. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321

Querying Data with Transact-SQL

BIOLOGICAL PATHWAYS AND THE SEMANTIC WEB

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.

20761B: QUERYING DATA WITH TRANSACT-SQL

IP Based Architecture for the Internet of Things. IPV6 and Related Standards for IoT Interoperability November 20, 2014

SPARQL เอกสารหล ก ใน มคอ.3

Querying Semantic Web Data

Defragmenting the IoT with the Web of Things

The role of vocabularies for estimating carbon footprint for food recipies using Linked Open Data

Knowledge-Driven Video Information Retrieval with LOD

20761C: Querying Data with Transact-SQL

COMP6217 Social Networking Technologies Web evolution and the Social Semantic Web. Dr Thanassis Tiropanis

SPARQL-Based Applications for RDF-Encoded Sensor Data

Mapping between Digital Identity Ontologies through SISM

Linking Distributed Data across the Web

Bridging the Gap between Semantic Web and Networked Sensors: A Position Paper

onem2m AND SMART M2M INTRODUCTION, RELEASE 2/3

Resource Discovery in IoT: Current Trends, Gap Analysis and Future Standardization Aspects

Querying Data with Transact-SQL

Linked Stream Data Processing Part I: Basic Concepts & Modeling

Unlocking Intelligent Communities with connected lighting and beyond

STS Infrastructural considerations. Christian Chiarcos

SEXTANT 1. Purpose of the Application

Course Modules for MCSA: SQL Server 2016 Database Development Training & Certification Course:

Reducing Consumer Uncertainty

Querying Data with Transact-SQL

Open Standards for Linked Organisations. Linked Base Registries as a key enabler for egovernment in Flanders #OSLO2

Information Workbench

AVANTUS TRAINING PTE LTD

SPARQL ME-E4300 Semantic Web,

Pedigree Management and Assessment Framework (PMAF) Demonstration

SEMANTIC WEB DATA MANAGEMENT. from Web 1.0 to Web 3.0

Smart Cities & The 4th Industrial Revolution

Database of historical places, persons, and lemmas

Designing dashboards for performance. Reference deck

Extracting knowledge from Ontology using Jena for Semantic Web

SWoTSuite: A Toolkit for Prototyping End-to-End Semantic Web of Things Applications

Porting Social Media Contributions with SIOC

Transcription:

Optimising a Semantic IoT Data Hub Ilias Tachmazidis, Sotiris Batsakis, John Davies, Alistair Duke, Grigoris Antoniou and Sandra Stincic Clarke John Davies, BT

Overview Motivation de-siloization and data hubs Semantic lifting of IoT Data hub Optimization of SPARQL SQL mapping Concluding Remarks 2

IoT Ecosystem SENSORS CONNECTIVITY DATA HUB APPLICATIONS Light Sensor Bin Usage TVWS Smart Street Lighting Waste Management Parking Sensor Analytics Smart Parking Dev Environment Soil Moisture Vehicle Telemetry MESH IT Services Information Spine Driver Assist RFID Trace UNB Tracing Assets: BT Trace Enabling the IoT ecosystem

Data interoperability & IoT hubs Data hubs provide economies of scale and uniform access to data lower the barrier to participation, avoid silos and foster innovation Key challenges Resource discovery and access (what data does this hub have? how do I get it? what other hubs are there?) Hypercat is a specification which attempts to address these issues

HyperCat is: A machine-readable catalogue format for IoT data A JSON file format for cataloguing IoT resources - a HyperCat file A web API for fetching, serving, searching and updating HyperCat catalogues - The HyperCat Web API A thin horizontal layer using technology wellunderstood by web developers Publically available specification from BSI, some open source tooling

Catalogues and Resources Servers (data hubs) provide catalogues of resources to clients A catalogue is an array of URIs

Resource Metadata Each resource (URI) in the catalogue is annotated with metadata (triples!)

Hypercat fragment air quality data feed in Milton Keynes "items":[ { "href":"http://api.stride-project.com/sensors/feeds/3bdae7b8-c4c6-4701-81a7-e9ffcb47c6ac", "i-object-metadata":[ { "rel":"urn:x-hypercat:rels:hasdescription:en", "val":"air quality data from MK" }, { "rel":"urn:x-hypercat:rels:iscontenttype", "val":"application/xml" }, Associating metadata with the feed id, a URI pointing to the feed data itself

Semantic lifting: why? Case Study: CityVerve City Concierge Helping visitors to the city Navigate & Access transport services Find out about cultural events Discover local businesses Community feedback and asset management Check the weather, air pollution, real-time traffic / travel

Semantic lifting: why? Case Study: CityVerve City Concierge App delivery via smartphone, desktop and street furniture Requirement: combination of data from multiple sources (Federated) SPARQL queries allow integration of multiple data sources e.g. external endpoint about events can be queried

Semantic Enhancements to Data Hub : Catalogue Semantic Lifting Existing DataHub provides: Hypercat Catalogue using specified JSON format Limited semantic capability Semantic enhancement allows: Items in the catalogue to be related to existing (domain) ontologies supporting more powerful search and discovery Reasoning over catalogue items E.g. this feed about air temperature is also a feed about weather

HyperCat Catalogue Ontology

Semantic Enhancements to Data Hub: Data Feeds Semantic Lifting Existing DataHub provides: Egress of data from SQL DB via RESTful API in XML or JSON format Semantic enhancement allows: Data in the Data Hub to be related to existing (domain) ontologies supporting Semantic enrichment (attaching ontological metadata, i.e. relate to other (external) classes or properties) Rule-based reasoning (e.g. SWRL; define expected ranges, if exceeded, then trigger action) Automatic translation of data to required form (e.g. units) Querying across discrete data feeds to identify relevant data Federated queries to additional external data (LoD Cloud)

Data hub tables (fragment) based on EEML (eeml.org) Sensor Feed table Id Title URL Status Private Website Email Sensor Stream table Id Feed Tag Min Max Unit Id Stream Feed Time Value Lat Lon Datapoint table

BT Data Hub Ontology (3 principal types: sensors, events, location data)

HyperCat JSON to HyperCat RDF (N-Triples) (generate RDF version of catalogue) http://portal.bt-hypercat.com/cat rel : urn:x-hypercat:rels:iscontenttype val : application/vnd.hypercat.catalogue+json <http://portal.bt-hypercat.com/cat-rdf> <http://portal.bt-hypercat.com/hypercat#iscontenttype> application/n-triples".

Data Translation RDF triple is provided within a single line, in N-Triples format, namely: <subject> <predicate> <object>. The URI of a SensorFeed is generated by BT data hub as: http://api.bt-hypercat.com/sensors/feeds/feedid An RDF triple providing the type of a SensorFeed: <http://api.bt-hypercat.com/sensors/feeds/feedid7> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://portal.bt-hypercat.com/ontologies/bt-hypercat#sensorfeed>

HyperCat properties and their ontological equivalents (rels -> properties) MetadataAnnotator Properties

SPARQL to SQL translation Uses Ontop Library http://ontop.inf.unibz.it/ Protégé plugin used to map SPARQL patterns to SQL queries Includes reasoner Parses mappings and ontology Translates queries to SQL Apache Jena used to offer federated queries

Semantic Querying - SPARQL to SQL Dynamic translation of SPARQL queries into SQL, using Ontop OBDA tool - http://ontop.inf.unibz.it/ Implicit information is extracted from the ontology through reasoning Where richer domain ontologies are linked, semantically richer information is extracted compared to the knowledge that is stored in the relational database

Data Translation - SPARQL to SQL Mapping ID: a unique id for a given mapping, Target (Triple Template): RDF triple pattern to be generated in the answer (SQL variables are given in braces, such as {feed.id}) Source (SQL Query): SQL query to be created and submitted to the relational database Mapping ID Target (Triple Template) Source (SQL Query) mapping:sensorfeed bt-sensors:feeds/{feed.id} a bt-hypercat:sensorfeed SELECT feed.id FROM feed URI prefixes: bt-sensors: http://api.bt-hypercat.com/sensors/ bt-hypercat: http://portal.bt-hypercat.com/ontologies/bt-hypercat#

Mapping Example Mapping ID Target (triple template) Source (SQL Query) Mapping:SensorFeed bt-sensors:feeds/\{feed.id\} a bt-hypercat:sensorfeed. SELECT feed.id FROM feed PREFIX hypercat: <http://portal.bt-hypercat.com/ontologies/bt-hypercat#> SELECT DISTINCT?s WHERE{?s a hypercat:feed. } Ontop will match the triple pattern ``?s a hypercat:feed'' with the mapping ``mapping:sensorfeed'' since class SensorFeed is subclass of Feed (and would also map to equivalent mapping:eventfeed) An SQL query will be submitted to the relational database, while the retrieved Ids (feed.id?s) will be used in order to generate the response as RDF triples following the triple template SQL not aware of subclasses of feed, so sensorfeed is mapped to feed

Ontop Protégé plug-in Show me all feeds updated in last 10 minutes

Complex Query Example Query description: Get distinct resources of type Datapoint, along with data properties datapoint_at_time, and datapoint_value that are related to them, where datapoint_at_time is dated between 2018-03-23T09:00:00Z and 2018-03-23T10:00:00Z Query text: PREFIX hypercat: <http://portal.bt-hypercat.com/ontologies/bt-hypercat#> SELECT DISTINCT?s?o1?o2 WHERE{?s a hypercat:datapoint.?s hypercat:datapoint_at_time?o1.?s hypercat:datapoint_value?o2. FILTER (?o1>"2018-03-23t09:00:00z"&&?o1 <"2018-03-23T10:00:00Z") } Mapping ID Target (Triple Template) mapping:datapoint_value btsensors:feeds/{datapoint.feed}/datastreams/{datapoint.stream}/ datapoints/{at_time} bt-hypercat:datapoint_value Source (SQL Query) {datapoint.val}. SELECT datapoint.feed, datapoint.stream, TO_TIMESTAMP(datapoint.at_time) AS at_time, datapoint.val FROM datapoint

Architectural Options Replication of data in triple store + Queries require no translation so should be faster - IoT sensor data quickly becomes out of date - Additional storage and maintenance burden On demand query / result translation + No additional storage required + Freshest data always available - Translation overhead means slower responses to queries

Optimising SPARQL to SQL endpoint

Ontology Optimizations Class hierarchy restricted to classes explicitly used by Ontop for mappings faster reasoning subclasses of Feed such as PublicTransportFeed also of Stream not used in mappings and were removed Similarly, property hierarchy restricted to properties explicitly used by Ontop for mappings Domain and range assertions deleted from properties where classes are already mapped, ensuring: duplicate reduction eg Datafeed class and a hasdatastream property, do not define domain to be datafeed since duplicates will be generated Ontop attempts to extract as much information as possible during the translation from SPARQL to SQL. Any possible mapping that could derive Datafeed, will be translated into yet another SQL query

SQL Plans Optimizations In Ontop, mappings with SQL functions such as: TO_TIMESTAMP() for time unnest() for arrays ST_AsText() for PostGIS geometry are translated into separate subqueries Such subqueries may be inefficient, since: they are not indexed (executed over temporary SQL tables) they can lead to unnecessary self-joins

SQL Plans Optimizations Solution: DB columns representing time or PostGIS geometry are translated into a simpler form (such as WKT string representation for geom objects) DB columns representing arrays need to be stored in a separate SQL table Translate mappings that require the use of SQL functions into mappings over SQL tables with simple data forms

SQL Table Transformation Initial SQL table: TABLE feed(id uuid NOT NULL, updated bigint, tag character varying[], the_geom geometry) Translated to 2 new tables TABLE sparql_feed(id uuid NOT NULL, updated character varying, the_geom character varying); TABLE sparql_feed_tag(id uuid NOT NULL, tag character varying NOT NULL); (one of these per array entry) More efficient because array processing etc is eliminated Processing cost moves from query time to assertion time

SQL Data Transformation (1/2) to generate new table INSERT INTO sparql_feed (id, updated, the_geom) SELECT id, TO_TIMESTAMP(feed.updated) AS updated, ST_AsText(feed.the_geom) AS the_geom FROM feed

SQL Data Transformation (2/2) to generate new table INSERT INTO sparql_feed_tag (id, tag) SELECT feed_tag.id, feed_tag.tag FROM (SELECT feed.id, unnest(feed.tag) AS tag FROM feed) AS feed_tag

Mappings Optimization (1/2) Target (Triple pattern): bt-sensors:feeds/{sparql_feed.id} bt-hypercat:feed updated {sparql_feed.updated}. Sparql_feed.id is the key of SQL table sparql_feed Query is looking for feeds, therefore feed.id column should be the table key

Mappings Optimization (2/2) If the key contains more columns than those used in the RDF triple pattern to be generated, then: self-joins cannot be eliminated each mapping is translated as a separate subquery Self-joins can be: Manageable for relatively small tables (containing thousands of rows, if the table is indexed) Prohibitive for large tables (containing millions of rows) If you have a key across 2 columns (eg) but only 1 column used to generate URI (i.e. feed.id), self-joins required

SPARQL Query Optimizations (1/2) expensive operations Where possible, specify the predicate within each triple pattern, since triple patterns such as "?s?p?o": will be translated using all available mappings (a UNION of all defined mappings) will lead to an excessive SQL query plan Where feasible, avoid DISTINCT as: performance deteriorates final results are sorted and filtered for unique values at the end of the SQL query plan

SPARQL Query Optimizations (2/2) Use FILTER to retrieve specific Feeds: Restricting the search space early leads to more efficient SQL query plans. Avoid OPTIONAL as: each OPTIONAL is translated into a LEFT OUTER JOIN If used, OPTIONAL should be put at the end of the query Use LIMIT as limiting the amount of required results could speed up the query execution

Semantic Technology and IoT W3C, IEEE, onem2m and AIOTI Joint White Paper Semantic Interoperability for the Internet of Things The full value potential can only be unlocked if ad-hoc, crossdomain systems of IoT are able to establish conversations and build understanding Using semantic technology to provide machine-processable, shared vocabularies with automated reasoning Hypercat lightweight, domain-independent Hypercat-RDF richer descriptions, link to domain-specific vocabularies, federated querying, reasoning

Concluding Remarks Data hubs can help remove silos and maximise value of data There may be additional value in using semantic tech SPARQL to SQL can be *very* slow Significant speed-up is possible Reliant on the ability/willingness to manually inspect and edit the ontology and underlying database Next steps quantify the efficiency improvements identify further use cases where semantic tech adds value (understand relationship to SSN work!!)

Thanks for your attention john.nj.davies@bt.com