Flexible querying for SPARQL

A. Calì, R. Frosini, A. Poulovassilis, P. T. Wood
Department of Computer Science and Information Systems, Birkbeck, University of London
London Knowledge Lab

Overview of the presentation
1 Introduction
  - Motivation
  - Examples - Querying YAGO
2 Flexible Querying for SPARQL - SPARQL AR
  - Query Approximation
  - Query Relaxation
  - Query Rewriting Algorithm
  - Complexity of Query Answering
  - Performance of Query Evaluation
3 Conclusions and Future Work

Motivation
Why SPARQL?
- It is the most prominent language for querying RDF data, which is a standard for Linked Open Data (LOD).
- SPARQL 1.1 enables regular path queries, which are useful for querying graph-structured data.
Querying with SPARQL - what does the user need to know?
- The structure of the RDF graph.
- The correct URIs to use.
- The labels appearing in the RDF graph.

Motivation
Querying with SPARQL - possible problems
- Lack of full knowledge of the structure of the dataset.
- Complexity of the structure of the dataset.
- Lack of knowledge of the URIs used in the dataset.
- Changes in the structure and URIs of the dataset over time.

Querying YAGO
Example query: a SPARQL query over the YAGO RDF dataset, aiming to find events taking place in London on 2/09/1666:
    SELECT * WHERE { ?x <in> "London" . ?x <on> "2/09/1666" }
This query does not return any answers because there are no property edges named in or on in YAGO.

Querying YAGO
We need to change the query in order to get some answers, by replacing the two property labels in and on, and inserting another property label, rdfs:label, in the first triple.
Modified query:
    SELECT * WHERE { ?x <happenedin>/rdfs:label "London" . ?x <happenedondate> "2/09/1666" }
This query returns an event: The Great Fire of London.

Querying YAGO
A further possible modification is to relax the second triple, using the knowledge that the domain of happenedondate is Event.
Modified query:
    SELECT * WHERE { ?x <happenedin>/rdfs:label "London" . ?x rdf:type <Event> }
This query returns every event that occurred in London.

Our language: SPARQL AR
SPARQL AR extends SPARQL 1.1 with two operators, APPROX and RELAX.
Example query:
    SELECT * WHERE { APPROX(?x <in> "London") . RELAX(?x <happenedondate> "2/09/1666") }
This query returns all the answers returned by the previous query, plus additional answers. Answers are returned in ranked order, according to their distance from the exact form of the query, i.e. the query without the APPROX and RELAX operators.

SPARQL AR
How do we rank query answers? Answers returned by a SPARQL AR query Q have an associated cost:
- every approximation operation, i.e. every insertion, deletion or substitution of a property label in Q, is assigned an edit cost;
- every relaxation operation w.r.t. RDFS entailment rules applied to Q is assigned a relaxation cost;
- these costs are summed to give an overall cost for a modified query Q', and hence an overall cost for the answers returned by evaluating Q'.
See our paper in ODBASE 2014 for details.
Research challenges:
- Semantics of SPARQL AR.
- Complexity of query answering.
- Generating query answers and their cost.
- Proof of soundness and completeness.
- Performance of query evaluation.

SPARQL AR - Approximation
Modifies the query using edit operations on property labels: insertion, deletion, substitution.
Example: suppose we want to know every airport directly connected to Rome, using the YAGO dataset:
    SELECT * WHERE { <Rome> isconnectedto ?x }
This query returns no answers.

SPARQL AR - Approximation
If we apply approximation to the query, it is possible to retrieve the answers.
Example:
    SELECT * WHERE { APPROX(<Rome> isconnectedto ?x) }
In YAGO, <Rome> is linked to its airports through the predicate linksto. One step of approximation (inserting linksto into the query, at an edit cost of c_i) automatically generates this query, which will return the required answers at a distance of c_i:
    SELECT * WHERE { <Rome> linksto/isconnectedto ?x }
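
The single approximation step described above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' implementation: a property path is modelled as a tuple of labels, the function name approx_step and the cost parameters c_ins, c_del and c_sub are hypothetical, and the candidate labels are passed in explicitly rather than read from the dataset.

```python
from typing import List, Tuple

Path = Tuple[str, ...]  # a property path such as ("linksto", "isconnectedto")


def approx_step(path: Path, labels: List[str],
                c_ins: int = 1, c_del: int = 1,
                c_sub: int = 1) -> List[Tuple[Path, int]]:
    """Return every path reachable by ONE edit operation, with its edit cost.

    Hypothetical helper: 'labels' stands in for the property labels
    that occur in the dataset.
    """
    results = []
    # Insertion: put a label at any position of the path.
    for i in range(len(path) + 1):
        for label in labels:
            results.append((path[:i] + (label,) + path[i:], c_ins))
    for i, old in enumerate(path):
        # Deletion: drop the label at position i.
        results.append((path[:i] + path[i + 1:], c_del))
        # Substitution: replace the label at position i by a different one.
        for label in labels:
            if label != old:
                results.append((path[:i] + (label,) + path[i + 1:], c_sub))
    return results


def to_sparql(path: Path) -> str:
    """Render a path as a SPARQL 1.1 sequence path."""
    return "/".join(path)


candidates = approx_step(("isconnectedto",), ["linksto"])
for candidate, cost in candidates:
    print(repr(to_sparql(candidate)), cost)
```

Among the candidates is linksto/isconnectedto, the path used in the rewritten Rome query. The deletion case yields an empty path here; how an empty path is turned into a query is left out of this sketch.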

SPARQL AR - Relaxation
Exploits the RDFS entailment rules.
Example: given this ontology, K (dom = domain, sp = subproperty):
    happenedondate dom Event
    endedondate sp happenedondate
    startedondate sp happenedondate
And this query:
    SELECT * WHERE { <World_War_I> startedondate ?x }
The query will return the date on which the First World War started.

SPARQL AR - Relaxation
Example continued. Consider now this query:
    SELECT * WHERE { RELAX(<World_War_I> startedondate ?x) }
Applying the subproperty relaxation rule, at a relaxation cost of c_sp, automatically generates this query, which will return the previous answer (at distance 0) and also the date on which the First World War ended (at distance c_sp):
    SELECT * WHERE { <World_War_I> happenedondate ?x }
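
A minimal sketch of one relaxation step over the ontology K from the example, written in Python. It is an illustration only: the dictionaries, the function name relax_step and the cost name c_dom are assumptions introduced here (the slides name only c_sp), and only the subproperty and domain rules used in the examples are covered.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

# Ontology K from the example, as plain dictionaries (illustrative encoding).
SUBPROPERTY = {
    "startedondate": "happenedondate",
    "endedondate": "happenedondate",
}
DOMAIN = {"happenedondate": "Event"}


def relax_step(triple: Triple, c_sp: int = 1,
               c_dom: int = 1) -> List[Tuple[Triple, int]]:
    """Return the triple patterns obtained by ONE relaxation, with their cost."""
    s, p, o = triple
    relaxed = []
    if p in SUBPROPERTY:
        # Subproperty rule: replace p by its super-property.
        relaxed.append(((s, SUBPROPERTY[p], o), c_sp))
    if p in DOMAIN:
        # Domain rule: keep only the fact that s has the domain type.
        relaxed.append(((s, "rdf:type", DOMAIN[p]), c_dom))
    return relaxed


print(relax_step(("<World_War_I>", "startedondate", "?x")))
print(relax_step(("?x", "happenedondate", '"2/09/1666"')))
```

The first call reproduces the World War I rewriting above; the second reproduces the earlier YAGO example in which the date constraint is relaxed to rdf:type Event.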

Complexity of Query Answering In our ODBASE 2014 paper, we study the complexity of query answering considering the combined complexity (with both the query and the graph as input), the data complexity (with the graph as input and the query fixed), and the query complexity (with the query as input and the graph fixed) of this problem. We provide tight complexity bounds for several SPARQL fragments, and show that the data, query and combined complexity of SPARQL/SPARQL 1.1 are not impacted by our extensions with the APPROX and RELAX operators.

Complexity of Query Answering
Summary of results:

Operators                                   Data complexity   Query complexity   Combined complexity
AND, FILTER                                 O(|E|)            O(|Q|)             O(|E| |Q|)
AND, FILTER, RegEx                          O(|E|)            O(|Q|^2)           O(|E| |Q|^2)
RELAX, APPROX                               O(|E|)            P-Time             P-Time
RELAX, APPROX, AND, FILTER, RegEx           O(|E|)            P-Time             P-Time
AND, SELECT                                 P-Time            NP-Complete        NP-Complete
RELAX, APPROX, AND, FILTER, RegEx, SELECT   P-Time            NP-Complete        NP-Complete

Flexible Querying
Query Rewriting Algorithm: given a query Q, this constructs a set of queries rew(Q) that do not contain APPROX and RELAX such that $[\![Q]\!]_G = \bigcup_{Q' \in rew(Q)} [\![Q']\!]_G$ for every graph G and ontology K.
Starting from Q, we apply one step of approximation to triple patterns of Q that are APPROXed and one step of relaxation to triple patterns of Q that are RELAXed, generating a set of queries R, each with some associated cost. We repeat this for each query in R, generating queries of greater cost and adding them to R. We limit the number of queries generated by supplying the rewriting algorithm with a maximum query cost as input.
In the extended version of our ODBASE 2014 paper, we specify the query rewriting algorithm in detail and show that it is sound and complete w.r.t. the semantics of SPARQL AR.
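
The cost-bounded generation loop described above can be sketched as follows. This is a simplified illustration, not the algorithm from the paper: queries are treated as opaque hashable values, the function name rewrite and the callback step (which returns the one-step approximations and relaxations of a query with their costs) are hypothetical, and all step costs are assumed to be positive.

```python
import heapq
from itertools import count


def rewrite(query, step, max_cost):
    """Collect every query reachable from 'query' with total cost <= max_cost.

    'step(q)' is an assumed callback yielding (rewritten_query, step_cost)
    pairs. Returns (query, cost) pairs ordered by increasing cost.
    """
    best = {query: 0}            # cheapest known cost of each generated query
    tie = count()                # tie-breaker so the heap never compares queries
    frontier = [(0, next(tie), query)]
    while frontier:
        cost, _, q = heapq.heappop(frontier)
        if cost > best[q]:
            continue             # a cheaper derivation of q was already expanded
        for q2, step_cost in step(q):
            total = cost + step_cost
            if total <= max_cost and total < best.get(q2, float("inf")):
                best[q2] = total
                heapq.heappush(frontier, (total, next(tie), q2))
    return sorted(best.items(), key=lambda item: item[1])


# Toy step function: each "query" is just a label with fixed successors.
TOY_STEPS = {
    "startedondate": [("happenedondate", 1)],
    "happenedondate": [("rdf:type Event", 1)],
}


def toy_step(q):
    return TOY_STEPS.get(q, [])


print(rewrite("startedondate", toy_step, max_cost=1))
print(rewrite("startedondate", toy_step, max_cost=2))
```

With a maximum cost of 1 only the original query and its first rewriting are produced; raising the bound to 2 adds the next one. Keeping the cheapest cost per query avoids generating the same rewriting twice through different sequences of steps.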

Flexible Querying Query Evaluation: To compute the query answers, we incrementally apply an evaluation function, eval, to each query generated by the rewriting algorithm, in ranked order of the cost of the queries. To each mapping returned by eval we assign the cost of the query. If we generate a particular mapping more than once we keep the one with the lowest cost.
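
The evaluation loop above can be sketched in a few lines of Python. The names evaluate and eval_fn are hypothetical; eval_fn stands in for the eval function mentioned in the text and is assumed to return a list of variable bindings (dictionaries) for a query. The actual implementation described later uses Java and Jena, so this is only an illustration of the ranking and de-duplication logic.

```python
def evaluate(ranked_queries, eval_fn):
    """Evaluate (query, cost) pairs in cost order and rank the mappings.

    Each mapping gets the cost of the cheapest query that produced it.
    """
    answers = {}  # frozen mapping -> lowest cost seen
    for query, cost in sorted(ranked_queries, key=lambda qc: qc[1]):
        for mapping in eval_fn(query):
            key = frozenset(mapping.items())
            # Queries arrive in increasing cost order, so the first
            # occurrence of a mapping already carries its lowest cost.
            if key not in answers:
                answers[key] = cost
    ranked = [(dict(key), cost) for key, cost in answers.items()]
    return sorted(ranked, key=lambda mc: mc[1])


# Stand-in for a SPARQL endpoint: fixed answers per query (made-up data).
FAKE_RESULTS = {
    "exact": [{"?x": "a"}],
    "relaxed": [{"?x": "a"}, {"?x": "b"}],
}

results = evaluate([("relaxed", 1), ("exact", 0)], FAKE_RESULTS.get)
print(results)
```

In this toy run the mapping ?x = a is returned by both queries but keeps cost 0, while ?x = b is only found by the relaxed query and gets cost 1.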

Experimental Results
We have implemented our query rewriting and query evaluation algorithms and have conducted preliminary trials of query evaluation performance over the YAGO SPARQL endpoint (to be reported in an upcoming paper) and the Lehigh University Benchmark (LUBM), reported in the ODBASE 2014 paper. Using the latter, we generated 3 datasets: D1 with 149,973 triples, D2 with 421,562 triples, and D3 with 673,416 triples. We ran our experiments on a Windows 7 computer with 4 GB of RAM and a quad-core i7 CPU at 2.0 GHz. The query evaluation algorithm was implemented in Java and we used Jena for the SPARQL query execution.

Experimental Results - 1
Trialled queries, showing number of answers/time for the exact form of each query on D1-D3, and likewise for the APPROXed/RELAXed form (rows marked /AR). For the latter, the cost of all edit and relaxation operations was set to 1 and the maximum cost was also set to 1.

         Q1            Q2           Q3           Q4
D1       23/0.001s     34/0.004s    0/0.197s     331/0.129s
D2       63/0.006s     34/0.005s    0/0.64s      883/0.296s
D3       100/0.007s    34/0.006s    0/1.67s      1381/0.517s
D1/AR    840/0.015s    605/0.421s   743/0.924s   348/60.7s
D2/AR    2326/0.055s   605/1s       756/3.17s    925/451s
D3/AR    3706/0.105s   605/1.6s     767/4.44s    1461/1492s

Q1: SELECT * WHERE { RELAX(?X ub:headof ?Z) }

Experimental Results - 2
Q2: SELECT * WHERE { APPROX(?X ub:worksfor ub:university0) . ?X ub:name ?Y1 . ?X ub:emailaddress ?Y2 . ?X ub:telephone ?Y3 }

Experimental Results - 3
Q3: SELECT * WHERE { RELAX(?Y ub:suborganizationof* ?Z) . APPROX(?Z ub:affiliatedorganizationof ub:university0) }

Experimental Results - 4
Q4: SELECT * WHERE { RELAX(?X ub:advisor/ub:teacherof ?Z) . APPROX(?X ub:takescourse ?Z) }

Conclusions
We have extended a fragment of SPARQL 1.1 with approximation and relaxation operators, showing that this does not increase the complexity of query answering, and that such operators can be useful in finding more answers than would be returned by the exact form of a user's query. An advantage of our query rewriting approach is that existing techniques for SPARQL query optimisation and evaluation can be reused. Although the initial experimental study is promising, we are investigating how to optimise the rewriting algorithm, since it can generate a large number of queries. We are also investigating optimisation techniques to deal with queries such as Q4, which cannot be evaluated efficiently using a naive approach.

Future Work - Query Containment
Intuition: given a query Q = Q1 AND Q2, if it is the case that $[\![Q_1]\!]_G \subseteq [\![Q_2]\!]_G$ for every G, then $[\![Q]\!]_G = [\![Q_1]\!]_G$. Through this intuition we can reduce the number of queries generated by the rewriting algorithm.
Issues:
- Query containment incorporating query costs.
- Query containment for regular path queries, and conjunctive regular path queries, incorporating APPROX and RELAX.

Future Work - Similarity Matching and FLEX
Similarity Matching
- Exploits the FILTER operator of SPARQL, binding a variable to a set of similar strings, e.g. FILTER sim(?x, "cat").
- String similarity, e.g. "color" is similar to "Colour".
- Semantic similarity, e.g. "Cat" is similar to "Kitten".
- Uses RDF dictionaries such as WordNet.
- Can be extended to full English sentences.
We are also investigating a new FLEX operator which combines all the aforementioned flexible querying operators, including Similarity Matching.
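
To illustrate the difference between string similarity and semantic similarity, here is a small Python sketch. It is not the proposed sim function: the slides do not say how sim is computed, so this sketch assumes a character-based ratio from the standard difflib module, and the helper similar_labels and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def sim(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similar_labels(target: str, labels: List[str],
                   threshold: float = 0.8) -> List[str]:
    """Return the labels whose similarity to 'target' reaches the threshold."""
    return [label for label in labels if sim(target, label) >= threshold]


print(similar_labels("color", ["Colour", "Cat", "Kitten"]))
print(round(sim("cat", "kitten"), 2))
```

A purely character-based measure matches "color" with "Colour" but gives "cat" and "kitten" a low score, which is why semantic similarity needs an external resource such as the WordNet dictionaries mentioned above.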

Questions?