Toward Analytics for RDF Graphs

Similar documents
RDF Data Management: Reasoning on Web Data

Database Optimization Techniques for Semantic Queries

Efficient Query Answering against Dynamic RDF Databases

Optimizing Reformulation-based Query Answering in RDF

View Selection in Semantic Web Databases

Query-Oriented Summarization of RDF Graphs

COMP718: Ontologies and Knowledge Bases

Logic and Reasoning in the Semantic Web (part I RDF/RDFS)

INF3580/4580 Semantic Technologies Spring 2017

Semantic reasoning for dynamic knowledge bases. Lionel Médini M2IA Knowledge Dynamics 2018

The Logic of the Semantic Web. Enrico Franconi Free University of Bozen-Bolzano, Italy

Semantic Web. MPRI : Web Data Management. Antoine Amarilli Friday, January 11th 1/29

Ontological Modeling: Part 2

Contents. G52IWS: The Semantic Web. The Semantic Web. Semantic web elements. Semantic Web technologies. Semantic Web Services

FOUNDATIONS OF SEMANTIC WEB TECHNOLOGIES

Structural characterizations of schema mapping languages

RDF AND SPARQL. Part III: Semantics of RDF(S) Dresden, August Sebastian Rudolph ICCL Summer School

A Relaxed Approach to RDF Querying

Efficient, Scalable, and Provenance-Aware Management of Linked Data

Flexible Tools for the Semantic Web

Efficient query answering in the presence of DL-LiteR constraints

Semantics. KR4SW Winter 2011 Pascal Hitzler 1

Schema-Agnostic Query Rewriting in SPARQL 1.1

FOUNDATIONS OF SEMANTIC WEB TECHNOLOGIES

Today: RDF syntax. + conjunctive queries for OWL. KR4SW Winter 2010 Pascal Hitzler 3

RDF Schema. Mario Arrigoni Neri

H1 Spring B. Programmers need to learn the SOAP schema so as to offer and use Web services.

LECTURE 09 RDF: SCHEMA - AN INTRODUCTION

Forward Chaining Reasoning Tool for Rya

Mastro Studio: a system for Ontology-Based Data Management

Knowledge Representation for the Semantic Web

Querying Data through Ontologies

Labeled graph homomorphism and first order logic inference

OWL 2 Profiles. An Introduction to Lightweight Ontology Languages. Markus Krötzsch University of Oxford. Reasoning Web 2012

Formalising the Semantic Web. (These slides have been written by Axel Polleres, WU Vienna)

Unit 2 RDF Formal Semantics in Detail

FOUNDATIONS OF SEMANTIC WEB TECHNOLOGIES

Main topics: Presenter: Introduction to OWL Protégé, an ontology editor OWL 2 Semantic reasoner Summary TDT OWL

Semantic Technologies & Triplestores for BI

Practical Aspects of Query Rewriting for OWL 2

Genea: Schema-Aware Mapping of Ontologies into Relational Databases

Semantic Web Test

Foundations of SPARQL Query Optimization

OWL 2 Profiles. An Introduction to Lightweight Ontology Languages. Маркус Крёч (Markus Krötzsch) University of Oxford. KESW Summer School 2012

Graph Data Management & The Semantic Web

Developing markup metaschemas to support interoperation among resources with different markup schemas

Outline RDF. RDF Schema (RDFS) RDF Storing. Semantic Web and Metadata What is RDF and what is not? Why use RDF? RDF Elements

Handling time in RDF

Data Exchange in the Relational and RDF Worlds

XML Perspectives on RDF Querying: Towards integrated Access to Data and Metadata on the Web

A General Approach to Query the Web of Data

Ontological Modeling: Part 11

Multi-agent and Semantic Web Systems: RDF Data Structures

Making BioPAX SPARQL

Distributed RDFS Reasoning Over Structured Overlay Networks

Lecture 1: Conjunctive Queries

Table of Contents. iii

View Selection in Semantic Web Databases

OWL 2 The Next Generation. Ian Horrocks Information Systems Group Oxford University Computing Laboratory

Reasoning on Web Data Semantics

Semantic Web Ontologies

Semantic Web and Linked Data

Semantic Web. RDF and RDF Schema. Morteza Amini. Sharif University of Technology Spring 90-91

Adding formal semantics to the Web

Reasoning with the Web Ontology Language (OWL)

Scaling the Semantic Wall with AllegroGraph and TopBraid Composer. A Joint Webinar by TopQuadrant and Franz

Semantic Web Knowledge Representation in the Web Context. CS 431 March 24, 2008 Carl Lagoze Cornell University

New Approach to Graph Databases

RDF Analytics: Lenses over Semantic Graphs

Linked Data and RDF. COMP60421 Sean Bechhofer

On Ordering and Indexing Metadata for the Semantic Web

OWL DL / Full Compatability

X-KIF New Knowledge Modeling Language

ORM and Description Logic. Dr. Mustafa Jarrar. STARLab, Vrije Universiteit Brussel, Introduction (Why this tutorial)

Mustafa Jarrar: Lecture Notes on RDF Schema Birzeit University, Version 3. RDFS RDF Schema. Mustafa Jarrar. Birzeit University

H1 Spring C. A service-oriented architecture is frequently deployed in practice without a service registry

Bryan Smith May 2010

Expressive Querying of Semantic Databases with Incremental Query Rewriting

The Semantic Web Revisited. Nigel Shadbolt Tim Berners-Lee Wendy Hall

Rewriting Ontology-Mediated Queries. Carsten Lutz University of Bremen

On the Hardness of Counting the Solutions of SPARQL Queries

Linked Data and RDF. COMP60421 Sean Bechhofer

Unit 1 a Bird s Eye View on RDF(S), OWL & SPARQL

Semantic Technologies and CDISC Standards. Frederik Malfait, Information Architect, IMOS Consulting Scott Bahlavooni, Independent

Ontologies and Databases

RDF Semantics A graph-based approach

Deep integration of Python with Semantic Web technologies

Approach for Mapping Ontologies to Relational Databases

Today s Plan. INF3580 Semantic Technologies Spring Model-theoretic semantics, a quick recap. Outline

Handling Inconsistencies due to Class Disjointness in SPARQL Updates. joint work with: Albin Ahmeti, Diego Calvanese, Vadim Savenkov.

A Unified Logical Framework for Rules (and Queries) with Ontologies - position paper -

Description Logic: A Formal Foundation for Ontology Languages and Tools

Linked data and its role in the semantic web. Dave Reynolds, Epimorphics

An Introduction to the Semantic Web. Jeff Heflin Lehigh University

Logical reconstruction of RDF and ontology languages

RDF Semantics by Patrick Hayes W3C Recommendation

CS Knowledge Representation and Reasoning (for the Semantic Web)

DL-Media: An Ontology Mediated Multimedia Information Retrieval System

Mapping Relational Data to RDF with Virtuoso's RDF Views

Representing Product Designs Using a Description Graph Extension to OWL 2

Transcription:

Toward Analytics for RDF Graphs Ioana Manolescu INRIA and Ecole Polytechnique, France ioana.manolescu@inria.fr http://pages.saclay.inria.fr/ioana.manolescu Joint work with D. Bursztyn, S. Cebiric (Inria), F. Goasdoué (U. Rennes 1) University of Copenhagen, August 2017 Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 1 / 54

Outline 1 Background: semantic RDF graphs 2 Efficient RDF query answering in the presence of ontologies [EDBT2015, VLDB 2016] 3 Summarizing semantic-rich RDF graphs [VLDB 2015, TR2017] Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 2 / 54

Part I Background: RDF graphs Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 3 / 54

RDF Big Data needs semantics AI Magazine, Spring 2015 Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 4 / 54

RDF Do we really need the semantics? Yes. All the time. Application knowledge / constraints: Every Senator is an ElectedOfficial which is a Person (On Wikipedia) being BornInAPlace means being a Person Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 5 / 54

RDF Do we really need the semantics? Yes. All the time. Application knowledge / constraints: Every Senator is an ElectedOfficial which is a Person (On Wikipedia) being BornInAPlace means being a Person Without the semantics, we may miss query answers Data Constraints Query John is a Senator Every Senator is a Person Who is a Person? Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 5 / 54

RDF Do we really need the semantics? Yes. All the time. Application knowledge / constraints: Every Senator is an ElectedOfficial which is a Person (On Wikipedia) being BornInAPlace means being a Person Semantic contraints are a compact way of encoding information Every ElectedOfficial is a Person stated only once even if thousands of ElectedOfficials. Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 6 / 54

RDF Semantics for Web data Data and metadata on the Web is often structured in graphs, e.g., RDF (W3C s Resource Description Framework) Famous application: the Linked Open Data cloud (2017) Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 7 / 54

RDF The Resource Description Framework (RDF) RDF graph: set of triples Assertion Triple Relational notation Intuition Class s rdf:type o o(s) s is an o Property s p o p(s, o) The p of s is o Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 8 / 54

RDF The Resource Description Framework (RDF) RDF graph: set of triples Assertion Triple Relational notation Intuition Class s rdf:type o o(s) s is an o Property s p o p(s, o) The p of s is o resource (URI) blank node literal (string) property 1949 Book publishedin rdf:type writtenby doi 1 hastitle El Aleph :b 1 hasname J. L. Borges Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 8 / 54

RDF The Resource Description Framework (RDF) Assertion Triple Relational notation Intuition Class s rdf:type o o(s) s is an o Property s p o p(s, o) The p of s is o Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 9 / 54

RDF The Resource Description Framework (RDF) Assertion Triple Relational notation Intuition Class s rdf:type o o(s) s is an o Property s p o p(s, o) The p of s is o resource (URI) blank node literal (string) property 1949 Book publishedin rdf:type writtenby doi 1 hastitle El Aleph :b 1 hasname J. L. Borges Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 9 / 54

RDF RDFS RDF Schema (RDFS) Declare deductive constraints between classes and properties Constraint Triple OWA interpretation Subclass c 1 rdfs:subclassof c 2 c 1 c 2 Subproperty p 1 rdfs:subpropertyof p 2 p 1 p 2 Domain typing p rdfs:domain c Π domain (p) c Range typing p rdfs:range c Π range (p) c Publication rdfs:subclassof Book hasauthor rdfs:subpropertyof rdfs:domain writtenby rdfs:range Any c 1 is also a c 2 Person Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 10 / 54

RDF RDFS RDF Schema (RDFS) Declare deductive constraints between classes and properties Constraint Triple OWA interpretation Subclass c 1 rdfs:subclassof c 2 c 1 c 2 Subproperty p 1 rdfs:subpropertyof p 2 p 1 p 2 Domain typing p rdfs:domain c Π domain (p) c Range typing p rdfs:range c Π range (p) c Publication rdfs:subclassof Book rdfs:domain writtenby hasauthor rdfs:subpropertyof rdfs:range Person If two resources are related by p 1, they are also related by p 2 Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 11 / 54

RDF RDFS RDF Schema (RDFS) Declare deductive constraints between classes and properties Constraint Triple OWA interpretation Subclass c 1 rdfs:subclassof c 2 c 1 c 2 Subproperty p 1 rdfs:subpropertyof p 2 p 1 p 2 Domain typing p rdfs:domain c Π domain (p) c Range typing p rdfs:range c Π range (p) c Publication rdfs:subclassof Book hasauthor rdfs:subpropertyof rdfs:domain writtenby rdfs:range Anyone having p is a c Person Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 12 / 54

RDF RDFS RDF Schema (RDFS) Declare deductive constraints between classes and properties Constraint Triple OWA interpretation Subclass c 1 rdfs:subclassof c 2 c 1 c 2 Subproperty p 1 rdfs:subpropertyof p 2 p 1 p 2 Domain typing p rdfs:domain c Π domain (p) c Range typing p rdfs:range c Π range (p) c Publication rdfs:subclassof Book hasauthor rdfs:subpropertyof rdfs:domain writtenby rdfs:range Anyone who is a value of p is a c Person Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 13 / 54

RDF RDF entailment Open-world assumption and RDF entailment RDF data model based on the open-world assumption. Deductive constraints lead to implicit triples: part of the graph even though not explicitly present Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 14 / 54

RDF RDF entailment Open-world assumption and RDF entailment RDF data model based on the open-world assumption. Deductive constraints lead to implicit triples: part of the graph even though not explicitly present explicit triples + implicit triples entailment rules Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 14 / 54

RDF RDF entailment Open-world assumption and RDF entailment RDF data model based on the open-world assumption. Deductive constraints lead to implicit triples: part of the graph even though not explicitly present explicit triples + implicit triples entailment rules Exhaustive application of entailment leads to saturation (closure) Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 14 / 54

RDF RDF entailment The semantics of an RDF graph G is its saturation G Sample instance entailment rules from schema and instance triples c 1 rdfs:subclassof c 2 s rdf:type c 1 RDF s rdf:type c 2 p 1 rdfs:subpropertyof p 2 s p 1 o RDF s p 2 o p rdfs:domain c s p o RDF s rdf:type c p rdfs:range c s p o RDF o rdf:type c Publication rdfs:domain hasauthor rdfs:subclassof rdfs:subpropertyof 1949 rdf:type Book rdfs:domain writtenby publishedin rdf:type rdfs:range doi 1 writtenby hasauthor :b 1 rdf:type Person hastitle hasname El Aleph J. L. Borges Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 15 / 54

Part II Efficient query answering Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 16 / 54

Query answering techniques Query answering through a relational database management system (RDBMS) Query answering query evaluation! Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 17 / 54

Query answering techniques Query answering through a relational database management system (RDBMS) Advantage of using RDBMS Query answering query evaluation! Highly efficient and scalable for query evaluation on the data stored in the database Limitation Still mostly unaware of semantics (except: integrity constraints) Two main methods Saturation-based query answering Reformulation-based query answering Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 17 / 54

Query answering techniques Saturation-based query answering RDF Inference Rules G answer G query q q(g ) can be computed using an RDBMS G needs time to be computed and space to be stored Not suitable for high update rate (data and/or schema triples) Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 18 / 54

Query answering techniques Query reformulation Reformulation-based query answering RDF Inference Rules G answer query q query q ref Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 19 / 54

Query answering techniques Query reformulation Reformulation-based query answering RDF Inference Rules G answer query q query q ref q ref (G) can be evaluated in the RDBMS Robust to updates Reformulated queries are complex, thus costly to evaluate Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 19 / 54

Query answering techniques Query reformulation Reformulation-based query answering RDF Inference Rules G answer query q query q ref Target reformulation languages for RDF conjunctive queries (CQs): Unions of CQs (UCQs) Data Constraints Query John is a Senator Every Senator is a Person Who is a Person? Reformulated query: Who is a Person or a Senator? Reformulation introduces unions Each union term is an alternative reason or proof for a result Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 20 / 54

Query answering techniques Query reformulation Reformulation-based query answering RDF Inference Rules G answer query q Target language for q ref : Unions of CQs (UCQs) Query (CQ) Reformulated query (UCQ) query q ref Who are the Professors and what are their Publications? (Professors with their JournalPapers) or (Professors with their ConferencePapers) or (AssistantProfessors with their JournalPapers) or (AssistantProfessors with their ConferencePapers) or (FullProfessors with their ConferencePapers) or (FullProfessors with their JournalPapers) Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 21 / 54

Query answering techniques Query reformulation Reformulation-based query answering RDF Inference Rules G query q Target language for q ref : query q ref answer semi-conjunctive queries (joins of unions of atoms) (SCQs) Query (CQ) Who are the Professors and what are their Publications? Reformulated query (SCQ) (Professors or AssistantProfessors or FullProfessors) with their (JournalPapers or ConferencePapers) Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 22 / 54

Query answering techniques Query reformulation Reformulation-based query answering RDF Inference Rules G query q Target language for q ref : query q ref answer Unions of CQs (UCQs) F. Goasdoué, I. Manolescu, A. Roatiş: Efficient query answering against dynamic RDF databases, EDBT 2013 Joins of single-atom UCQs (SCQs) M. Thomazo: Compact Rewritings for Existential Rules, IJCAI 2013 Joins of UCQs (JUCQs) D. Bursztyn, F. Goasdoué, I. Manolescu: Optimizing Reformulation-based Query Answering in RDF, EDBT 2015 D. Bursztyn, F. Goasdoué, I. Manolescu: Teaching an RDBMS about ontological constraints, PVLDB 2016 (for DL-Lite R ontologies) Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 23 / 54

Query answering techniques Query reformulation Reformulation-based query answering RDF Inference Rules G query q Target language for q ref : query q ref answer Unions of CQs (UCQs) F. Goasdoué, I. Manolescu, A. Roatiş: Efficient query answering against dynamic RDF databases, EDBT 2013 Wait: does SQL syntax mater?!... Joins of single-atom UCQs (SCQs) M. Thomazo: Compact Rewritings for Existential Rules, IJCAI 2013 Joins of UCQs (JUCQs) D. Bursztyn, F. Goasdoué, I. Manolescu: Optimizing Reformulation-based Query Answering in RDF, EDBT 2015 D. Bursztyn, F. Goasdoué, I. Manolescu: Teaching an RDBMS about ontological constraints, PVLDB 2016 (for DL-Lite R ontologies) Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 23 / 54

Query answering techniques Query reformulation Reformulation-based query answering RDF Inference Rules G query q Target language for q ref : query q ref answer Unions of CQs (UCQs) F. Goasdoué, I. Manolescu, A. Roatiş: Efficient query answering against dynamic RDF databases, EDBT 2013 Wait: does SQL syntax mater?!... Joins of single-atom UCQs (SCQs) M. Thomazo: Compact Rewritings for Existential Rules, IJCAI 2013 Yes. A lot. Joins of UCQs (JUCQs) D. Bursztyn, F. Goasdoué, I. Manolescu: Optimizing Reformulation-based Query Answering in RDF, EDBT 2015 D. Bursztyn, F. Goasdoué, I. Manolescu: Teaching an RDBMS about ontological constraints, PVLDB 2016 (for DL-Lite R ontologies) From failing to feasible, e.g., 4 orders of magnitude speed-up on 8M triples of DBLP data. Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 23 / 54

Query answering techniques Query reformulation Reformulation into a join of UCQs (JUCQ) RDF Entailment Rules G answer query q Optimiser query q Reformulation algorithm 1. Considers a set of reformulation alternatives (not a fixed one) Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 24 / 54

Query answering techniques Query reformulation Reformulation into a join of UCQs (JUCQ) RDF Entailment Rules G answer query q Optimiser query q Reformulation algorithm 1. Considers a set of reformulation alternatives (not a fixed one) 2. Uses a cost model for estimating the cost of evaluating q ref through an RDBMS Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 24 / 54

Query answering techniques Query reformulation Reformulation into a join of UCQs (JUCQ) RDF Entailment Rules G answer query q Optimiser query q Reformulation algorithm 1. Considers a set of reformulation alternatives (not a fixed one) 2. Uses a cost model for estimating the cost of evaluating q ref through an RDBMS 3. Chooses the cheapest alternative. Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 24 / 54

Query answering techniques Query reformulation Optimized reformulation into JUCQs Query q CQ-to-UCQ ref. algo. Graph G Results q 1 c(q 1 ) c(q ref ) q ref...... q best c(q best ) q n c(q n ) UCQ ref JUCQ ref q ref q best RDBMS Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 25 / 54

Query answering techniques Cover-based reformulation for RDFS Cover-based reformulation Idea 1 Split the query into a cover of potentially overlapping query fragments 2 Each fragment determines a (conjunctive) sub-query 3 Reformulate each fragment into a UCQ 4 Join the reformulations. This gets translated into an SQL query of the form: WITH... as refq1,..., as refq2... SELECT... FROM refq1, refq1,... WHERE... Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 26 / 54

Query answering techniques Cover-based reformulation for RDFS CQ-to-JUCQ query reformulation performance Given the query q 1 (x, y) :- x rdf:type y, (t 1 ) x ub:degreefrom http : //www.university532.edu, (t 2 ) x ub:memberof http : //www.department1.university7.edu (t 3 ) on 100M LUBM data: JUCQ #reformulations exec. time (ms) (t 1, t 2, t 3 ) ref 2, 256 6,387 (t 1 ) ref (t 2 ) ref (t 3 ) ref 195 1,074,026 (t 1, t 2 ) ref (t 3 ) ref 755 1, 968 (t 1 ) ref (t 2, t 3 ) ref 200 846, 710 (t 1, t 3 ) ref (t 2 ) ref 568 554 (t 1, t 2 ) ref (t 1, t 3 ) ref 1, 316 2, 734 (t 1, t 2 ) ref (t 2, t 3 ) ref 764 2, 289 (t 1, t 3 ) ref (t 2, t 3 ) ref 576 588 Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 27 / 54

Query answering techniques Cover-based reformulation for RDFS Experiments LUBM 100 M data PostgreSQL 9.3.2 System A System B 1 UCQ reformulation 2 SCQ reformulation 3 Exhaustive JUCQ reformulation 4 Greedy JUCQ reformulation (starting from SCQ, break into fragments) Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 28 / 54

Query answering techniques Cover-based reformulation for RDFS Query answering using PostgreSQL (RDFS ontology) LUBM 100M; 28 queries; 2 to 6 atoms; 1 to 318, 089 union terms Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 29 / 54

Query answering techniques Cover-based reformulation for RDFS Query answering using System A (RDFS ontology) LUBM 100 M; 28 queries; 2 to 6 atoms; 1 to 318, 089 union terms Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 30 / 54

Query answering techniques Cover-based reformulation for RDFS Query answering using System B (RDFS ontology) LUBM 100M; 28 queries; 2 to 6 atoms; 1 to 318, 089 union terms Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 31 / 54

Query answering techniques Cover-based reformulation for RDFS Query answering using PostgreSQL (DL-Lite R ontology) LUBM 20 benchmark, 100 M Some reformulations under RDFS are lossy under DL-Lite R 1.E+05 1.E+04 1.E+03 1.E+02 1.E+01 UCQ Croot GDL/RDBMS GDL / ext 1.E+00 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 32 / 54

Query answering techniques Cover-based reformulation for RDFS Query answering using DB2 (DL-Lite R ontology) LUBM 20 benchmark, 100 M Simple data layout vs. RDF-specific one M. Bornea, J. Dolby, A. Kementsietsidis et al., Building an efficient RDF store over a Relational Database, SIGMOD 2013 1.E+05 1.E+04 1.E+03 1.E+02 1.E+01 1.E+00 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 UCQ / Simple layout UCQ / RDF layout Croot / Simple layout Croot / RDF layout GDL / Simple layout / RDBMS GDL / Simple layout / ext Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 33 / 54

Query answering techniques Cover-based reformulation for RDFS Why does DB2/RDF perform so poorly? This query: Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 34 / 54

Query answering techniques Cover-based reformulation for RDFS Why does DB2/RDF perform so poorly?...becomes after reformulation on DB2 s RDF-specific store: Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 35 / 54

Query answering techniques Cover-based reformulation for RDFS RDF query answering through optimized reformulation 1 Equivalent SQL syntaxes are not equal from the RDBMS optimizer perspective (inside or outside well-supported dialect) 2 Chosing the reformulation based on cost model makes queries feasible or efficient when they were not RDBMS optimizers have been tuned for conjunctive queries (historical use-case) The cost-based optimizer sees one CQ at a time! 3 This amounts to enlarging the optimizer s well-supported dialect, at a very modest performance overhead. Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 36 / 54

Part III RDF summarization Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 37 / 54

Concepts RDF summaries Problem RDF graph G is large, heterogeneous, partially implicit. How to compactly represent all its structure? Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 38 / 54

Concepts Summary of DBLP data 150M triples Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 39 / 54

Concepts Summary of geographic data French territory division in regions, departments, urban areas, cities, districts etc. 368K triples Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 40 / 54

Concepts RDF summaries Solution 1 Define RDF node equivalence relation : equivalence relation such that class and property nodes are only equivalent to themselves 2 Define RDF summary G / of an RDF graph G as the quotient of G through Recall: quotient of a directed graph G by G = (V, E), equivalence relation on V G / nodes: one for equivalence class of V G / edges: n 1 a n 2 iff n a 1 n 2 G such that n 1 represented by n/, 1 n 2 represented by n 2 / Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 41 / 54

Properties Formal summary properties For any RDF equivalence relation : Size limit The summary is at most as large as the graph. Schema preservation The schema of G / is the schema of G. Representativeness Any conjunctive query q with answers on G also has answers on its summary: q(g ) q((g / ) ) This enables query pruning (for query answering) without saturating G Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 42 / 54

Properties Which equivalence relations to use? Equivalence notions previously studied Forward / backward / forward and backward simulation Forward / backward / forward and backward bisimulation Adapted to semantic RDF graphs Novel equivalence notions we introduce (see next) Flexible similarity suited to heterogeneous graphs Based on property cliques and possibly on RDF types Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 43 / 54

Clique-based summaries RDF node equivalence based on property cliques Intuition: a 1, a 2 are similar; r 1, r 2, r 3, r 4, r 5 are similar τ Book τ Journal τ r 6 r 1 r 2 r 3 author title title editor editor comment a 1 t 1 t 2 e 1 e 2 c 1 reviewed published editor r 4 r 5 author title a 2 t 3 t 4 title τ Spec Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 44 / 54

Clique-based summaries RDF node equivalence based on property cliques Output property cliques: {a, t, e, c}; {r}; {p}; Input property cliques: {a}; {t}; {e}; {c}; {r, p}; τ Book τ Journal τ r 6 r 1 r 2 r 3 author title title editor editor comment a 1 t 1 t 2 e 1 e 2 c 1 reviewed published editor r 4 r 5 author title a 2 t 3 t 4 title τ Spec Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 45 / 54

Clique-based summaries Weak clique-based summaries Two nodes are weakly equivalent ( W ) iff they have the same input clique or the same output clique. Weak summary G / W of the sample RDF graph G: Book Journal Spec τ τ τ τ r a t c e p Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 46 / 54

Clique-based summaries Strong clique-based summaries Two nodes are strongly equivalent ( S ) iff they have the same input clique and the same output clique. Strong summary G / S of the sample RDF graph G: τ Book Journal Spec τ τ τ a t t a e c e r p Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 47 / 54

Clique-based summaries On this example, this is also the typed strong summary G / TS. Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 48 / 54 Using types for summarization Group nodes first by their types; then group untyped nodes by their property cliques. Typed weak summary G / TW of the sample RDF graph G: Book τ Journal τ author title title editor editor comment reviewed author titlepublished title editor τ Spec

Summary sizes Summary sizes Graph G G Summary G / G / cf DBLP 150,787,464 G /W 71 2,123,767 DBLP 150,787,464 G /S 206 731,978 DBLP 150,787,464 G /fw 262,695 574 LUBM1M 1,227,868 G /W 161 7,579 LUBM1M 1,227,868 G /S 207 5,903 LUBM1M 1,227,868 G /fw 1982 617 LUBM10M 11,990,183 G /W 162 74,013 LUBM10M 11,990,183 G /S 206 58,204 LUBM10M 11,990,183 G /fw 24,958 480 LUBM10M 11,990,183 G /bw 6,162 1,944 LUBM10M 11,990,183 G /fb 11,990,076 1 Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 49 / 54

Summary sizes Summarizing G Can we summarize G without saturating G? Shortcut theorem For the summaries G /W, G /S, G /fw, G /bw, G /fb : (G ) / strongly isomorphic to ((G / ) ) / (isomorphic, and identical class and property nodes) Also: sufficient condition for any to admit the shortcut. Shortcut may accelerate summarization of G by up to 20 Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 50 / 54

Summary sizes Shortcut example G r 1 a b 1 y 1 y 2 r 2 b 2 c z b 1 sp b, b 2 sp b G r 1 b b a b 1 x x y 1 y 2 r b 2 2c z b 1 sp b b 2 sp b G /W a b 1 b 2 c b 1 sp b b 2 sp b a b 1 b b 2 c (G /W ) a b 1 b b b 2 c (G ) /W = ((G /W ) ) /W b 1 sp b, b 2 sp b b 1 sp b b 2 sp b Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 51 / 54

Summary sizes Shortcut counter-example G r 1 r 2 a d a b b c x y 1 y 2 G /TW a b a d c (G /TW ) = ((G /TW ) ) /TW c τ a d a b c G c τ r 1 a b x y 1 y 2 r 2 b a d c (G ) /TW c τ a b b a d c Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 52 / 54

Part IV Conclusion Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 53 / 54

RDF graphs with semantics Semantic rules lead to implicit data Reformulated queries can be very complex (, ) Two-stage processing: let the RDBMS handle (just) join fragments Ontology-aware (as opposed to query rewrites ) Quotient-based summaries represent explicit and implicit graph structure; shortcut for efficiently building (G ) /. Clique-based summaries much more compact than (bi)simulation-based Ioana Manolescu Toward RDF Analytics U. of Copenhagen, Aug 2017 54 / 54