COMP718: Ontologies and Knowledge Bases

Similar documents
MASTRO-I: Efficient integration of relational data through DL ontologies

Maurizio Lenzerini. Dipartimento di Ingegneria Informatica Automatica e Gestionale Antonio Ruberti

Mastro Studio: a system for Ontology-Based Data Management

Ontology-based database access

Linking Data to Ontologies: The Description Logic DL-Lite A

Semantic Web Technologies

Linking Data to Ontologies

Optimizing Query Rewriting in Ontology-Based Data Access

Ontologies and Databases

Managing Datatypes in Ontology-Based Data Access

Data Integration: A Theoretical Perspective

OWL 2 The Next Generation. Ian Horrocks Information Systems Group Oxford University Computing Laboratory

COMP718: Ontologies and Knowledge Bases

Query Answering and Rewriting in Ontology-based Data Access. Riccardo Rosati DIAG, Sapienza Università di Roma

Practical Aspects of Query Rewriting for OWL 2

Database Optimization Techniques for Semantic Queries

Data integration lecture 2

Verification of Data-Aware Processes Data Centric Dynamic Systems

DL-Lite: Tractable Description Logics for Ontologies

Scalable Ontology-Based Information Systems

Efficiently Managing Data Intensive Ontologies

Conjunctive queries. Many computational problems are much easier for conjunctive queries than for general first-order queries.

Local Closed World Reasoning with OWL 2

THE RELATIONAL MODEL. University of Waterloo

When OWL met DL-Lite...

Introducing Datatypes in DL-Lite

Ontology-based Data Management. Maurizio Lenzerini

Mastro: A Reasoner for Effective Ontology-Based Data Access

Towards a Logical Reconstruction of Relational Database Theory

Fundamentals of Physical Design: State of Art

Logical reconstruction of RDF and ontology languages

Toward Analytics for RDF Graphs

Lecture 1: Conjunctive Queries

Data Integration: A Logic-Based Perspective

OWL 2 Profiles. An Introduction to Lightweight Ontology Languages. Markus Krötzsch University of Oxford. Reasoning Web 2012

Description Logics and OWL

Semantic data integration in P2P systems

Data Integration 1. Giuseppe De Giacomo. Dipartimento di Informatica e Sistemistica Antonio Ruberti Università di Roma La Sapienza

New Approach to Graph Databases

ITCS 3160 DATA BASE DESIGN AND IMPLEMENTATION

DATABASE THEORY. Lecture 11: Introduction to Datalog. TU Dresden, 12th June Markus Krötzsch Knowledge-Based Systems

Ontologies and OWL. Riccardo Rosati. Knowledge Representation and Semantic Technologies

On Reconciling Data Exchange, Data Integration, and Peer Data Management

A Unified Logical Framework for Rules (and Queries) with Ontologies - position paper -

Updating data and knowledge bases

Leveraging the Expressivity of Grounded Conjunctive Query Languages

The Basic (Flat) Relational Model. Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley

Knowledge Representation and Ontologies Part 1: Modeling Information through Ontologies

Databases - Relations in Databases. (N Spadaccini 2010) Relations in Databases 1 / 16

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Presented By Aditya R Joshi Neha Purohit

COMP9321 Web Application Engineering

COMP9321 Web Application Engineering

Structural characterizations of schema mapping languages

Chapter 2 & 3: Representations & Reasoning Systems (2.2)

ARISTOTLE UNIVERSITY OF THESSALONIKI. Department of Computer Science. Technical Report

The Logic of the Semantic Web. Enrico Franconi Free University of Bozen-Bolzano, Italy

Semantic reasoning for dynamic knowledge bases. Lionel Médini M2IA Knowledge Dynamics 2018

Data Integration: Logic Query Languages

Description Logics as Ontology Languages for Semantic Webs

Processing Regular Path Queries Using Views or What Do We Need for Integrating Semistructured Data?

THE RELATIONAL DATA MODEL CHAPTER 3 (6/E) CHAPTER 5 (5/E)

Appendix 1. Description Logic Terminology

Appendix 1. Description Logic Terminology

DL-Media: An Ontology Mediated Multimedia Information Retrieval System

X-KIF New Knowledge Modeling Language

Uncertain Data Models

Description Logic: A Formal Foundation for Ontology Languages and Tools

Simplified Approach for Representing Part-Whole Relations in OWL-DL Ontologies

Representing Product Designs Using a Description Graph Extension to OWL 2

Evaluation of Query Rewriting Approaches for OWL 2

Enhanced Entity-Relationship (EER) Modeling

RDF Data Management: Reasoning on Web Data

Database Management System Dr. S. Srinath Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No.

The onprom Toolchain for Extracting Business Process Logs using Ontology-based Data Access

Modified Query-Roles Based Access Control Model (Q-RBAC) for Interactive Access of Ontology Data

Conceptual Design with ER Model

Analyzing Real-World SPARQL Queries in the Light of Probabilistic Data

Rewrite and Conquer: Dealing with Integrity Constraints in Data Integration

Modularity in Ontologies: Introduction (Part A)

Query Evaluation of Tractable Answering using Query Rewriting

Ontology-Based Data Access

Modeling multidimensional database

Knowledge Engineering with Semantic Web Technologies

OWL and tractability. Based on slides from Ian Horrocks and Franz Baader. Combining the strengths of UMIST and The Victoria University of Manchester

A Tool for Storing OWL Using Database Technology

RELATIONAL REPRESENTATION OF ALN KNOWLEDGE BASES

A Framework for Ontology Integration

Efficient query answering in the presence of DL-LiteR constraints

Semantic Web. MPRI : Web Data Management. Antoine Amarilli Friday, January 11th 1/29

SMO System Management Ontology

Enabling Statoil FactPages with the Optique platform

MASTRO at Work: Experiences on Ontology-based Data Access

Semantic Enrichment of GSM-Based Artifact-Centric Models

On the Hardness of Counting the Solutions of SPARQL Queries

Institute of Automatics AGH University of Science and Technology, POLAND. Hybrid Knowledge Engineering.

Data Complexity of Query Answering in Description Logics

The Relational Model

3. Relational Data Model 3.5 The Tuple Relational Calculus

Data Integration A Logic-Based Perspective

Transcription:

1/35 COMP718: Ontologies and Knowledge Bases Lecture 9: Ontology/Conceptual Model based Data Access Maria Keet email: keet@ukzn.ac.za home: http://www.meteck.org School of Mathematics, Statistics, and Computer Science University of KwaZulu-Natal, South Africa 10 April 2012

2/35 Outline 1 OBDA Options 2 Some technical details Introduction The ontology language The mapping layer Impedance mismatch Mapping assertions Query answering

3/35 An ontology with a very large ABox (intro last week) Scaling up to realistic size knowledge base handling large amounts of data To realise this, we need A language of relatively low computational complexity A way to store large amounts of data Some mechanism to link up the previous two ingredients Query (and reason over) the combination of the previous three Use the Ontology-Based Data Access (OBDA) approach with the ontology in OBDA just a DL knowledge base Most examples and use cases: the ontology is a DL-formalised conceptual data model Example application with the WONDER system

4/35 An ontology with a very large ABox (this week) What are the options to link an ontology to large amounts of data? Two principal options (in KR view): query rewriting and data completion Several implementation infrastructures; external ABox most popular (realised with RDBMS or RDF Triple store) What is there behind the scenes for the non-graphical OBDA-part in WONDER and the OBDA systems you set up in the lab?

5/35 Outline 1 OBDA Options 2 Some technical details Introduction The ontology language The mapping layer Impedance mismatch Mapping assertions Query answering

6/35 OBDA options KR perspective (with OWA): query rewriting vs data completion DB perspective (with CWA): we probably won t cover this in the lecture See slides obda-slides2012tomancomp718ukzn.pdf

7/35 Outline 1 OBDA Options 2 Some technical details Introduction The ontology language The mapping layer Impedance mismatch Mapping assertions Query answering

8/35 Introduction Linking ontologies to relational data 1 Ontology-Based Data Access systems (static components) An ontology language A mapping language The data Query answering in Ontology-Based Data Access systems Reasoning over the TBox Query rewriting Query unfolding Relational database technology These slides are based on Calvanese s MOSS 09 slides, which also will be made available 1 More precisely: Option I, v1.0 mentioned in David Toman s slides.

9/35 Introduction An OBDA system Definition (Ontology-Based Data Access system) An OBDA system is a triple O = T, M, D, where T is a TBox D is a relational database M is a set of mapping assertions between T and D Note: this is for the current system, but one could conceive of a system that has an RDF triple store as D

10/35 Introduction D as ABox In the traditional DL setting, it is assumed that the data is maintained in the ABox of the ontology, meaning: The ABox is perfectly compatible with the TBox: The vocabulary of concepts, roles, and attributes is the one used in the TBox The ABox stores abstract objects, and these objects and their properties are those returned by queries over the ontology Other ways to manage the ABox from an implementation point of view: Description Logics reasoners maintain the ABox is main-memory data structures (recollect the 4 GB HGT-DB) Hence, when an ABox becomes large, managing it in secondary storage may be required, but this is again handled directly by the reasoner

10/35 Introduction D as ABox In the traditional DL setting, it is assumed that the data is maintained in the ABox of the ontology, meaning: The ABox is perfectly compatible with the TBox: The vocabulary of concepts, roles, and attributes is the one used in the TBox The ABox stores abstract objects, and these objects and their properties are those returned by queries over the ontology Other ways to manage the ABox from an implementation point of view: Description Logics reasoners maintain the ABox is main-memory data structures (recollect the 4 GB HGT-DB) Hence, when an ABox becomes large, managing it in secondary storage may be required, but this is again handled directly by the reasoner

11/35 Introduction D = Relational database as ABox In addition to ABox scalability, there are other reasons to realise the ABox with D: When we have no direct control over the data since it belongs to some external organization, which controls the access to it When multiple data sources need to be accessed, such as in Information Integration Deal with such a situation by keeping the data in the external (relational) storage, and performing query answering by leveraging the capabilities of the relational engine New problems: The so-called impedance mismatch between values in the relational database and the objects that the ABox expects How to link the TBox to the ABox that is realised as a D?

11/35 Introduction D = Relational database as ABox In addition to ABox scalability, there are other reasons to realise the ABox with D: When we have no direct control over the data since it belongs to some external organization, which controls the access to it When multiple data sources need to be accessed, such as in Information Integration Deal with such a situation by keeping the data in the external (relational) storage, and performing query answering by leveraging the capabilities of the relational engine New problems: The so-called impedance mismatch between values in the relational database and the objects that the ABox expects How to link the TBox to the ABox that is realised as a D?

11/35 Introduction D = Relational database as ABox In addition to ABox scalability, there are other reasons to realise the ABox with D: When we have no direct control over the data since it belongs to some external organization, which controls the access to it When multiple data sources need to be accessed, such as in Information Integration Deal with such a situation by keeping the data in the external (relational) storage, and performing query answering by leveraging the capabilities of the relational engine New problems: The so-called impedance mismatch between values in the relational database and the objects that the ABox expects How to link the TBox to the ABox that is realised as a D?

The ontology language The DL-Lite family A family of DLs optimized according to the tradeoff between expressive power and complexity of query answering, with emphasis on data Carefully designed to have nice computational properties for answering UCQs (i.e., computing certain answers): The same complexity as relational databases Query answering can be delegated to a relational DB engine The DLs of the DL-Lite family are essentially the maximally expressive ontology languages enjoying these nice computational properties Introduction of DL-Lite R, a member of the DL-Lite family, essentially corresponds to OWL2 QL 2 2 Actually, the current OBDA implementation can handle DL-LiteA, and all DL-Lite languages adhere to the UNA 12/35

13/35 The ontology language DL-Lite R (compacter DL notation of OWL 2 QL)

14/35 The ontology language DL-Lite R (compacter DL notation of OWL 2 QL)

15/35 The ontology language DL-Lite R (compacter DL notation of OWL 2 QL)

16/35 The mapping layer Relational database as ABox Sources store data, which is constituted by values taken from concrete domains, such as strings, integers, codes,... Instances of concepts and relations in an ontology are (abstract) objects Solution: Specify how to construct from the data values in the relational sources the (abstract) objects that populate the ABox of the ontology Embed this specification in the mappings between the data sources and the ontology Use a virtual ABox, where the objects are not materialized

16/35 The mapping layer Relational database as ABox Sources store data, which is constituted by values taken from concrete domains, such as strings, integers, codes,... Instances of concepts and relations in an ontology are (abstract) objects Solution: Specify how to construct from the data values in the relational sources the (abstract) objects that populate the ABox of the ontology Embed this specification in the mappings between the data sources and the ontology Use a virtual ABox, where the objects are not materialized

16/35 The mapping layer Relational database as ABox Sources store data, which is constituted by values taken from concrete domains, such as strings, integers, codes,... Instances of concepts and relations in an ontology are (abstract) objects Solution: Specify how to construct from the data values in the relational sources the (abstract) objects that populate the ABox of the ontology Embed this specification in the mappings between the data sources and the ontology Use a virtual ABox, where the objects are not materialized

17/35 The mapping layer Solution to the impedance mismatch Define a mapping language that allows for specifying how to transform data into abstract objects, where Each mapping assertion maps a query that retrieves values from a data source to a set of atoms specified over the ontology Basic idea: use Skolem functions in the atoms over the ontology to generate the objects from the data values Semantics of mappings: Objects are denoted by terms (of exactly one level of nesting) Different terms denote different objects (i.e., we make the unique name assumption on terms)

18/35 The mapping layer Example

19/35 The mapping layer Associate objects in the ontology to data in the tables Introduce an alphabet Λ of function symbols, each with an associated arity Use value constants from an alphabet Γ V to denote values Use object terms instead of object constants to denote objects: and object term has the form f (d 1,..., d n ) with f Λ, and each d i is a value constant in Γ V Example If a person is identified by her SSN, we can introduce a function symbol pers/1. If NRM18JUL18 is a SSN, then pers(nrm18jul18) denotes a person. If a person is identified by her name and dateofbirth, we can introduce a function symbol pers/2. Then pers(mandela, 18/07/18) denotes a person.

19/35 The mapping layer Associate objects in the ontology to data in the tables Introduce an alphabet Λ of function symbols, each with an associated arity Use value constants from an alphabet Γ V to denote values Use object terms instead of object constants to denote objects: and object term has the form f (d 1,..., d n ) with f Λ, and each d i is a value constant in Γ V Example If a person is identified by her SSN, we can introduce a function symbol pers/1. If NRM18JUL18 is a SSN, then pers(nrm18jul18) denotes a person. If a person is identified by her name and dateofbirth, we can introduce a function symbol pers/2. Then pers(mandela, 18/07/18) denotes a person.

20/35 The mapping layer Mapping assertions, formally Mapping assertions are used to extract the data from the DB to populate the ontology Use of variable terms, which are like object terms, but with variables instead of values as arguments of the functions Definition (Mapping assertion between a database and a TBox) A mapping assertion between a database D and a TBox T has the form Φ Ψ where Φ is an arbitrary SQL query of arity n > 0 over D; Ψ is a conjunctive query over T of arity n > 0 without non-distinguished variables, possibly involving variable terms.

20/35 The mapping layer Mapping assertions, formally Mapping assertions are used to extract the data from the DB to populate the ontology Use of variable terms, which are like object terms, but with variables instead of values as arguments of the functions Definition (Mapping assertion between a database and a TBox) A mapping assertion between a database D and a TBox T has the form Φ Ψ where Φ is an arbitrary SQL query of arity n > 0 over D; Ψ is a conjunctive query over T of arity n > 0 without non-distinguished variables, possibly involving variable terms.

21/35 The mapping layer Example

22/35 The mapping layer Mapping assertions in M Definition (Mapping assertion in M in an OBDA system) A mapping assertion between a database D and a TBox T in M has the form Φ( x) Ψ( t, y) where Φ is an arbitrary SQL query of arity n > 0 over D; Ψ is a conjunctive query over T of arity n > 0 without non-distinguished variables; x, y are variables with y x; t are variable terms of the form f ( z), with f Λ and z x.

23/35 The mapping layer Semantics of mappings Intuitively: I satisfies Φ Ψ with respect to D if all facts obtained by evaluating Φ over D and then propagating answers to Ψ, hold in I. Definition (Satisfaction of a mapping assertion with respect to a database) An interpretation I satisfies a mapping assertion Φ( x) Ψ( t, y) in M with respect to a database D, if for each tuple of values v Eval(Φ, D), and for each ground atom in Ψ[ x/ v], we have that: If the ground atom is A(s), then s I A I ; If the ground atom is P(s 1, s 2 ), then (s I 1, si 2 ) PI. Eval(Φ, D) denotes the result of evaluating Φ over D, Ψ[ x/ v] denotes Ψ where each x i is substituted with v i

24/35 The mapping layer Semantics of an OBDA system Definition (Model of an OBDA system) An interpretation I is a model of O = T, M, D if: I is a model of T ; I satisfies M with respect to D, i.e., every assertion in M w.r.t. D. An OBDA system O is satisfiable if it admits at least one model

24/35 The mapping layer Semantics of an OBDA system Definition (Model of an OBDA system) An interpretation I is a model of O = T, M, D if: I is a model of T ; I satisfies M with respect to D, i.e., every assertion in M w.r.t. D. An OBDA system O is satisfiable if it admits at least one model

25/35 Query answering Two approaches for query answering over O Bottom-up approach: Explicitly construct an ABox A M,D using D and M, and compute the certain answers over T, A M,D Conceptually simpler, but less efficient (PTime in the data). Top-down approach Unfold the query w.r.t. M and generate a query over D. Is more sophisticated, but also more efficient OBDA with QuOnto/Quest uses the top-down approach

26/35 Query answering Top-down approach to query answering, intuition

27/35 Query answering Top-down approach to query answering, intuition

28/35 Query answering Top-down approach to query answering Reformulation: compute the perfect reformulation (rewriting), q pr = PerfectRef (q, T P ), of the original query q using the inclusion assertions of the TBox T so that we have a UCQ. Unfolding: compute a new query q unf from q pr by using the (split version of) the mappings in M Each atom in q pr that unifies with an atom in Ψ is substituted with the corresponding query Φ over the database The unfolded query is such that Eval(q unf, D) = Eval(q pr, A M,D ) Evaluation: delegate the evaluation of q unf to the relational DBMS managing D More examples, rewriting rules and algorithm are described on pp290-297 of the MOSS 09 slides, and more details on unfolding are on pp248-251 of the MOSS 09 slides.

29/35 Query answering Example

30/35 Query answering Example

31/35 Query answering Example

32/35 Query answering Query unfolding, intuition

33/35 Query answering Example

34/35 Query answering Implementation of top-down approach to query answering To generate an SQL query, one can follow different strategies: Substitute each view predicate in the unfolded queries with the corresponding SQL query over the source: + joins are performed on the DB attributes + does not generate doubly nested queries the number of unfolded queries may be exponential Construct for each atom in the original query a new view. This view takes the union of all SQL queries corresponding to the view predicates, and constructs also the Skolem terms + avoids exponential blow-up of the resulting query, since the union (of the queries coming from multiple mappings) is done before the joins joins are performed on Skolem terms generates doubly nested queries Which method is better, depends on various parameters

35/35 Summary 1 OBDA Options 2 Some technical details Introduction The ontology language The mapping layer Impedance mismatch Mapping assertions Query answering