Data Integration: Logic Query Languages

Similar documents
Data Integration: Datalog

Constraint Solving. Systems and Internet Infrastructure Security

THE RELATIONAL MODEL. University of Waterloo

Lecture 9: Datalog with Negation

Chapter 2 & 3: Representations & Reasoning Systems (2.2)

Relational Databases

Chapter 5: Other Relational Languages.! Query-by-Example (QBE)! Datalog

8. Negation 8-1. Deductive Databases and Logic Programming. (Sommer 2017) Chapter 8: Negation

CSC Discrete Math I, Spring Sets

Chapter 5: Other Relational Languages

Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 6 Outline. Unary Relational Operations: SELECT and

Database Theory VU , SS Introduction to Datalog. Reinhard Pichler. Institute of Logic and Computation DBAI Group TU Wien

LOGIC AND DISCRETE MATHEMATICS

Knowledge Representation and Reasoning Logics for Artificial Intelligence

Deductive Databases. Motivation. Datalog. Chapter 25

Logic As a Query Language. Datalog. A Logical Rule. Anatomy of a Rule. sub-goals Are Atoms. Anatomy of a Rule

Aggregates in Recursion: Issues and Solutions

Lecture 1: Conjunctive Queries

Datalog. Susan B. Davidson. CIS 700: Advanced Topics in Databases MW 1:30-3 Towne 309

Datalog. Susan B. Davidson. CIS 700: Advanced Topics in Databases MW 1:30-3 Towne 309

SQL, DLs, Datalog, and ASP: comparison

Introductory logic and sets for Computer scientists

Extending Prolog with Incomplete Fuzzy Information

Negation wrapped inside a recursion makes no. separated, there can be ambiguity about what. the rules mean, and some one meaning must

Integrity Constraints (Chapter 7.3) Overview. Bottom-Up. Top-Down. Integrity Constraint. Disjunctive & Negative Knowledge. Proof by Refutation

Announcements. What is Datalog? Why Do We Learn Datalog? Database Systems CSE 414. Midterm. Datalog. Lecture 13: Datalog (Ch

Recap and Schedule. Till Now: Today: Ø Query Languages: Relational Algebra; SQL. Ø Datalog A logical query language. Ø SQL Recursion.

Logic (or Declarative) Programming Foundations: Prolog. Overview [1]

An Evolution of Mathematical Tools

Conjunctive queries. Many computational problems are much easier for conjunctive queries than for general first-order queries.

Recap Datalog Datalog Syntax Datalog Semantics. Logic: Datalog. CPSC 322 Logic 6. Textbook Logic: Datalog CPSC 322 Logic 6, Slide 1

[Ch 6] Set Theory. 1. Basic Concepts and Definitions. 400 lecture note #4. 1) Basics

Temporal Data Model for Program Debugging

DATABASE THEORY. Lecture 11: Introduction to Datalog. TU Dresden, 12th June Markus Krötzsch Knowledge-Based Systems

Towards a Logical Reconstruction of Relational Database Theory

Lecture 4: January 12, 2015

Definition: A context-free grammar (CFG) is a 4- tuple. variables = nonterminals, terminals, rules = productions,,

Incomplete Information: Null Values

Other Query Languages. Winter Lecture 11

A review of First-Order Logic

Web Science & Technologies University of Koblenz Landau, Germany. Stratified Programs

Database Theory: Beyond FO

HANDBOOK OF LOGIC IN ARTIFICIAL INTELLIGENCE AND LOGIC PROGRAMMING

Part I Logic programming paradigm

Module 6. Knowledge Representation and Logic (First Order Logic) Version 2 CSE IIT, Kharagpur

DATABASE DESIGN II - 1DL400

Introduction to Data Management CSE 344. Lecture 14: Relational Calculus

Chapter 16. Logic Programming. Topics. Unification. Resolution. Prolog s Search Strategy. Prolog s Search Strategy

SEQUENCES, MATHEMATICAL INDUCTION, AND RECURSION

SQL Part 3: Where Subqueries and other Syntactic Sugar Part 4: Unknown Values and NULLs

1. true / false By a compiler we mean a program that translates to code that will run natively on some machine.

Annoucements. Where are we now? Today. A motivating example. Recursion! Lecture 21 Recursive Query Evaluation and Datalog Instructor: Sudeepa Roy

CSE 344 APRIL 11 TH DATALOG

LP attracted two kinds of enthousiasts engineers/programmers \Never had I experienced such ease in getting a complex program coded and running" D.H.D.

Denotational Semantics. Domain Theory

Answer Sets and the Language of Answer Set Programming. Vladimir Lifschitz

Datalog. Rules Programs Negation

CS558 Programming Languages

CSE 344 JANUARY 29 TH DATALOG

Approximation Algorithms for Computing Certain Answers over Incomplete Databases

Logic and its Applications

CSC 501 Semantics of Programming Languages

Knowledge Representation and Reasoning Logics for Artificial Intelligence

Query Processing SL03

Midterm. Database Systems CSE 414. What is Datalog? Datalog. Why Do We Learn Datalog? Why Do We Learn Datalog?

Constructive negation with the well-founded semantics for CLP

A Simple SQL Injection Pattern

DISCRETE MATHEMATICS

Midterm. Introduction to Data Management CSE 344. Datalog. What is Datalog? Why Do We Learn Datalog? Why Do We Learn Datalog? Lecture 13: Datalog

Complexity of Answering Queries Using Materialized Views

Range Restriction for General Formulas

Local Stratiæcation. Instantiate rules; i.e., substitute all. possible constants for variables, but reject. instantiations that cause some EDB subgoal

(More) Propositional Logic and an Intro to Predicate Logic. CSCI 3202, Fall 2010

Overview. CS389L: Automated Logical Reasoning. Lecture 6: First Order Logic Syntax and Semantics. Constants in First-Order Logic.

CMPS 277 Principles of Database Systems. Lecture #11

Improving Query Plans. CS157B Chris Pollett Mar. 21, 2005.

The Relational Model

Software Engineering Lecture Notes

Foundations of AI. 9. Predicate Logic. Syntax and Semantics, Normal Forms, Herbrand Expansion, Resolution

}Optimization Formalisms for recursive queries. Module 11: Optimization of Recursive Queries. Module Outline Datalog

Operations of Relational Algebra

}Optimization. Module 11: Optimization of Recursive Queries. Module Outline

Safe Stratified Datalog With Integer Order Does not Have Syntax

Query Decomposition and Data Localization

COMP718: Ontologies and Knowledge Bases

Database Theory VU , SS Introduction: Relational Query Languages. Reinhard Pichler

Plan of the lecture. G53RDB: Theory of Relational Databases Lecture 14. Example. Datalog syntax: rules. Datalog query. Meaning of Datalog rules

CS Bootcamp Boolean Logic Autumn 2015 A B A B T T T T F F F T F F F F T T T T F T F T T F F F

The Relational Algebra

Chapter 16. Logic Programming Languages

The three faces of homotopy type theory. Type theory and category theory. Minicourse plan. Typing judgments. Michael Shulman.

Chapter 10 Part 1: Reduction

FOUNDATIONS OF SEMANTIC WEB TECHNOLOGIES

CSCI.6962/4962 Software Verification Fundamental Proof Methods in Computer Science (Arkoudas and Musser) Chapter p. 1/27

Relative Information Completeness

Prolog. Logic Programming vs Prolog

Markov Logic: Representation

XSB. Tim Finin University of Maryland Baltimore County. What XSB offers Tabling Higher order logic programming. Negation. sort of.

Database Systems External Sorting and Query Optimization. A.R. Hurson 323 CS Building

v Conceptual Design: ER model v Logical Design: ER to relational model v Querying and manipulating data

Transcription:

Data Integration: Logic Query Languages Jan Chomicki University at Buffalo

Datalog

Datalog A logic language Datalog programs consist of logical facts and rules Datalog is a subset of Prolog (no data structures)

Datalog A logic language Datalog programs consist of logical facts and rules Datalog is a subset of Prolog (no data structures) Basic concepts term: constant, variable predicate (relation) atom clause, rule, fact groundness: no variables substitution: mapping from variables to terms ground substitution: mapping from variables to constants

Logic programs

Atom Logic programs syntax: P(T 1,..., T n) semantics: predicate P is true of terms T 1 and... and T n variables present: truth under a ground substitution

Atom Logic programs syntax: P(T 1,..., T n) semantics: predicate P is true of terms T 1 and... and T n variables present: truth under a ground substitution Implication (clause) syntax: A 0 : A 1,..., A k semantics: atom A 0 is true if atoms A 1 and and A k are true all the variables universally quantified: implication true under all ground substitutions all the variables in the head occur also in the body some predicates in the body may be built-in: >,,...

Atom Logic programs syntax: P(T 1,..., T n) semantics: predicate P is true of terms T 1 and... and T n variables present: truth under a ground substitution Implication (clause) syntax: A 0 : A 1,..., A k semantics: atom A 0 is true if atoms A 1 and and A k are true all the variables universally quantified: implication true under all ground substitutions all the variables in the head occur also in the body some predicates in the body may be built-in: >,,... Kinds of clauses k = 0: fact k > 0: rule consisting of head A 0 and body A 1,..., A k

Logic query languages

Logic query languages Datalog program P EDB(P): a set of true ground facts encoding a database instance IDB(P): a set of rules encoding a query, with a special predicate query to return the result

Logic query languages Datalog program P EDB(P): a set of true ground facts encoding a database instance IDB(P): a set of rules encoding a query, with a special predicate query to return the result Predicate dependence direct: the predicate in the head depends on each predicate in the body indirect: through multiple rules recursion: predicate depends on itself

Logic query languages Datalog program P EDB(P): a set of true ground facts encoding a database instance IDB(P): a set of rules encoding a query, with a special predicate query to return the result Predicate dependence direct: the predicate in the head depends on each predicate in the body indirect: through multiple rules recursion: predicate depends on itself Logic query languages conjunctive queries: single rule, no recursion unions of conjunctive queries: multiple rules, no recursion Datalog: multiple rules, recursion

Example logic program Friends %% Facts friend(joe,sue). friend(ann,sue). friend(sue,max). friend(max,ann). joe sue %% Rules fof(x,y) :- friend(x,y). fof(x,z) :- friend(x,y), fof(y,z). %% Query 1 query(x) :- fof(x,ann). %% Query 2 query(x) :- fof(x,y), fof(y,x). max ann

Logical semantics

Logical semantics Deriving facts of IDB(P) predicates bottom-up 1 EDB(P) facts are true 2 single step: for all the ground substitutions that map the body of a rule in IDB(P) to true facts, make the (substituted) head true 3 repeat until no new true facts are derived

Logical semantics Deriving facts of IDB(P) predicates bottom-up 1 EDB(P) facts are true 2 single step: for all the ground substitutions that map the body of a rule in IDB(P) to true facts, make the (substituted) head true 3 repeat until no new true facts are derived Derivation properties the derivation terminates: why? soundness: the derived true facts are logical consequences of P completeness: all the logical consequences of P are derived

Logical semantics Deriving facts of IDB(P) predicates bottom-up Derived facts: all friend facts and 1 EDB(P) facts are true 2 single step: for all the ground substitutions that map the body of a rule in IDB(P) to true facts, make the (substituted) head true 3 repeat until no new true facts are derived Derivation properties the derivation terminates: why? soundness: the derived true facts are logical consequences of P completeness: all the logical consequences of P are derived

Logical semantics Deriving facts of IDB(P) predicates bottom-up 1 EDB(P) facts are true 2 single step: for all the ground substitutions that map the body of a rule in IDB(P) to true facts, make the (substituted) head true 3 repeat until no new true facts are derived Derived facts: all friend facts and %% Direct friends fof(joe,sue). fof(ann,sue). fof(sue,max). fof(max,ann). Derivation properties the derivation terminates: why? soundness: the derived true facts are logical consequences of P completeness: all the logical consequences of P are derived

Logical semantics Deriving facts of IDB(P) predicates bottom-up 1 EDB(P) facts are true 2 single step: for all the ground substitutions that map the body of a rule in IDB(P) to true facts, make the (substituted) head true 3 repeat until no new true facts are derived Derivation properties the derivation terminates: why? soundness: the derived true facts are logical consequences of P completeness: all the logical consequences of P are derived Derived facts: all friend facts and %% Direct friends fof(joe,sue). fof(ann,sue). fof(sue,max). fof(max,ann). %% Second-level friends fof(joe,max). fof(sue,ann). fof(ann,max). fof(max,sue).

Logical semantics Deriving facts of IDB(P) predicates bottom-up 1 EDB(P) facts are true 2 single step: for all the ground substitutions that map the body of a rule in IDB(P) to true facts, make the (substituted) head true 3 repeat until no new true facts are derived Derivation properties the derivation terminates: why? soundness: the derived true facts are logical consequences of P completeness: all the logical consequences of P are derived Derived facts: all friend facts and %% Direct friends fof(joe,sue). fof(ann,sue). fof(sue,max). fof(max,ann). %% Second-level friends fof(joe,max). fof(sue,ann). fof(ann,max). fof(max,sue). %% Third-level friends fof(joe,ann). fof(sue,sue). for(ann,ann). for(max,max).

Open vs. Closed World Assumption

Open vs. Closed World Assumption Closed World Assumption (CWA) What is not implied by a logic program is false.

Open vs. Closed World Assumption Closed World Assumption (CWA) What is not implied by a logic program is false. Open World Assumption (OWA) What is not implied by a logic program is unknown.

Open vs. Closed World Assumption Closed World Assumption (CWA) What is not implied by a logic program is false. Open World Assumption (OWA) What is not implied by a logic program is unknown. Scope traditional database applications: CWA information integration: OWA or CWA

Open vs. Closed World Assumption Closed World Assumption (CWA) What is not implied by a logic program is false. Open World Assumption (OWA) What is not implied by a logic program is unknown. Scope traditional database applications: CWA information integration: OWA or CWA Can negation be allowed inside Datalog rules?

Datalog not Syntax Rules with negative atoms in the body: A 0 : A 1,..., A k, not B 1,..., not B m.

Datalog not Syntax Rules with negative atoms in the body: A 0 : A 1,..., A k, not B 1,..., not B m. Example asymmetric(x,y):- fof(x,y), not fof(y,x).

Datalog not Syntax Rules with negative atoms in the body: A 0 : A 1,..., A k, not B 1,..., not B m. Example asymmetric(x,y):- fof(x,y), not fof(y,x). Semantics Cannot be adequately given in terms of implication.

Coding relational algebra

Coding relational algebra Coding operators selection, Cartesian product: single clause projection: single clause with projected out attributes only in the body union: multiple clauses set difference: negation

Coding relational algebra Coding operators selection, Cartesian product: single clause projection: single clause with projected out attributes only in the body union: multiple clauses set difference: negation But what about recursion?

Stratification

Dependency graph pdg(p) Stratification vertices: predicates of a Datalog not program P edges: a positive edge (p, q) if there is a clause in P in which q appears in a positive atom in the body and p appears in the head a negative edge (p, q) if there is a clause in P in which q appears in a negative atom in the body and p appears in the head

Dependency graph pdg(p) Stratification vertices: predicates of a Datalog not program P edges: a positive edge (p, q) if there is a clause in P in which q appears in a positive atom in the body and p appears in the head a negative edge (p, q) if there is a clause in P in which q appears in a negative atom in the body and p appears in the head Stratified P No cycle in pdg(p) contains a negative edge.

Dependency graph pdg(p) Stratification vertices: predicates of a Datalog not program P edges: a positive edge (p, q) if there is a clause in P in which q appears in a positive atom in the body and p appears in the head a negative edge (p, q) if there is a clause in P in which q appears in a negative atom in the body and p appears in the head Stratified P No cycle in pdg(p) contains a negative edge. Stratification Mapping s from the set of predicates in P to nonnegative integers such that: 1 if a positive edge (p, q) is in pdg(p), then s(p) s(q) 2 if a negative edge (p, q) is in pdg(p), then s(p) > s(q)

Dependency graph pdg(p) Stratification vertices: predicates of a Datalog not program P edges: a positive edge (p, q) if there is a clause in P in which q appears in a positive atom in the body and p appears in the head a negative edge (p, q) if there is a clause in P in which q appears in a negative atom in the body and p appears in the head Stratified P No cycle in pdg(p) contains a negative edge. Stratification Mapping s from the set of predicates in P to nonnegative integers such that: 1 if a positive edge (p, q) is in pdg(p), then s(p) s(q) 2 if a negative edge (p, q) is in pdg(p), then s(p) > s(q) There is a polynomial-time algorithm to determine whether a program is stratified, and if it is, to find a stratification for it.

Stratified Datalog not : query evaluation

Stratified Datalog not : query evaluation Bottom-up evaluation 1 compute a stratification of a program P 2 partition P into P 1,..., P n such that each P i consisting of all and only rules whose head belongs to a single stratum P 1 is the lowest stratum 3 evaluate bottom-up P 1,..., P n (in that order).

Stratified Datalog not : query evaluation Bottom-up evaluation 1 compute a stratification of a program P 2 partition P into P 1,..., P n such that each P i consisting of all and only rules whose head belongs to a single stratum P 1 is the lowest stratum 3 evaluate bottom-up P 1,..., P n (in that order). Result does not depend on the stratification can be semantically characterized in various ways is used to compute query results

Universal quantification

Coding universal quantification through double negation. Universal quantification

Universal quantification Coding universal quantification through double negation. Example everybodysfriend(x):- person(x), not isnotfriend(x). isnotfriend(x):- person(x), person(y), not friend(y,x).

Universal quantification Coding universal quantification through double negation. Example everybodysfriend(x):- person(x), not isnotfriend(x). isnotfriend(x):- person(x), person(y), not friend(y,x).

Query result The query result Q(D) is defined for every input database D. Expressiveness

Query result The query result Q(D) is defined for every input database D. Expressiveness Query containment Q 1 Q 2 if and only if Q 1(D) Q 2(D) for every input database D.

Query result The query result Q(D) is defined for every input database D. Expressiveness Query containment Q 1 Q 2 if and only if Q 1(D) Q 2(D) for every input database D. Query equivalence Q 1 Q 2 if and only if Q 1 Q 2 and Q 2 Q 1.

Query result The query result Q(D) is defined for every input database D. Expressiveness Query containment Q 1 Q 2 if and only if Q 1(D) Q 2(D) for every input database D. Query equivalence Q 1 Q 2 if and only if Q 1 Q 2 and Q 2 Q 1. Query language containment L 1 L 2 if for every query Q 1 L 1, there is an equivalent query Q 2 in L 2. Q 2 may be in a different syntax than Q 1

Comparing query languages Expressiveness Datalog Stratified Datalog not Relational Algebra Stratified Datalog not Datalog Relational Algebra transitive closure Relational Algebra Datalog set difference

Comparing query languages Expressiveness Datalog Stratified Datalog not Relational Algebra Stratified Datalog not Datalog Relational Algebra transitive closure Relational Algebra Datalog set difference How to prove expressiveness results? considering the syntax not enough semantic propreties

Monotonicity Monotonicity A query language L is monotonic if for every query Q L, adding facts to the database cannot remove any tuples from the result of Q. Formally: for all databases D 1 and D 2 D 1 D 2 implies Q(D 1) Q(D 2).

Monotonicity Monotonicity A query language L is monotonic if for every query Q L, adding facts to the database cannot remove any tuples from the result of Q. Formally: for all databases D 1 and D 2 D 1 D 2 implies Q(D 1) Q(D 2). Query languages monotonic: Datalog nonmonotonic: Datalog not, relational algebra, SQL