Lecture 9: Datalog with Negation

CS 784: Foundations of Data Management, Spring 2017
Instructor: Paris Koutris

In this lecture we will study the addition of negation to Datalog. We start with an example.

Example 9.1. Suppose that we want to compute the complement of the transitive closure of a graph (i.e., output all pairs of vertices (a, b) such that there is no directed path from a to b). We can compute this using the following Datalog program:

    V(x) :- R(x,y).
    V(y) :- R(x,y).
    T(x,y) :- R(x,y).
    T(x,y) :- T(x,z), R(z,y).
    TC(x,y) :- V(x), V(y), not T(x,y).

Notice that we now allow negation in the body of the rules. As in the case of Conjunctive Queries with negation, we have to be careful that the negation is safe: every variable that appears in a negated atom must also appear in a positive atom of the same rule body. We also do not allow negation to appear in the head of a rule. We call the new language Datalog¬.

The semantics of (positive) Datalog are intuitive and well-defined: there exists a unique minimum model, and we can apply the fixpoint algorithm to find this model iteratively. However, when we add negation, finding the right semantics becomes a harder task. We investigate this next.

A first attempt to define the semantics would be to extend the fixpoint semantics. The immediate consequence operator can still be defined as for positive Datalog (it will be an RA expression with negation). However, the fixpoint is not well-defined in the case of Datalog with negation! Consider the following examples.

Example 9.2. Consider the following Datalog¬ program:

    R(x) :- S(x), not R(x).

and suppose that the EDB S has input {S(1)}. In this case, neither {R(1)} nor {¬R(1)} is a fixpoint: if R(1) is absent the rule derives it, and if R(1) is present the rule no longer derives it. In fact, the above program does not have any fixpoint if we follow the standard definition.

Example 9.3. Consider the following Datalog¬ program:

    R(x) :- S(x), not T(x).
    T(x) :- S(x), not R(x).

with input {S(1)}. In this case, we have two fixpoints: {R(1), ¬T(1)} and {¬R(1), T(1)}.
Neither of the two fixpoints is contained in the other, so they are both minimal fixpoints! So how can we define the semantics of Datalog with negation? We will next see several different ways to define the semantics of Datalog¬.
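
The claims of Examples 9.2 and 9.3 can be checked by brute force. The following Python sketch is an illustration, not part of the original notes: it encodes the immediate consequence operator of each program by hand (the names `tp_92`, `tp_93` and `fixpoints` are hypothetical) and tests every candidate set of IDB facts I for the condition T_P(I) = I.

```python
from itertools import chain, combinations

S = {1}  # EDB input {S(1)} used in both examples


def tp_92(I):
    """Immediate consequences of  R(x) :- S(x), not R(x)."""
    return frozenset(("R", x) for x in S if ("R", x) not in I)


def tp_93(I):
    """Immediate consequences of  R(x) :- S(x), not T(x).  T(x) :- S(x), not R(x)."""
    r = {("R", x) for x in S if ("T", x) not in I}
    t = {("T", x) for x in S if ("R", x) not in I}
    return frozenset(r | t)


def powerset(facts):
    """All subsets of a set of facts, smallest first."""
    facts = sorted(facts)
    subsets = chain.from_iterable(
        combinations(facts, k) for k in range(len(facts) + 1))
    return (frozenset(s) for s in subsets)


def fixpoints(tp, candidates):
    """Every set I of candidate facts with tp(I) == I."""
    return [set(I) for I in powerset(candidates) if tp(I) == I]


fp_92 = fixpoints(tp_92, {("R", 1)})
fp_93 = fixpoints(tp_93, {("R", 1), ("T", 1)})
print(fp_92)  # no fixpoint at all
print(fp_93)  # two incomparable fixpoints
```

Enumerating all subsets is only feasible for toy inputs like this one; the point is simply to make the "no fixpoint" and "two minimal fixpoints" observations concrete.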

9.1 Semi-positive Datalog

In semi-positive Datalog, negation is allowed only in front of EDB relations. To define the semantics of semi-positive Datalog, we interpret ¬R(x) as follows: ¬R(x) is true if and only if the tuple x is over the active domain and x is not in R. The standard iterative computation works for semi-positive Datalog, since each negated EDB relation can essentially be replaced by a positive relation (its complement).

Example 9.4. Consider the following Datalog¬ program:

    RC(x,y) :- V(x), V(y), not R(x,y).
    T(x,y) :- RC(x,y).
    T(x,y) :- T(x,z), RC(z,y).

where V is (as in Example 9.1) the set of vertices and the EDB R is {R(1, 2), R(2, 3), R(1, 1), R(1, 3)}. We can compute the complement of R over the active domain {1, 2, 3} as {(2, 1), (2, 2), (3, 1), (3, 2), (3, 3)}. We can then replace ¬R in the program with a positive EDB relation holding this complement, and run the result simply as a positive Datalog program.

9.2 Stratified Datalog

A stratified Datalog program with negation generalizes the idea behind semi-positive Datalog. In semi-positive Datalog, negation appears only in front of EDB relations. We can now try to add negation in front of an IDB predicate that is the output of a semi-positive program (when there is no mutual recursion), and so on.

Definition 9.5 (Stratification). A stratification of a Datalog¬ program P is a partition of P into Datalog¬ sub-programs P_1, ..., P_m such that:

1. All the rules defining the same IDB relation are in the same partition.
2. If an IDB R' appears positive in the body of a rule with head R, then R' is defined in the same partition as R or in an earlier one.
3. If an IDB R' appears negated in the body of a rule with head R, then R' is defined in a partition strictly before the one where R is defined.

Example 9.6. Consider the Datalog¬ program from Example 9.1. This program can be stratified into 3 Datalog¬ subprograms as follows:

    (P1) V(x) :- R(x,y).
    (P1) V(y) :- R(x,y).
    (P2) T(x,y) :- R(x,y).
    (P2) T(x,y) :- T(x,z), R(z,y).
    (P3) TC(x,y) :- V(x), V(y), not T(x,y).
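
The three strata of Example 9.6 can be evaluated one after another, each treating the relations computed earlier as fixed inputs. The Python sketch below is a hand-written illustration under assumptions that are not in the notes: the EDB R is borrowed from Example 9.4, and relations are plain Python sets of tuples.

```python
# Assumed EDB (borrowed from Example 9.4; any edge set would do).
R = {(1, 2), (2, 3), (1, 1), (1, 3)}

# Stratum P1:  V(x) :- R(x,y).   V(y) :- R(x,y).
V = {x for (x, _) in R} | {y for (_, y) in R}

# Stratum P2:  T(x,y) :- R(x,y).   T(x,y) :- T(x,z), R(z,y).
# Naive fixpoint: keep joining T with R until nothing new appears.
T = set(R)
while True:
    new = {(x, y) for (x, z) in T for (z2, y) in R if z == z2} - T
    if not new:
        break
    T |= new

# Stratum P3:  TC(x,y) :- V(x), V(y), not T(x,y).
# T is final at this point, so the negation is a plain membership test.
TC = {(x, y) for x in V for y in V if (x, y) not in T}

print(sorted(TC))
```

Because T is completely computed before P3 runs, `not T(x,y)` never has to be re-evaluated, which is exactly what the ordering of the partitions guarantees.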

Notice that (P2) defines the IDB T, and (P3) defines TC; TC has a negated IDB in the body of a rule that defines it, but this IDB is defined in the previous stratum (P2).

A Datalog¬ program can have many possible stratifications. For instance, in the above example we could switch the order of (P1), (P2) and still get a stratification. Also, not every Datalog¬ program can be stratified! For instance, the program R(x) :- S(x), not R(x) is not stratifiable.

If a Datalog¬ program can be stratified, we can use the stratification to provide well-defined semantics. The idea is simple: we start from the first stratum (partition), which will be a semi-positive program, and compute the fixpoint of this stratum. We can now "freeze" the IDB predicates defined in the first stratum, and use them as EDB relations in the second stratum. We continue this process until we have traversed all strata.

How do we determine whether a Datalog¬ program is stratifiable? We construct the precedence graph: whenever an IDB predicate R appears positive in a rule with head S, we add a directed edge (R, S) with label +. Whenever an IDB predicate R appears negated in a rule with head S, we add an edge (R, S) with label −. We can now use the precedence graph to look for a stratification.

Lemma 9.7. A Datalog¬ program is stratifiable if and only if the precedence graph has no directed cycle with a negative edge.

If the precedence graph indeed has no directed cycle with a negative edge, any topological ordering of its strongly connected components gives a stratification of the program. What happens when a Datalog¬ program admits many possible stratifications?

Theorem 9.8. Let P be a stratifiable program in Datalog¬. Then:

1. The result of the stratified semantics is the same, independent of which stratification of P we choose.
2. If P is positive, the stratified semantics output the unique minimum model.

9.3 Well-Founded Semantics

As we discussed before, there are Datalog¬ programs that cannot be stratified.
We now discuss a semantics that is defined for such programs as well, called the well-founded semantics. As a running example, we will consider the Datalog¬ program that describes the win-move game. In this game, we have a directed graph given by the relation moves(x, y). Two players alternately make moves in the graph according to the relation moves. A player loses if she has no move to make. We want to compute the predicate win(x), which means that the player who moves from vertex x has a winning strategy. The following Datalog¬ program defines the relation:

    win(x) :- moves(x,y), not win(y).
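
The win-move rule is a convenient test case for Lemma 9.7, because its precedence graph consists of a single negative self-loop on win. The Python sketch below is an illustration, not the notes' own algorithm: it assigns stratum numbers by repeatedly enforcing "positive edge: at least as high, negative edge: strictly higher", and gives up once a number exceeds the count of IDB predicates, which can only happen on a cycle through a negative edge. The rule encoding (a head plus a list of `(predicate, negated)` pairs) and the name `stratify` are assumptions made for this example.

```python
def stratify(rules, idb):
    """Return {predicate: stratum number}, or None if not stratifiable."""
    # Precedence-graph edges (R, S, negated): IDB R occurs in a rule with head S.
    edges = {(pred, head, neg)
             for head, body in rules
             for pred, neg in body
             if pred in idb}
    stratum = {p: 1 for p in idb}
    changed = True
    while changed:
        changed = False
        for src, dst, neg in edges:
            need = stratum[src] + (1 if neg else 0)
            if stratum[dst] < need:
                if need > len(idb):
                    # Unbounded growth: some cycle contains a negative edge.
                    return None
                stratum[dst] = need
                changed = True
    return stratum


# Program of Example 9.1 (R is EDB, so it contributes no edges).
example_91 = [
    ("V", [("R", False)]),
    ("V", [("R", False)]),
    ("T", [("R", False)]),
    ("T", [("T", False), ("R", False)]),
    ("TC", [("V", False), ("V", False), ("T", True)]),
]
# The win-move program: win(x) :- moves(x,y), not win(y).
win_move = [("win", [("moves", False), ("win", True)])]

strata_91 = stratify(example_91, {"V", "T", "TC"})
strata_win = stratify(win_move, {"win"})
print(strata_91)
print(strata_win)
```

For Example 9.1 the sketch places V and T together in stratum 1 and TC in stratum 2, a coarser stratification than the three-part one of Example 9.6 but equally valid. For the win-move program it returns `None`, which is why a different semantics is needed here.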

We will consider the following instance of the relation moves:

    {moves(1, 2), moves(2, 1), moves(1, 3), moves(3, 4)}

We start by defining the stable model of a Datalog¬ program. Intuitively, a set I of IDB facts is a stable model if applying the rules of P yields exactly I, where every negated atom is evaluated against I itself: ¬A holds exactly when the fact A is not in I. To formally define a stable model, we need to define the reduct of a program w.r.t. a set of facts I.

Definition 9.9 (Reduct). The reduct of a Datalog¬ program P w.r.t. a set of facts I is obtained as follows:

1. Create all possible instantiations (groundings) of the rules in P.
2. Delete all instantiations that contain a negative literal ¬R(a), where R(a) ∈ I.
3. Delete all remaining negative literals in the bodies of the remaining rules.

The reduct of a Datalog¬ program w.r.t. I is always a positive Datalog program. Thus, we can always define the fixpoint of the reduct.

Definition 9.10 (Stable Model). We say that I is a stable model of a Datalog¬ program P if the fixpoint of its reduct w.r.t. I is exactly I.

Example 9.11. Let I = {win(3), win(1)}. To compute the reduct, we first write the instantiations of the rules (keeping only those whose moves atom is in the input):

    win(1) :- moves(1,2), not win(2).
    win(2) :- moves(2,1), not win(1).
    win(1) :- moves(1,3), not win(3).
    win(3) :- moves(3,4), not win(4).

Since win(1), win(3) are in I, we remove rules 2 and 3, and we remove the negated literals from the remaining rules. The result is the reduct:

    win(1) :- moves(1,2).
    win(3) :- moves(3,4).

If we run the fixpoint for this program, we obtain the output {win(1), win(3)}, which is exactly I. Thus, I = {win(3), win(1)} is a stable model!

A Datalog¬ program can have several (even exponentially many) stable models. For our running example, there are two stable models:

    {win(3), win(1)}, {win(3), win(2)}
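
The two stable models of the running example can be confirmed by applying Definitions 9.9 and 9.10 to every candidate set. The Python sketch below is specialised to the single win-move rule and is only an illustration: a fact win(x) is represented by the vertex x, and the names `reduct_fixpoint` and `stable_models` are hypothetical. Since every rule of the reduct has the form win(x) :- moves(x,y) with an EDB body, its fixpoint is reached in one step.

```python
from itertools import chain, combinations

moves = {(1, 2), (2, 1), (1, 3), (3, 4)}
nodes = sorted({v for edge in moves for v in edge})


def reduct_fixpoint(I):
    """Fixpoint of the reduct of  win(x) :- moves(x,y), not win(y)  w.r.t. I.

    The ground rule for (x, y) is deleted when win(y) is in I; otherwise its
    negative literal is dropped, leaving  win(x) :- moves(x,y).
    """
    return frozenset(x for (x, y) in moves if y not in I)


def stable_models():
    """All sets I of win-facts whose reduct fixpoint is exactly I."""
    subsets = chain.from_iterable(
        combinations(nodes, k) for k in range(len(nodes) + 1))
    return [set(I) for I in map(frozenset, subsets) if reduct_fixpoint(I) == I]


models = stable_models()
print(models)
```

The search visits all 16 subsets of {win(1), ..., win(4)} and keeps only {win(1), win(3)} and {win(2), win(3)}, matching the two models listed above.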

It can also be the case that there are no stable models! To overcome this problem, the well-founded semantics classifies each possible fact as true, false, or unknown. In other words, we want to find a 3-valued model to describe the answer to the program. Formally, each possible fact over the active domain will be mapped to one of three values: 0, 1, 1/2 (0 means certainly not in the output, 1 means certainly in the output, and 1/2 means unknown/uncertain).

Definition 9.12. Given a Datalog¬ program P, the well-founded model of P is the minimal 3-valued model such that: (i) the true facts belong to all stable models of P, and (ii) the false facts belong to no stable model of P.

In other words, if a fact appears in some stable models (and not all), then we classify it as unknown. For our running example, the well-founded model has one true fact (win(3)), and one false fact (win(4)); the remaining facts (win(1), win(2)) are unknown (intuitively, they are the ones that lead to a draw).

To compute the well-founded model, we start by assuming that the facts are all false, hence I_0 = ∅. In the first iteration, we compute the reduct of P w.r.t. I_0, and compute its fixpoint, which will be I_1. In the second iteration, we compute the reduct of P w.r.t. I_1 and then find the fixpoint I_2, and so on. We call this procedure the alternating fixpoint.

In general the sequence I_0, I_1, I_2, ... does not converge to a single fixpoint; instead, its even and odd subsequences each converge. The results of the even iterations I_2k are monotonically increasing, and the results of the odd iterations I_2k+1 are monotonically decreasing. What happens is that I_2k is an underestimation of the true facts (and so an overestimation of the false facts), while I_2k+1 is an overestimation of the true facts (and so an underestimation of the false facts). The fixpoint of the even iterations gives us the set of all true facts, and the facts outside the fixpoint of the odd iterations form the set of all false facts.
The remaining facts are labeled as unknown.

Example 9.13. Consider our running example. Then we have:

    I_0 = ∅
    I_1 = {win(1), win(2), win(3)}
    I_2 = {win(3)}
    I_3 = {win(1), win(2), win(3)}
    I_4 = {win(3)}

At this point both the even and the odd iterations have stabilized. The true fact is win(3), the false fact is win(4), and the remaining facts (win(1), win(2)) are unknown.
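
The iteration of Example 9.13 is short enough to run mechanically. The Python sketch below is an illustration specialised to the win-move rule, with win(x) represented by the vertex x; the names `step` and `well_founded` are hypothetical. It advances the even ("under") and odd ("over") iterates in pairs and stops once the even iterate repeats.

```python
moves = {(1, 2), (2, 1), (1, 3), (3, 4)}
nodes = {v for edge in moves for v in edge}


def step(I):
    """One iteration: fixpoint of the reduct of the win-move rule w.r.t. I."""
    return frozenset(x for (x, y) in moves if y not in I)


def well_founded():
    """Alternating fixpoint; returns the (true, false, unknown) win-facts."""
    under = frozenset()              # I_0: assume every fact is false
    while True:
        over = step(under)           # odd iterate I_{2k+1}: overestimates the true facts
        next_under = step(over)      # even iterate I_{2k+2}: underestimates the true facts
        if next_under == under:      # even iterates stabilised, hence odd ones too
            break
        under = next_under
    true = set(under)
    false = nodes - over             # never derivable, even under the optimistic estimate
    unknown = set(over - under)
    return true, false, unknown


true_facts, false_facts, unknown_facts = well_founded()
print(true_facts, false_facts, unknown_facts)
```

On the running instance the loop passes through I_1 = {1, 2, 3} and I_2 = {3} and stops after recomputing them once, reporting win(3) as true, win(4) as false, and win(1), win(2) as unknown.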