Datalog. Rules Programs Negation

Similar documents
Datalog. Susan B. Davidson. CIS 700: Advanced Topics in Databases MW 1:30-3 Towne 309

Plan of the lecture. G53RDB: Theory of Relational Databases Lecture 14. Example. Datalog syntax: rules. Datalog query. Meaning of Datalog rules

Logic As a Query Language. Datalog. A Logical Rule. Anatomy of a Rule. sub-goals Are Atoms. Anatomy of a Rule

Recap and Schedule. Till Now: Today: Ø Query Languages: Relational Algebra; SQL. Ø Datalog A logical query language. Ø SQL Recursion.

Logical Query Languages. Motivation: 1. Logical rules extend more naturally to. recursive queries than does relational algebra. Used in SQL recursion.

Conjunctive queries. Many computational problems are much easier for conjunctive queries than for general first-order queries.

Database Theory: Beyond FO

Lecture 9: Datalog with Negation

Datalog. Susan B. Davidson. CIS 700: Advanced Topics in Databases MW 1:30-3 Towne 309

Range Restriction for General Formulas

10.3 Recursive Programming in Datalog. While relational algebra can express many useful operations on relations, there

Data Integration: Logic Query Languages

I Relational Database Modeling how to define

Chapter 5: Other Relational Languages.! Query-by-Example (QBE)! Datalog

Relations and Graphs

Chapter 5: Other Relational Languages

Logical Query Languages. Motivation: 1. Logical rules extend more naturally to. recursive queries than does relational algebra.

Local Stratiæcation. Instantiate rules; i.e., substitute all. possible constants for variables, but reject. instantiations that cause some EDB subgoal

Negation wrapped inside a recursion makes no. separated, there can be ambiguity about what. the rules mean, and some one meaning must

Datalog Recursive SQL LogicBlox

CSE 344 JANUARY 26 TH DATALOG

I Relational Database Modeling how to define

A Retrospective on Datalog 1.0

Datalog. S. Costantini

Midterm. Database Systems CSE 414. What is Datalog? Datalog. Why Do We Learn Datalog? Why Do We Learn Datalog?

Relational Database: The Relational Data Model; Operations on Database Relations

Midterm. Introduction to Data Management CSE 344. Datalog. What is Datalog? Why Do We Learn Datalog? Why Do We Learn Datalog? Lecture 13: Datalog

CSE 344 JANUARY 29 TH DATALOG

Program Analysis in Datalog

Sets 1. The things in a set are called the elements of it. If x is an element of the set S, we say

Introduction to Data Management CSE 344. Lecture 14: Datalog (guest lecturer Dan Suciu)

A set with only one member is called a SINGLETON. A set with no members is called the EMPTY SET or 2 N

Datalog Evaluation. Linh Anh Nguyen. Institute of Informatics University of Warsaw

A Simple SQL Injection Pattern

Lecture 6: Arithmetic and Threshold Circuits

Integrity Constraints (Chapter 7.3) Overview. Bottom-Up. Top-Down. Integrity Constraint. Disjunctive & Negative Knowledge. Proof by Refutation

Foundations of Computer Science Spring Mathematical Preliminaries

Lecture 1: Conjunctive Queries

Announcements. What is Datalog? Why Do We Learn Datalog? Database Systems CSE 414. Midterm. Datalog. Lecture 13: Datalog (Ch

Relational Databases

Safe Stratified Datalog With Integer Order Does not Have Syntax

Overview. CS389L: Automated Logical Reasoning. Lecture 6: First Order Logic Syntax and Semantics. Constants in First-Order Logic.

Database Theory: Datalog, Views

8. Negation 8-1. Deductive Databases and Logic Programming. (Sommer 2017) Chapter 8: Negation

Functional programming with Common Lisp

Database Theory VU , SS Introduction to Datalog. Reinhard Pichler. Institute of Logic and Computation DBAI Group TU Wien

Announcements. Database Systems CSE 414. Datalog. What is Datalog? Why Do We Learn Datalog? HW2 is due today 11pm. WQ2 is due tomorrow 11pm

Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 6 Outline. Unary Relational Operations: SELECT and

Module 6. Knowledge Representation and Logic (First Order Logic) Version 2 CSE IIT, Kharagpur

Chapter 2 & 3: Representations & Reasoning Systems (2.2)

This is already grossly inconvenient in present formalisms. Why do we want to make this convenient? GENERAL GOALS

FOUNDATIONS OF SEMANTIC WEB TECHNOLOGIES

A Formal Semantics for a Rule-Based Language

CMPS 277 Principles of Database Systems. Lecture #11

Chapter 24. Active Database Concepts and Triggers. Outline. Trigger Example. Event-Condition-Action (ECA) Model

DATABASE THEORY. Lecture 11: Introduction to Datalog. TU Dresden, 12th June Markus Krötzsch Knowledge-Based Systems

9/19/12. Why Study Discrete Math? What is discrete? Sets (Rosen, Chapter 2) can be described by discrete math TOPICS

CSC Discrete Math I, Spring Sets

COMPUTABILITY THEORY AND RECURSIVELY ENUMERABLE SETS

Data Integration: Datalog

Introduction to Data Management CSE 344. Lecture 14: Datalog

Database Systems CSE 414. Lecture 9-10: Datalog (Ch )

Part I Logic programming paradigm

}Optimization Formalisms for recursive queries. Module 11: Optimization of Recursive Queries. Module Outline Datalog

DATABASE THEORY. Lecture 12: Evaluation of Datalog (2) TU Dresden, 30 June Markus Krötzsch

}Optimization. Module 11: Optimization of Recursive Queries. Module Outline

Discrete Mathematics Lecture 4. Harper Langston New York University

Module 6. Knowledge Representation and Logic (First Order Logic) Version 2 CSE IIT, Kharagpur

Database Theory VU , SS Codd s Theorem. Reinhard Pichler

DATABASE THEORY. Lecture 15: Datalog Evaluation (2) TU Dresden, 26th June Markus Krötzsch Knowledge-Based Systems

Model Finder. Lawrence Chung 1

Encyclopedia of Database Systems, Editors-in-chief: Özsu, M. Tamer; Liu, Ling, Springer, MAINTENANCE OF RECURSIVE VIEWS. Suzanne W.

Scheme: Data. CS F331 Programming Languages CSCE A331 Programming Language Concepts Lecture Slides Monday, April 3, Glenn G.

Recursion and Induction: Haskell; Primitive Data Types; Writing Function Definitions

Scheme Quick Reference

Chapter 3: Relational Model

CSC 501 Semantics of Programming Languages

Lecture Notes on Real-world SMT

Complete Instantiation of Quantified Formulas in Satisfiability Modulo Theories. ACSys Seminar

Annoucements. Where are we now? Today. A motivating example. Recursion! Lecture 21 Recursive Query Evaluation and Datalog Instructor: Sudeepa Roy

CSE 20 DISCRETE MATH WINTER

\n is used in a string to indicate the newline character. An expression produces data. The simplest expression

Formal Methods for Software Development

Comparing sizes of sets

Deductive Databases. Motivation. Datalog. Chapter 25

Deductive Approach for Programming Sensor Networks

Three Query Language Formalisms

CSE Qualifying Exam, Spring February 2, 2008

Relational Algebra. Relational Algebra. 7/4/2017 Md. Golam Moazzam, Dept. of CSE, JU

Constraint Propagation for Efficient Inference in Markov Logic

Scheme Quick Reference

Typed Racket: Racket with Static Types

µz An Efficient Engine for Fixed Points with Constraints

Introduction to Functions

Deductive Framework for Programming Sensor Networks

Database Theory VU , SS Introduction: Relational Query Languages. Reinhard Pichler

CMPSCI 250: Introduction to Computation. Lecture #7: Quantifiers and Languages 6 February 2012

Lectures 20, 21: Axiomatic Semantics

Foundations of Databases

Advances in Data Management Beyond records and objects A.Poulovassilis

Transcription:

Datalog Rules Programs Negation 1

Review of Logical If-Then Rules body h(x, ) :- a(y, ) & b(z, ) & head subgoals The head is true if all the subgoals are true. 2

Terminology Head and subgoals are atoms. An atom consists of a predicate (lower case) applied to zero or more arguments (upper case letters or constants). 3

Semantics Predicates represent relations. An atom is true for given values of its variables iff the arguments form a tuple of the relation. Whenever an assignment of values to all variables makes all subgoals true, the rule asserts that the resulting head is also true. 4

Example We shall develop rules that describe what is necessary to make a file. The predicates/relations: source(f) = F is a source file. includes(f,g) = F #includes G. create(f,p,g) = F is created by applying process P to file G. 5

Example --- Continued Rules to define view req(x,y) = file Y is required to create file X : req(f,f) :- source(f) req(f,g) :- includes(f,g) req(f,g) :- create(f,p,g) req(f,g) :- req(f,h) & req(h,g) G is required for F if there is some process P that creates F from G. G is required for F if there is some H such that H is required for F and G is required for H. 6

Why Not Just Use SQL? 1. Recursion is much easier to express in Datalog. Viz. last rule for req. 2. Rules express things that go on in both FROM and WHERE clauses, and let us state some general principles (e.g., containment of rules) that are almost impossible to state correctly in SQL. 7

IDB/EDB A predicate representing a stored relation is called EDB (extensional database). A predicate representing a view, i.e., a defined relation that does not exist in the database is called IDB (intesional database). Head is always IDB; subgoals may be IDB or EDB. 8

Datalog Programs A collection of rules is a (Datalog) program. Each program has a distinguished IDB predicate that represents the result of the program. E.g., req in our example. 9

Extensions 1. Negated subgoals. 2. Constants as arguments. 3. Arithmetic subgoals. 10

Negated Subgoals NOT in front of a subgoal means that an assignment of values to variables must make it false in order for the body to be true. Example: cycle(f) :- req(f,f) & NOT source(f) 11

Constants as Arguments We use numbers, lower-case letters, or quoted strings to indicate a constant. Example: req( foo.c, stdio.h ) :- Note that empty body is OK. Mixed constants and variables also OK. 12

Arithmetic Subgoals Comparisons like < may be thought of as infinite, binary relations. Here, the set of all tuples (x,y) such that x<y. Use infix notation for these predicates. Example: composite(a) :- divides(b,a) & B > 1 & B!= A 13

Evaluating Datalog Programs 1. Nonrecursive programs. 2. Naïve evaluation of recursive programs without IDB negation. 3. Seminaïve evaluation of recursive programs without IDB negation. Eliminates some redundant computation. 14

Safety When we apply a rule to finite relations, we need to get a finite result. Simple guarantee: safety = all variables appear in some nonnegated, relational (not arithmetic) subgoal of the body. Start with the join of the nonnegated, relational subgoals and select/delete from there. 15

Examples: Nonsafety p(x) :- q(y) X is the problem Both X and Y are problems bachelor(x) :- NOT married(x,y) bachelor(x) :- person(x) & NOT married(x,y) Y is still a problem 16

Nonrecursive Evaluation If (and only if!) a Datalog program is not recursive, then we can order the IDB predicates so that in any rule for p (i.e., p is the head predicate), the only IDB predicates in the body precede p. 17

Why? Consider the dependency graph with: Nodes = IDB predicates. Arc p -> q iff there is a rule for p with q in the body. Cycle involving node p means p is recursive. No cycles: use topological order to evaluate predicates. 18

Applying Rules To evaluate an IDB predicate p : 1. Apply each rule for p to the current relations corresponding to its subgoals. Apply = If an assignment of values to variables makes the body true, insert the tuple that the head becomes into the relation for p (no duplicates). 2. Take the union of the result for each p- rule. 19

Example p(x,y) :- q(x,z) & r(z,y) & Y<10 Q = {(1,2), (3,4)}; R = {(2,5), (4,9), (4,10), (6,7)} Assignments making the body true: (X,Y,Z) = (1,5,2), (3,9,4) So P = {(1,5), (3,9)}. 20

Algorithm for Nonrecursive FOR (each predicate p in topological order) DO apply the rules for p to previously computed relations to compute relation P for p; 21

Naïve Evaluation for Recursive make all IDB relations empty; WHILE (changes to IDB) DO FOR (each IDB predicate p) DO evaluate p using current values of all relations; 22

Important Points As long as there is no negation of IDB subgoals, then each IDB relation grows, i.e., on each round it contains at least what it used to contain. Since relations are finite, the loop must eventually terminate. Result is the least fixedpoint (minimal model ) of rules. 23

Seminaïve Evaluation Key idea: to get a new tuple for relation P on one round, the evaluation must use some tuple for some relation of the body that was obtained on the previous round. Maintain P = new tuples added to P on previous round. Differentiate rule bodies to be union of bodies with one IDB subgoal made. 24

Example ( make files ) r(f,f) :- s(f) r(f,g) :- i(f,g)) r(f,g) :- c(f,p,g) r(f,g) :- r(f,h) & r(h,f) Assume EDB predicates s, i, c have relations S, I, C. 25

Example --- Continued Initialize: R = R = σ #1=#2 (S S) I π 1,3 (C) Repeat until R = φ: 1. R = π 1,3 (R R R R) 2. R = R - R 3. R = R R 26

Function Symbols in Rules Extends Datalog by allowing arguments built from constants, variables, and function names, recursively applied. Function names look like predicate names, but are allowed only within the arguments of atoms. Predicates return true/false; functions return arbitrary values. 27

Example Instead of a string argument like 101 Maple we could use a term like addr(street( Maple ), number(101)) Compare with the XML term <ADDR><STREET>Maple</STREET> <NUMBER>101</NUMBER> </ADDR> 28

Another Example Predicates: 1. istree(x) = X is a binary tree. 2. label(l) = L is a node label. Functions: 1. node(a,l,r) = a tree with root labeled A, left subtree L, and right subtree R. 2. null = 0-ary function (constant) representing the empty tree. 29

Example --- Continued The rules: istree(null) :- istree(node(l,t1,t2)) :- label(l) & istree(t1) & istree(t2) 30

Example --- Concluded Assume label(a) and label(b) are true. I.e., a and b are in the relation for label. Infer istree(node(a,null,null)). Infer istree(node(b,null,node(a,null,null))). b a 31

Evaluation of Rules With Function Symbols Naïve, seminaïve still work when there are no negated IDB subgoals. They both lead to the unique least fixedpoint (minimal model). But this fixedpoint may not be reached in any finite number of rounds. The istree rules are an example. 32

Problems With IDB Negation When rules have negated IDB subgoals, there can be several minimal models. Recall: model = set of IDB facts, plus the given EDB facts, that make the rules true for every assignment of values to variables. Rule is true unless body is true and head is false. 33

Example: EDB red(x,y)= the Red bus line runs from X to Y. green(x,y)= the Green bus line runs from X to Y. 34

Example: IDB greenpath(x,y)= you can get from X to Y using only Green buses. monopoly(x,y)= Red has a bus from X to Y, but you can t get there on Green, even changing buses. 35

Example: Rules greenpath(x,y) :- green(x,y) greenpath(x,y) :- greenpath(x,z) & greenpath(z,y) monopoly(x,y) :- red(x,y) & NOT greenpath(x,y) 36

EDB Data red(1,2), red(2,3), green(1,2) 1 2 3 37

Two Minimal Models 1. EDB + greenpath(1,2) + monopoly(2,3) 2. EDB + greenpath(1,2) + greenpath(2,3) + greenpath(1,3) greenpath(x,y) :- green(x,y) greenpath(x,y) :- greenpath(x,z) & greenpath(z,y) monopoly(x,y) :- red(x,y) & NOT greenpath(x,y) 1 2 3 38

Stratified Models 1. Dependency graph describes how IDB predicates depend negatively on each other. 2. Stratified Datalog = no recursion involving negation. 3. Stratified model is a particular model that makes sense for stratified Datalog programs. 39

Dependency Graph Nodes = IDB predicates. Arc p -> q iff there is a rule for p that has a subgoal with predicate q. Arc p -> q labeled iff there is a subgoal with predicate q that is negated. 40

Monopoly Example monopoly -- greenpath 41

Another Example: Win win(x) :- move(x,y) & NOT win(y) Represents games like Nim where you win by forcing your opponent to a position where they have no move. 42

Dependency Graph for Win -- win 43

Strata The stratum of an IDB predicate is the largest number of s on a path from that predicate, in the dependency graph. Examples: Stratum 1 monopoly -- win -- greenpath Infinite stratum Stratum 0 44

Stratified Programs If all IDB predicates have finite strata, then the Datalog program is stratified. If any IDB predicate has the infinite stratum, then the program is unstratified, and no stratified model exists. 45

Stratified Model Evaluate strata 0, 1, in order. If the program is stratified, then any negated IDB subgoal has already had its relation evaluated. Safety assures that we can subtract it from something. Treat it as EDB. Result is the stratified model. 46

Examples For Monopoly, greenpath is in stratum 0: compute it (the transitive closure of green). Then, monopoly is in stratum 1: compute it by taking the difference of red and greenpath. Result is first model proposed. Win is not stratified, thus no stratified model. 47