Plan of the lecture. G53RDB: Theory of Relational Databases Lecture 14. Example. Datalog syntax: rules. Datalog query. Meaning of Datalog rules

Similar documents
Lecture 9: Datalog with Negation

Database Theory: Beyond FO

Recap and Schedule. Till Now: Today: Ø Query Languages: Relational Algebra; SQL. Ø Datalog A logical query language. Ø SQL Recursion.

Logic As a Query Language. Datalog. A Logical Rule. Anatomy of a Rule. sub-goals Are Atoms. Anatomy of a Rule

CMPS 277 Principles of Database Systems. Lecture #11

Datalog. Rules Programs Negation

Lecture 1: Conjunctive Queries

Logical Query Languages. Motivation: 1. Logical rules extend more naturally to. recursive queries than does relational algebra. Used in SQL recursion.

Datalog. Susan B. Davidson. CIS 700: Advanced Topics in Databases MW 1:30-3 Towne 309

Database Theory VU , SS Introduction to Datalog. Reinhard Pichler. Institute of Logic and Computation DBAI Group TU Wien

Datalog Recursive SQL LogicBlox

CSE 344 JANUARY 29 TH DATALOG

I Relational Database Modeling how to define

Datalog. Susan B. Davidson. CIS 700: Advanced Topics in Databases MW 1:30-3 Towne 309

Database Theory VU , SS Codd s Theorem. Reinhard Pichler

Announcements. Database Systems CSE 414. Datalog. What is Datalog? Why Do We Learn Datalog? HW2 is due today 11pm. WQ2 is due tomorrow 11pm

Conjunctive queries. Many computational problems are much easier for conjunctive queries than for general first-order queries.

10.3 Recursive Programming in Datalog. While relational algebra can express many useful operations on relations, there

I Relational Database Modeling how to define

Datalog. S. Costantini

Disjunctive Logic Programming: Knowledge Representation Techniques, Systems, and Applications

Data Integration: Datalog

Database Systems CSE 414. Lecture 9-10: Datalog (Ch )

Knowledge-Based Systems and Deductive Databases

A Retrospective on Datalog 1.0

Annoucements. Where are we now? Today. A motivating example. Recursion! Lecture 21 Recursive Query Evaluation and Datalog Instructor: Sudeepa Roy

Chapter 16. Logic Programming. Topics. Unification. Resolution. Prolog s Search Strategy. Prolog s Search Strategy

DATABASE THEORY. Lecture 11: Introduction to Datalog. TU Dresden, 12th June Markus Krötzsch Knowledge-Based Systems

Introduction to Data Management CSE 344. Lecture 14: Datalog (guest lecturer Dan Suciu)

Advances in Data Management Beyond records and objects A.Poulovassilis

Database Theory: Datalog, Views

Announcements. What is Datalog? Why Do We Learn Datalog? Database Systems CSE 414. Midterm. Datalog. Lecture 13: Datalog (Ch

Program Analysis in Datalog

Datalog Evaluation. Linh Anh Nguyen. Institute of Informatics University of Warsaw

Introduction to Data Management CSE 344. Lecture 14: Datalog

Range Restriction for General Formulas

Midterm. Database Systems CSE 414. What is Datalog? Datalog. Why Do We Learn Datalog? Why Do We Learn Datalog?

Chapter 5: Other Relational Languages

Logical Query Languages. Motivation: 1. Logical rules extend more naturally to. recursive queries than does relational algebra.

Encyclopedia of Database Systems, Editors-in-chief: Özsu, M. Tamer; Liu, Ling, Springer, MAINTENANCE OF RECURSIVE VIEWS. Suzanne W.

Midterm. Introduction to Data Management CSE 344. Datalog. What is Datalog? Why Do We Learn Datalog? Why Do We Learn Datalog? Lecture 13: Datalog

Chapter 6: Bottom-Up Evaluation

CSE 344 APRIL 11 TH DATALOG

Data Integration: Logic Query Languages

CSEN403 Concepts of Programming Languages. Topics: Logic Programming Paradigm: PROLOG Search Tree Recursion Arithmetic

Foundations of Databases

Chapter 5: Other Relational Languages.! Query-by-Example (QBE)! Datalog

CMPS 277 Principles of Database Systems. Lecture #4

DATABASE THEORY. Lecture 12: Evaluation of Datalog (2) TU Dresden, 30 June Markus Krötzsch

University of Innsbruck

CSE 544: Principles of Database Systems

FOUNDATIONS OF DATABASES AND QUERY LANGUAGES

Plan of the lecture. G53RDB: Theory of Relational Databases Lecture 1. Textbook. Practicalities: assessment. Aims and objectives of the course

Other Relational Languages

Integrity Constraints (Chapter 7.3) Overview. Bottom-Up. Top-Down. Integrity Constraint. Disjunctive & Negative Knowledge. Proof by Refutation

Logical reasoning systems

8. Negation 8-1. Deductive Databases and Logic Programming. (Sommer 2017) Chapter 8: Negation

Knowledge-Based Systems and Deductive Databases

CSC Discrete Math I, Spring Sets

Safe Stratified Datalog With Integer Order Does not Have Syntax

Datalog Evaluation. Serge Abiteboul. 5 mai 2009 INRIA. Serge Abiteboul (INRIA) Datalog Evaluation 5 mai / 1

Schema Mappings and Data Exchange

DATABASE THEORY. Lecture 15: Datalog Evaluation (2) TU Dresden, 26th June Markus Krötzsch Knowledge-Based Systems

CPSC 304 Introduction to Database Systems

Constraint Propagation for Efficient Inference in Markov Logic

Outline. Implementing a basic general game player. Foundations of logic programming. Metagaming: rule optimisation. Logic Programming.

Three Query Language Formalisms

µz An Efficient Engine for Fixed Points with Constraints

Query Minimization. CSE 544: Lecture 11 Theory. Query Minimization In Practice. Query Minimization. Query Minimization for Views

Discrete Mathematics Lecture 4. Harper Langston New York University

Inconsistency-tolerant logics

CSE20: Discrete Mathematics

Query formalisms for relational model relational calculus

A Simple SQL Injection Pattern

Other Relational Languages

Week 7 Prolog overview

XSB. Tim Finin University of Maryland Baltimore County. What XSB offers Tabling Higher order logic programming. Negation. sort of.

Denotational Semantics. Domain Theory

Lecture 4: January 12, 2015

Foundations of Computer Science Spring Mathematical Preliminaries

SQL: Recursion. Announcements (October 3) A motivating example. CPS 116 Introduction to Database Systems. Homework #2 graded

FOUNDATIONS OF SEMANTIC WEB TECHNOLOGIES

CSE 344 JANUARY 26 TH DATALOG

Managing Inconsistencies in Collaborative Data Management

has to choose. Important questions are: which relations should be dened intensionally,

Answer Sets and the Language of Answer Set Programming. Vladimir Lifschitz

simplicity hides complexity flow of control, negation, cut, 2 nd order programming, tail recursion procedural and declarative semantics and & or

Overview. CS389L: Automated Logical Reasoning. Lecture 6: First Order Logic Syntax and Semantics. Constants in First-Order Logic.

Introduction to Finite Model Theory. Jan Van den Bussche Universiteit Hasselt

Chapter 3: Relational Model

Negations in Refinement Type Systems

Relational Databases

Relational model continued. Understanding how to use the relational model. Summary of board example: with Copies as weak entity

Announcements. Relational Model & Algebra. Example. Relational data model. Example. Schema versus instance. Lecture notes

Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 6 Outline. Unary Relational Operations: SELECT and

Relational Database: The Relational Data Model; Operations on Database Relations

Automated Reasoning. Natural Deduction in First-Order Logic

Unit 4 Relational Algebra (Using SQL DML Syntax): Data Manipulation Language For Relations Zvi M. Kedem 1

A Formal Semantics for a Rule-Based Language

Deductive database systems and integrity constraint checking Seljée, R.R.

Transcription:

Plan of the lecture G53RDB: Theory of Relational Databases Lecture 14 Natasha Alechina School of Computer Science & IT nza@cs.nott.ac.uk More Datalog: Safe queries Datalog and relational algebra Recursive Datalog rules Semantics of recursive Datalog rules Problems with negation Stratified Datalog Lecture 17 2 Datalog syntax: rules A Datalog rule is an expression of the form R 1 R 2 AND AND R n where n 1, R 1 is a relational atom, and R 2,, R n are relational or arithmetic atoms, possibly preceded by NOT. R 1 is called the head of the rule and R 2,, R n the body of the rule. R 2,, R n are called subgoals. Example Suppose we have a relation Person over schema (Name, Age, Address, Telephone). Then the following Datalog rule will define a relation which contains names of people aged over 18: Adult(x) Person(x,y,z,u) AND y 18 Lecture 17 3 Lecture 17 4 Datalog query A Datalog query is a finite set of Datalog rules If there is only one relation which appears as a head of a rule in the query, the tuples in that relation are taken as the answer to the query. For example, Parent(x,y) Mother(x,y) Parent(x,y) Father(x,y) defines Parent relation (using relations Father and Mother) If there is more than one relation appearing as a head, one of them is the main predicate to be defined and others are auxiliary. Meaning of Datalog rules First approximation (non-recursive queries): take the values of variables which make the body of the rule true (make each subgoal true; NOT R is true if R is false) see what values the variables of the head take; add the resulting tuple to the predicate in the head of the rule. Lecture 17 5 Lecture 17 6 1

Example with negation Safe queries Suppose we have a relation Person over schema (Name, Age, Address, Telephone). Child(x) Person(x,y,z,u) AND NOT(y 18) We take all <name, age, addr, tel> in Person for which it is also true that NOT(age 18), and add <name> to Child. NOT(age 18) is true if age 18 is false, so we add all tuples where age < 18. We want the result of a query to be a finite relation. To ensure this, the following safety condition is required: every variable that appears anywhere in the rule must appear in some non-negated relational subgoal. The reason for this is that infinitely many values may satisfy an arithmetical subgoal (e.g. x > 0) and infinitely many values are NOT in some finite table of a relation R. Lecture 17 7 Lecture 17 8 Questions Questions Which of the following rules have safety violations: P(x,y) Q(x,y) AND NOT R(x,y) P(x,y) NOT Q(x,y) AND y = 10 P(x,y) Q(x,z) AND NOT R(w,x,z) AND x < y P(x,y) Q(x,z) AND R(z,y) AND NOT Q(x,y) Which tuples are in P? P(x,y) Q(x,z) AND R(z,y) AND NOT Q(x,y) given that: Q contains tuples <a,b>, <a,c> R contains tuples <b,c>, <c,a> Lecture 17 9 Lecture 17 10 Datalog and relational algebra Union Every relation definable in relational algebra is definable in Datalog. Again we assume that we have a relational name (predicate symbol) R for every basic relation R. Then for every operation of relational algebra, we show how to write a corresponding Datalog query. Union of R and S: U(x 1,,x n ) R(x 1,,x n ) U(x 1,,x n ) S(x 1,,x n ) Lecture 17 11 Lecture 17 12 2

Difference Difference of R and S: Product Product of R and S: D(x 1,,x n ) R(x 1,,x n ) AND NOT S(x 1,,x n ) P(x 1,,x n,y 1,,y k ) R(x 1,,x n ) AND S(y 1,,y k ) Lecture 17 13 Lecture 17 14 Projection Selection Suppose we want to project R on attributes x 1,,x n. P(x 1,,x n ) R(x 1,,x n,y 1,,y k ) or P(x 1,,x n ) R(x 1,,x n,_,.,_) Simple case: all conditions in the selection are connected by AND, for example σ Age > 18 AND Address = London (Person) Answer(x,y,z,u) Person(x,y,z,u) AND y > 18 AND z = London If conditions are connected with OR, need more than one rule. For example, σ Age > 18 OR Address = London (Person) Answer(x,y,z,u) Person(x,y,z,u) AND y > 18 Answer(x,y,z,u) Person(x,y,z,u) AND z = London Lecture 17 15 Lecture 17 16 Compound queries Recursion: motivating example To translate an arbitrary algebraic expression, create a new predicate for every node in the query tree. For example, to do σ Name1 = Name2 (R P): Define predicate S = R P Define σ Name1 = Name2 (S) Consider a database for London underground. It describes lines, stations, station closures etc. (there may be stations closed on weekends, or because of technical problems or strikes). Typical queries include: is it possible to go from King s Cross to Embankment? which lines can be reached form King s Cross? Lecture 17 17 Lecture 17 18 3

Motivating example Motivating example We can either compute and store this information for every station (recompute it every day because of station closures) Or, we can store the basic data (Links relation below) and compute answers to queries as they are asked. However, in a relational database, given a relation Links, we cannot express a query Is Pimlico reachable from Marble Arch?. Line Station Next Station Links Line Station Next Station Links Central Marble Arch Bond St Jubilee Bond St Green Park Victoria Green Park Victoria Victoria Victoria Pimlico Central Marble Arch Bond St Jubilee Bond St Green Park Victoria Green Park Victoria Victoria Victoria Pimlico Lecture 17 19 Lecture 17 20 Recursive queries Example recursive program Reachability in a graph is a typical recursive property. It cannot be expressed in relational calculus or relational algebra given an Edge relation for the graph. We can write a query which expresses reachable in one step, reachable in two steps, and so on, but not simply reachable. Another example: given a Parent relation, write a query which finds ancestors of a given person. Again, in relational algebra or calculus we can find parents, grandparents and so on, but not all ancestors. Reachable (x,x) Reachable (x,y) Links(u,z,y) AND Reachable (x,z) We use the database relation Links to define relation Reachable, which is not stored in the database. To compute the set of stations reachable from King s Cross, we add to this program Answer(y) Reachable( King s Cross, y) Lecture 17 21 Lecture 17 22 Extensional and intensional predicates To distinguish relations which are in the database and relations which are being defined by Datalog rules: Extensional predicates: predicates whose relations are stored in a database Intensional predicates: defined by Datalog rules EDB extensional database collection of extensional relations IDB intensional database collection of intensional relations Three ways to give semantics of recursive Datalog programs Minimal relations (minimal models) Provability semantics Fixpoint semantics For the time being, assume that we do not have negation on IDB predicates Lecture 17 23 Lecture 17 24 4

Minimal relations Example Datalog programs are logical descriptions of new relations. The answer to the Datalog query is the smallest relation which satisfies all the stated properties. Each rule R 1 (xs) R 2 (xs) AND AND R n (xs) corresponds to a logical property x 1... x m (R 2 (xs)& &R n (xs ) R 1 (xs)) where x 1,...,x m are all the variables occurring in the rule and xs some subsequence of x 1,...,x m. A program Ancestor(x,y) Parent(x,y) Ancestor(x,y) Parent(x,z) AND Ancestor(z,y) corresponds to logical properties P1 x y (Parent(x,y) Ancestor(x,y)) P2 x y z(parent(x,z) & Ancestor(z,y) Ancestor(x,y)) Lecture 17 25 Lecture 17 26 Example Programs as proofs Suppose Parent contains just two pairs: Parent(Anne, Bob), Parent(Bob, Chris) Because of P1, Ancestor should contain the same pairs: Ancestor(Anne, Bob), Ancestor(Bob, Chris) Because of P2, we also need to add Ancestor(Anne,Chris) to satisfy x y z(parent(x,z) & Ancestor(z,y) Ancestor(x,y)) Parent(Anne,Bob) & Ancestor(Bob,Chris) Ancestor(Anne,Chris)) Proof-theoretic way of looking at Datalog programs: for which tuples can we logically prove that they are in Ancestor relation (using Parent relation and the program rules). Happens to be the same tuples as in the minimal Ancestor relation. Lecture 17 27 Lecture 17 28 Fixpoint semantics of programs Start assuming that all IDB predicates are empty. Construct larger and larger IDB relations by: Fire rules to add a tuples to IDB relations Use tuples added to IDB relations in the previous round to add a new tuples to IDB relations Continue firing rules until no new tuples are added (reached a fixpoint). If rules are safe, there will be finitely many tuples which satisfy the body of the rule, so fixpoint will be reached after finitely many rounds. This happens to give the same answer as what is the minimal relation satisfying the properties and for which tuples can we prove that they are in Ancestor relation. Lecture 17 29 Example: fixpoint construction Ancestor(x,y) Parent(x,y) Ancestor(x,y) Parent(x,z), Ancestor(z,y) Start: Ancestor = {}, Parent={<a,b>,<b,c>,<c,d>} 1 st round: Ancestor = {<a,b>,<b,c>,<c,d>} 2 nd round: Ancestor = {<a,b>,<b,c>,<c,d>, <a,c>, <b,d>} (Ancestor(a,c) Parent(a,b), Ancestor(b,c) gives <a,c> Ancestor(b,d) Parent(b,c), Ancestor(c,d) gives <b,d>) 3 rd round: Ancestor = {<a,b>,<b,c>,<c,d>, <a,c>,<b,d>, Ancestor(a,d) Parent(a,b), Ancestor(b,d) <a,d>} 4 th round: no new tuples in Ancestor. Lecture 17 30 5

Negation Stratified Datalog with negation Problem with negation: may not be a unique minimal solution; no clear semantics. Example: EDB = {R} and IDB = {P,Q} P(x) R(x) AND NOT Q(x) Q(x) R(x) AND NOT P(x) Suppose R = {<a>}. Then either P = {<a>} and Q empty, or Q = {<a>} and P empty. No unique solution. Can t say if P(a) holds or Q(a) holds. The idea is to break cycles as in the example before, when to evaluate IDB predicate P we need to know what is the negation of IDB predicate Q, and vice versa (P is defined using NOT Q and Q is defined using NOT P). Solution: outlaw cycles in dependencies on negative IDB predicates. Lecture 17 31 Lecture 17 32 What does depend mean What does depend mean If R is the head of a rule where P is in the body, R depends on P If R is the head of a rule where P is in the body, and P depends on S, then R depends on S (transitive relation). We draw a dependency graph for IDB predicates. Lecture 17 33 (Only IDB predicates are shown, E assumed to be an EDB predicate). R depends on P, S, Q, T and V; P depends on S; Q depends on V and T; V depends on T and Q, and T depends on Q and V R(x) P(x) AND NOT Q(x) Q(x) NOT V(x) AND E(x) P(x) NOT S(x) AND E(x) V(x) T(x) T(x) Q(x) R P S Q V Lecture 17 34 T What does depend mean Strata Negative arcs (with sign) correspond to negative occurrences of predicates in the body of the rule Recursion is stratified if there is no cycle involving negative arcs. (The program below is not stratified) R(x) P(x) AND NOT Q(x) Q(x) NOT V(x) AND E(x) P(x) NOT S(x) AND E(x) V(x) T(x) T(x) Q(x) R P S Q V Lecture 17 35 bad cycle T In a stratified program, IDB predicates are divided into strata. Stratum of a predicate is the maximal number of negative arcs on a dependency path starting at that predicate. R(x) P(x) AND NOT Q(x) Q(x) NOT V(x) AND E(x) P(x) NOT S(x) AND E(x) R P S Q V Lecture 17 36 6

Example In other words The program below is stratified. Stratum 0 = {S, V} Stratum 1 = {P, Q} Stratum 2 = {R} R(x) P(x) AND NOT Q(x) R Q(x) NOT V(x) AND E(x) P(x) NOT S(x) AND E(x) Q P V S Lecture 17 37 stratum 0: do not depend on any negated IDB predicates stratum 1: depend on negated IDB predicates from stratum 0; stratum 2: depend on negated IDB predicates from stratum 1, stratum n: depend on IDB predicates from stratum n-1. Lecture 17 38 Evaluating stratified Datalog programs Stratified Datalog programs have the following operational semantics: First compute all IDB predicates in stratum 0 (using the usual fixpoint strategy) Using IDB predicates from stratum n, compute IDB predicates from stratum n+1. This produces unique minimal solutions for all IDB predicates. Lecture 17 39 Informal coursework Is the following program stratified (EDB = {S}): Q(x) NOT P(x) AND R(x) P(x) NOT R(x) AND S(x) R(x) S(x) Is the following program stratified (EDB = {S}): R(x) Q(x) Q(x) R(x) R(x) S(x) AND NOT Q(x) For the stratified program, compute P, Q and R given that S contains {<a>,<b>}. Lecture 17 40 Informal coursework A database of fictitious company contains three relations: GOODS over schema {Producer, ProductCode, Description} DELIVERY over schema {Producer, ProductCode, Branch#, Stock#} STOCK over schema {Branch#, Stock#, Size, Colour, SellPrice, CostPrice, DateIn, DateOut}. Lecture 17 41 Define in Datalog Query 1: find all producers who supply goods. Query 2: find all producers who have delivered goods to any branch of the company. Query 3: find SellPrice and CostPrice of all goods delivered to branch L1 still in stock (here, L1 is a value in the attribute domain of Branch#, and products in stock have value InStock for the DateOut attribute). Query 4: find Producer, ProductCode, Description for all goods sold at the same day they arrived at any branch. Query 5: find Branch#, Size, Colour, SellPrice for all dresses which have not yet been sold (dress is a value in the attribute domain of Description). Lecture 17 42 7

Reading Ullman, Widom, chapter 10 Abiteboul, Hull, Vianu chapter 12. Lecture 17 43 8