Datalog. Susan B. Davidson. CIS 700: Advanced Topics in Databases MW 1:30-3 Towne 309

Similar documents
Datalog. Susan B. Davidson. CIS 700: Advanced Topics in Databases MW 1:30-3 Towne 309

Datalog. Rules Programs Negation

CMPS 277 Principles of Database Systems. Lecture #11

Lecture 9: Datalog with Negation

Plan of the lecture. G53RDB: Theory of Relational Databases Lecture 14. Example. Datalog syntax: rules. Datalog query. Meaning of Datalog rules

CSE 344 JANUARY 26 TH DATALOG

Recap and Schedule. Till Now: Today: Ø Query Languages: Relational Algebra; SQL. Ø Datalog A logical query language. Ø SQL Recursion.

CSE 344 JANUARY 29 TH DATALOG

Logical Query Languages. Motivation: 1. Logical rules extend more naturally to. recursive queries than does relational algebra. Used in SQL recursion.

Database Theory: Beyond FO

Logic As a Query Language. Datalog. A Logical Rule. Anatomy of a Rule. sub-goals Are Atoms. Anatomy of a Rule

Datalog Recursive SQL LogicBlox

Data Integration: Logic Query Languages

I Relational Database Modeling how to define

I Relational Database Modeling how to define

Lecture 1: Conjunctive Queries

DATABASE THEORY. Lecture 15: Datalog Evaluation (2) TU Dresden, 26th June Markus Krötzsch Knowledge-Based Systems

DATABASE THEORY. Lecture 12: Evaluation of Datalog (2) TU Dresden, 30 June Markus Krötzsch

Conjunctive queries. Many computational problems are much easier for conjunctive queries than for general first-order queries.

Introduction to Data Management CSE 344. Lecture 14: Datalog

Database Theory: Datalog, Views

Datalog Evaluation. Linh Anh Nguyen. Institute of Informatics University of Warsaw

Midterm. Database Systems CSE 414. What is Datalog? Datalog. Why Do We Learn Datalog? Why Do We Learn Datalog?

Introduction to Data Management CSE 344. Lecture 14: Datalog (guest lecturer Dan Suciu)

Encyclopedia of Database Systems, Editors-in-chief: Özsu, M. Tamer; Liu, Ling, Springer, MAINTENANCE OF RECURSIVE VIEWS. Suzanne W.

Range Restriction for General Formulas

10.3 Recursive Programming in Datalog. While relational algebra can express many useful operations on relations, there

Outline. Implementing a basic general game player. Foundations of logic programming. Metagaming: rule optimisation. Logic Programming.

Query Minimization. CSE 544: Lecture 11 Theory. Query Minimization In Practice. Query Minimization. Query Minimization for Views

Logical Query Languages. Motivation: 1. Logical rules extend more naturally to. recursive queries than does relational algebra.

Midterm. Introduction to Data Management CSE 344. Datalog. What is Datalog? Why Do We Learn Datalog? Why Do We Learn Datalog? Lecture 13: Datalog

Annoucements. Where are we now? Today. A motivating example. Recursion! Lecture 21 Recursive Query Evaluation and Datalog Instructor: Sudeepa Roy

Negation wrapped inside a recursion makes no. separated, there can be ambiguity about what. the rules mean, and some one meaning must

DATABASE THEORY. Lecture 11: Introduction to Datalog. TU Dresden, 12th June Markus Krötzsch Knowledge-Based Systems

Introduction to Data Management CSE 344. Lecture 14: Relational Calculus

CSE 544: Principles of Database Systems

Database design and implementation CMPSCI 645. Lecture 04: SQL and Datalog

Incomplete Information: Null Values

Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 6 Outline. Unary Relational Operations: SELECT and

Program Analysis in Datalog

Logical Aspects of Massively Parallel and Distributed Systems

Announcements. What is Datalog? Why Do We Learn Datalog? Database Systems CSE 414. Midterm. Datalog. Lecture 13: Datalog (Ch

Announcements. Database Systems CSE 414. Datalog. What is Datalog? Why Do We Learn Datalog? HW2 is due today 11pm. WQ2 is due tomorrow 11pm

Database Management Systems CSEP 544. Lecture 3: SQL Relational Algebra, and Datalog

}Optimization Formalisms for recursive queries. Module 11: Optimization of Recursive Queries. Module Outline Datalog

FOUNDATIONS OF DATABASES AND QUERY LANGUAGES

}Optimization. Module 11: Optimization of Recursive Queries. Module Outline

CSE 344 Final Review. August 16 th

Advances in Data Management Beyond records and objects A.Poulovassilis

Three Query Language Formalisms

Chapter 6: Bottom-Up Evaluation

Chapter 5: Other Relational Languages.! Query-by-Example (QBE)! Datalog

Chapter 5: Other Relational Languages

Database Systems CSE 414. Lecture 9-10: Datalog (Ch )

CSE 344 MAY 7 TH EXAM REVIEW

A Retrospective on Datalog 1.0

Query formalisms for relational model relational calculus

Datalog Evaluation. Serge Abiteboul. 5 mai 2009 INRIA. Serge Abiteboul (INRIA) Datalog Evaluation 5 mai / 1

Computability Theory XI

This module presents the star schema, an alternative to 3NF schemas intended for analytical databases.

Deductive Databases. Motivation. Datalog. Chapter 25

FOUNDATIONS OF SEMANTIC WEB TECHNOLOGIES

Lecture 2: Game Description Language

1. Introduction; Selection, Projection. 2. Cartesian Product, Join. 5. Formal Definitions, A Bit of Theory

Introduction to Data Management CSE 344. Lecture 12: Relational Calculus

The Extended Algebra. Duplicate Elimination. Sorting. Example: Duplicate Elimination

Three Query Language Formalisms

Data Integration: Datalog

THE RELATIONAL MODEL. University of Waterloo

SQL Part 3: Where Subqueries and other Syntactic Sugar Part 4: Unknown Values and NULLs

Constraint Solving. Systems and Internet Infrastructure Security

Relational Databases

NP and computational intractability. Kleinberg and Tardos, chapter 8

Database Systems CSE 414

Announcements. Relational Model & Algebra. Example. Relational data model. Example. Schema versus instance. Lecture notes

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact:

Relational Database: The Relational Data Model; Operations on Database Relations

Complexity of Answering Queries Using Materialized Views

CSE 344 Midterm Exam

Database Theory VU , SS Introduction to Datalog. Reinhard Pichler. Institute of Logic and Computation DBAI Group TU Wien

An Extended Magic Sets Strategy for a Rule. Paulo J Azevedo. Departamento de Informatica Braga, Portugal.

CSE 344 Midterm. Monday, Nov 4, 2013, 9:30-10:20. Question Points Score Total: 100

Foundations of Databases

Query Processing. Introduction to Databases CompSci 316 Fall 2017

Relational Model & Algebra. Announcements (Tue. Sep. 3) Relational data model. CompSci 316 Introduction to Database Systems

Advanced Database Systems

Database Theory VU , SS Codd s Theorem. Reinhard Pichler

Stream Joins and Dynamic Query Evaluation. WANG, Qichen HUANG, Ziyue

Principles of Program Analysis. Lecture 1 Harry Xu Spring 2013

Foundations of Databases

Database Systems. Basics of the Relational Data Model

Database Management Systems Paper Solution

Relational Model & Algebra. Announcements (Thu. Aug. 27) Relational data model. CPS 116 Introduction to Database Systems

Deductive Approach for Programming Sensor Networks

Principles of Database Systems CSE 544. Lecture #2 SQL, Relational Algebra, Relational Calculus

OPTIMIZING RECURSIVE INFORMATION GATHERING PLANS. Eric M. Lambrecht

CSC A Hash-Based Approach for Computing the Transitive Closure of Database Relations. Farshad Fotouhi, Andrew Johnson, S.P.

CSC Discrete Math I, Spring Sets

arxiv: v1 [cs.db] 10 Dec 2018

Dynamically Ordered Semi-Naive Evaluation of Recursive Queries

Transcription:

Datalog Susan B. Davidson CIS 700: Advanced Topics in Databases MW 1:30-3 Towne 309 http://www.cis.upenn.edu/~susan/cis700/homepage.html 2017 A. Alawini, S. Davidson

Homework for this week Sign up to present a paper (the Google doc link was sent on Friday) Class schedule is being updated based on this. University of Pennsylvania 2

First paper summary due 2/5 First summary is on the following paper: Big Data Analytics with Datalog Queries on Spark SIGMOD 2016 What is a summary (print and bring to class)? Short paragraph describing paper 1-3 strengths, 1-3 weaknesses At least one question you have about the paper. University of Pennsylvania 3

Last time: Datalog Facts (EDB) and rules (IDB) Safe queries Negation can be tricky University of Pennsylvania 4

The Bachelor problem Suppose we have an EDB relation married(x,y) and want to calculate the bachelors. Not correct (and not safe): bachelor(y) :- NOT married(x,y) Also not correct (but safe): bachelor(y) :- person(x), person(y), NOT married(x,y) Correct (and safe): notbachelor(y):- married(x,y) notbachelor(x):- married(x,y) bachelor(y) :- person(y), NOT notbachelor(y) 5

This time: Datalog + with Recursion A simple recursive program and naïve evaluation Evaluating Datalog + programs Negation can still be tricky University of Pennsylvania 6

Datalog versus SQL Non-recursive Datalog with negation is a cleaned-up core of SQL Unions of conjunctive queries Forms the core of query optimization, what we know how to reason over easily You can translate easily between non-recursive Datalog with negation and SQL. Take the join of the nonnegated, relational subgoals and select/delete from there. University of Pennsylvania 7

Why Datalog? Recursion Rules express things that go on in both FROM and WHERE clauses, and let us state some general principles (e.g., containment of rules) that are almost impossible to state correctly in SQL. University of Pennsylvania 8

Simple recursive Datalog program R encodes a graph. What does T compute? R= 1 2 2 1 2 3 1 4 3 4 4 5 T(x,y):- R(x,y) T(x,y):- R(x,z), T(z,y) University of Pennsylvania 9

Naïve Evaluation T= {} WHILE (changes to T) DO T= T U (R(x,y) U (R(x,y) T(y,z)) ) 10

Simple recursive Datalog program R encodes a graph. R= 1 2 2 1 2 3 1 4 3 4 4 5 Alternate ways to compute transitive closure: T(x,y):- R(x,y) T(x,y):- R(x,z), T(z,y) T(x,y):- R(x,y) T(x,y):- T(x,z), R(z,y) T(x,y):- R(x,y) T(x,y):- T(x,z), T(z,y) Right linear Left linear Non-linear University of Pennsylvania 11

Another Interesting Program R encodes a graph. Non 2-colorability: 1 2 R= 2 1 ODD(x,y):- R(x,y) 2 3 1 4 3 4 4 5 ODD(x,y):- R(x,z), EVEN(z,y) EVEN(x,y):-R(x,z), ODD(z,y) Q:- ODD(x,x) University of Pennsylvania 12

Evaluating Datalog + Programs 1. Nonrecursive programs. 2. Naïve evaluation of recursive programs without negation. 3. Semi-naïve evaluation of recursive programs without negation. Eliminates some redundant computation. 13

Nonrecursive Evaluation If (and only if!) a Datalog program is not recursive, then we can order the IDB predicates so that in any rule for p (i.e., p is the head predicate), the only IDB predicates in the body precede p. 14

Why? Consider the dependency graph with: Nodes = IDB predicates. Arc p à q iff there is a rule for p with q in the body. Cycle involving node p means p is recursive. No cycles: use topological order to evaluate predicates. 15

Applying Rules To evaluate an IDB predicate p : 1. Apply each rule for p to the current relations corresponding to its subgoals. Apply = If an assignment of values to variables makes the body true, insert the tuple that the head becomes into the relation for p (no duplicates). Also think of the product of the relations corresponding to the subgoals with selection/join conditions 2. Take the union of the result for each p-rule. 16

Q={(1,2), (3,4)} Example R={(2,5),(4,9),(4,10), (6,7)} P(x,y) :- Q(x,z), R(z,y), y<10 Assignments making the body true: (x,y,z) = (1,5,2), (3,9,4) So P = {(1,5), (3,9)}. 17

Algorithm for Nonrecursive FOR (each predicate P in topological order) DO Apply the rules for P to previously computed relations to compute relation P; 18

Naïve Evaluation for Recursive make all IDB relations empty; WHILE (changes to IDB) DO FOR (each IDB predicate P) DO Evaluate P using current values of all relations; 19

Important Points As long as there is no negation of IDB subgoals, then each IDB relation grows, i.e., on each round it contains at least what it used to contain. monotonicity Since relations are finite, the loop must eventually terminate. Result is the least fixedpoint (minimal model ) of rules. 20

Problem with Naïve Evaluation The same facts are discovered over and over again. The semi-naïve algorithm tries to reduce the number of facts discovered multiple times. There is a similarity to incremental view maintenance University of Pennsylvania 21

Background: Incremental View Maintenance V(x,y):- R(x,z),S(z,y) If Rß R U ΔR then what is ΔV(X,Y)? ΔV(x,y):- ΔR(x,z),S(z,y) If Rß R U ΔR and Sß S U ΔS then what is ΔV(X,Y)? ΔV(x,y):- ΔR(x,z),S(z,y) ΔV(x,y):- R(x,z), ΔS(z,y) ΔV(x,y):- ΔR(x,z), ΔS(z,y) University of Pennsylvania 22

Background: Incremental View Maintenance V(x,y):- T(x,z),T(z,y) If Tß T U ΔT then what is ΔV(X,Y)? ΔV(x,y):- ΔT(x,z),T(z,y) ΔV(x,y):- T(x,z), ΔT(z,y) ΔV(x,y):- ΔT(x,z), ΔT(z,y) University of Pennsylvania 23

Semi-naïve Evaluation Key idea: to get a new tuple for relation P on one round, the evaluation must use some tuple for some relation of the body that was obtained on the previous round. Maintain ΔP = new tuples added to P on previous round. Differentiate rule bodies to be union of bodies with one IDB subgoal made Δ. 24

Semi-naïve Evaluation Separate the Datalog program into the non-recursive, and the recursive part. Each P i defined by non-recursive-spju i and (recursive-)spju i. P1 = ΔP1 = non-recursive-spju1, P2 = ΔP2 = non-recursive-spju2, Loop ΔP1 = Δ SPJU1 P1; ΔP2 = ΔSPJU2 P2; if (ΔP1 = and ΔP2 = and ) then break P1 = P1 ΔP1; P2 = P2 ΔP2; Endloop University of Pennsylvania 25

P1 = ΔP1 = non-recursive-spju1, P2 = ΔP2 = non-recursive-spju2, Loop Semi-naïve Evaluation ΔP1 = Δ SPJU1 P1; ΔP2 = ΔSPJU2 P2; if (ΔP1 = and ΔP2 = and ) then break P1 = P1 ΔP1; P2 = P2 ΔP2; Endloop T(x,y):- R(x,y) T(x,y):- R(x,z), T(z,y) T(x,y) = ΔT(x,y) =? (nonrecursive rule) Loop ΔT(x,y) =? (recursive Δ-rule) if (ΔT = ) then break T = T ΔT Endloop University of Pennsylvania 26

P1 = ΔP1 = non-recursive-spju1, P2 = ΔP2 = non-recursive-spju2, Loop Semi-naïve Evaluation ΔP1 = Δ SPJU1 P1; ΔP2 = ΔSPJU2 P2; if (ΔP1 = and ΔP2 = and ) then break P1 = P1 ΔP1; P2 = P2 ΔP2; Endloop T(x,y):- R(x,y) T(x,y):- R(x,z), T(z,y) T(x,y) = ΔT(x,y) = R(x,y) Loop ΔT(x,y) = (R(x,z), ΔT(z,y))- R(x,y) if (ΔT = ) then break T = T ΔT Endloop University of Pennsylvania 27

Discussion of Semi-Naïve Algorithm Avoids recomputing some (but not all) tuples Easy to implement, no disadvantage over naïve A rule is called linear if its body contains only one recursive IDB predicate: A linear rule always results in a single incremental rule A non-linear rule may result in multiple incremental rules University of Pennsylvania 28

Recursion and Negation Don t Like Each Other When rules have negated IDB subgoals, there can be several minimal models. Recall: model = set of IDB facts, plus the given EDB facts, that make the rules true for every assignment of values to variables. Rule is true unless body is true and head is false. Suppose R(a). What are S and T? S(x):- R(x), not T(x) T(x):- R(x), not S(x) 29

Next time: Datalog - University of Pennsylvania 30