Datalog Evaluation. Linh Anh Nguyen. Institute of Informatics, University of Warsaw


Outline
- Simple Evaluation Methods
- Query-Subquery Recursive
- Magic-Set Technique
- Query-Subquery Nets

Simple Evaluation Algorithms
Methods to evaluate a Datalog program P on a database instance I, derived from the different equivalent definitions of the semantics:
- Model-theoretic definition: enumerate all subsets J ⊆ B(P, I), check which of them are models, and pick the smallest such J.
- Fixpoint definition: augment I using the operator T_P until a fixpoint is reached.
- Proof-theoretic definition: use SLD-resolution (bottom-up or top-down).

Classes of Datalog Evaluation
Two major classes of evaluation approaches:
- Bottom-up (forward chaining): proceed in the proof tree from the leaves to the root; apply the Datalog rules from body to head (forward).
- Top-down (backward chaining): proceed in the proof tree from the root to the leaves; apply the Datalog rules from head to body (backward).

Naive Evaluation
Follow the bottom-up approach: compute the minimum fixpoint of T_P containing I (= T_P^ω(I)).
Given a Datalog program P and a database instance I:
1. Start by assuming all idb relations are empty.
2. Repeatedly evaluate the rules using the edb and the previous idb to get a new idb.
3. End when there is no change to the idb.
Disadvantages:
- Relations always have to be computed from scratch.
- Relations must be copied.
- Iteration runs over all relations, even if they do not recursively depend on each other.
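
The three steps above can be sketched in Python for the ancestor program used later in these slides. This is only an illustration: the parent facts are an assumed example database, and the names apply_rules and naive_eval are hypothetical.

```python
# Naive bottom-up evaluation of
#   ancestor(x, y) <- parent(x, y)
#   ancestor(x, y) <- parent(x, z), ancestor(z, y)
# The parent facts below are an assumed example database.
parent = {("John", "Ruth"), ("John", "Lois"), ("Lois", "Andy"), ("Lois", "Mark")}


def apply_rules(parent, ancestor):
    """Evaluate every rule once, using the edb and the previous idb."""
    new = set(parent)  # first rule
    # second rule: join parent(x, z) with ancestor(z, y)
    new |= {(x, y) for (x, z) in parent for (z2, y) in ancestor if z == z2}
    return new


def naive_eval(parent):
    ancestor = set()  # step 1: the idb relation starts empty
    rounds = 0
    while True:
        rounds += 1
        new = apply_rules(parent, ancestor)  # step 2: recompute from scratch
        if new == ancestor:  # step 3: stop when nothing changes
            return ancestor, rounds
        ancestor = new


ancestor, rounds = naive_eval(parent)
print(sorted(ancestor), rounds)
```

Every round rebuilds the whole ancestor relation, including tuples that were already known, which is the first disadvantage listed above.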

Semi-naive Evaluation
Since the edb never changes, on each round we only get new idb tuples if we use at least one idb tuple that was obtained on the previous round. Restricting each round to such derivations saves work and avoids rediscovering most known facts.
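
A minimal sketch of this idea for the ancestor program, assuming the same example parent facts as an illustrative database (the function name semi_naive_eval is hypothetical). Only the tuples that were new in the previous round, kept in delta, are joined with parent.

```python
# Semi-naive evaluation of
#   ancestor(x, y) <- parent(x, y)
#   ancestor(x, y) <- parent(x, z), ancestor(z, y)
# The parent facts below are an assumed example database.
parent = {("John", "Ruth"), ("John", "Lois"), ("Lois", "Andy"), ("Lois", "Mark")}


def semi_naive_eval(parent):
    ancestor = set(parent)  # the non-recursive rule uses only the edb: apply it once
    delta = set(parent)     # idb tuples that were new in the previous round
    while delta:
        # recursive rule, restricted to ancestor tuples from the previous round
        derived = {(x, y) for (x, z) in parent for (z2, y) in delta if z == z2}
        delta = derived - ancestor  # keep only facts that are really new
        ancestor |= delta
    return ancestor


result = semi_naive_eval(parent)
print(sorted(result))
```

The loop ends as soon as a round produces no new tuple, so the result is the same fixpoint as with naive evaluation.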

Adornment
An adornment for an m-ary predicate p is a string α of length m made up of the letters b (bound) and f (free). Let p^α be the predicate p adorned by α.
The general algorithm for adorning a rule:
(i) All occurrences of each bound variable in the rule head are bound;
(ii) All occurrences of constants are bound;
(iii) If a variable X occurs in a literal of the rule body, then all occurrences of X in subsequent literals are bound;
(iv) The remaining occurrences of variables are free.
A different ordering of the rule body would yield different adornments.
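
Rules (i) to (iv) can be sketched as a left-to-right pass over the rule body. The representation is an assumption made for this illustration: an atom is a pair (predicate, arguments), variables start with a lowercase letter and constants do not, and the names is_var and adorn_rule are hypothetical. Unlike the example that follows, the sketch also reports an adornment for extensional atoms.

```python
def is_var(term):
    # Assumed convention: variables start with a lowercase letter.
    return term[0].islower()


def adorn_rule(head, head_adornment, body):
    """Adorn the body atoms of a rule, given the adornment of its head."""
    _, head_args = head
    # (i) variables bound in the head are bound everywhere in the rule
    bound = {a for a, c in zip(head_args, head_adornment) if c == "b" and is_var(a)}
    adorned = []
    for pred, args in body:
        # (ii) constants are bound; (iv) variables not yet seen are free
        alpha = "".join("b" if (not is_var(a) or a in bound) else "f" for a in args)
        adorned.append((pred, alpha, args))
        # (iii) variables of this literal are bound in subsequent literals
        bound |= {a for a in args if is_var(a)}
    return adorned


head = ("ancestor", ("x", "y"))
body = [("parent", ("x", "z")), ("ancestor", ("z", "y"))]
in_order = adorn_rule(head, "bf", body)
reversed_order = adorn_rule(head, "bf", list(reversed(body)))
print(in_order)
print(reversed_order)
```

With the body in its written order the recursive atom gets the adornment bf; with the body reversed it gets ff, which illustrates the remark that the ordering of the body matters.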

Adornment - Example
Consider the following positive logic program P:
  r1: ancestor(x, y) ← parent(x, y)
  r2: ancestor(x, y) ← parent(x, z), ancestor(z, y)
where
- x, y, z are variables,
- parent is an extensional predicate,
- ancestor is an intensional predicate,
- parent(x, y) means x is a parent of y,
- ancestor(x, y) means x is an ancestor of y.
Let the query be ancestor(John, y)?, asking "John is an ancestor of whom?". The task is to find all the descendants of John.

Adornment - Example
The adorned version (denoted by P^ad) of the program P and the query ancestor(John, y)?:
  r1: ancestor^bf(x, y) ← parent(x, y)
  r2: ancestor^bf(x, y) ← parent(x, z), ancestor^bf(z, y)
  query^f(y) ← ancestor^bf(John, y)


Query-Sub-Query Recursive (QSQR)
- Top-down, direct evaluation;
- Avoids computing tuples that are not used for deriving answers;
- Begins with the constants in a query, pushing them from goals to subgoals;
- Uses sideways information passing to pass constant binding information from one atom to the next in subgoals.
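
The following is a much-simplified sketch in the spirit of QSQR, hard-wired to the ancestor program with the first argument bound. It is not the general algorithm: the parent facts (including the irrelevant fact about Ann) are an assumed example database, and the names qsqr_ancestor, inputs and answers are hypothetical. It shows how the constant of the query is pushed to subgoals and how already asked subqueries are not asked again.

```python
# Assumed example database; ("Ann", "Bob") is irrelevant to the query on John.
parent = {("John", "Ruth"), ("John", "Lois"), ("Lois", "Andy"),
          ("Lois", "Mark"), ("Ann", "Bob")}


def qsqr_ancestor(parent, constant):
    inputs = set()   # bindings of x for which ancestor(x, y)? has been asked
    answers = set()  # ancestor tuples derived so far

    def evaluate(bindings):
        # ancestor(x, y) <- parent(x, y), for the bound values of x only
        answers.update((x, y) for (x, y) in parent if x in bindings)
        # ancestor(x, y) <- parent(x, z), ancestor(z, y):
        # parent(x, z) passes the binding of x sideways to z
        sup = {(x, z) for (x, z) in parent if x in bindings}
        fresh = {z for (_, z) in sup} - inputs  # subqueries not asked before
        if fresh:
            inputs.update(fresh)
            evaluate(fresh)  # recursive subquery ancestor(z, y)?
        answers.update({(x, y) for (x, z) in sup
                        for (z2, y) in answers if z == z2})

    inputs.add(constant)
    while True:  # repeat until neither inputs nor answers grow
        before = (len(inputs), len(answers))
        evaluate(set(inputs))
        if before == (len(inputs), len(answers)):
            break
    return {y for (x, y) in answers if x == constant}, inputs


descendants, asked = qsqr_ancestor(parent, "John")
print(sorted(descendants), sorted(asked))
```

Because Ann is never reached from John, no subquery is asked for her and the fact parent(Ann, Bob) contributes nothing to the computation.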

QSQR - Example
Reconsider the adorned program P^ad:
  r1: ancestor^bf(x, y) ← parent(x, y)
  r2: ancestor^bf(x, y) ← parent(x, z), ancestor^bf(z, y)
  query^f(y) ← ancestor^bf(John, y)

[Figures: QSQR - Example, slides 13-34 (not reproduced in this transcript)]


Magic-Set technique
The Magic-Set technique is a rule-rewriting method that generates from a given set of rules a new set of rules, which is equivalent to the original set w.r.t. the original query.
After rewriting, the new program can be evaluated by a simple bottom-up algorithm, usually the (improved) semi-naive evaluation method.
The method avoids deriving facts that are irrelevant to the query and thereby restricts the search space. It combines the advantages of the top-down and bottom-up methods.
The Generalized Supplementary Magic Sets algorithm uses special predicates, called supplementary magic predicates, to eliminate duplicate work during processing.

Magic-Set technique - Example
The Magic-Set rule-rewriting corresponding to the adorned program P^ad (denoted by P^mg):
  magic_ancestor^bf(z) ← magic_ancestor^bf(x), parent(x, z)
  ancestor^bf(x, y) ← magic_ancestor^bf(x), parent(x, y)
  ancestor^bf(x, y) ← magic_ancestor^bf(x), parent(x, z), ancestor^bf(z, y)
  magic_ancestor^bf(John).

Magic-Set technique - Example
Applying the improved semi-naive evaluation method to P^mg:
- iteration 1: magic_ancestor^bf(Ruth), magic_ancestor^bf(Lois) added.
- iteration 2: magic_ancestor^bf(Andy), magic_ancestor^bf(Mark) added.
- iteration 3: ancestor^bf(John, Lois), ancestor^bf(John, Ruth), ancestor^bf(Lois, Andy), ancestor^bf(Lois, Mark), ancestor^bf(John, Andy), ancestor^bf(John, Mark) added.
- iteration 4: fixpoint (no more tuples were added).
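
As a rough check of the rewriting, the rules of P^mg can be run by a plain fixpoint loop in Python. The parent facts are an assumption: the first four are inferred from the iterations listed above, and the fact about Ann is added only to show that irrelevant data is ignored. The loop is a simple naive iteration, not the improved semi-naive method, so its rounds do not match the iteration numbers above; only the final fixpoint is comparable. The name eval_magic is hypothetical.

```python
# Assumed example database; ("Ann", "Bob") is irrelevant to the query on John.
parent = {("John", "Ruth"), ("John", "Lois"), ("Lois", "Andy"),
          ("Lois", "Mark"), ("Ann", "Bob")}


def eval_magic(parent, constant):
    magic = {constant}  # seed fact: magic_ancestor(John)
    ancestor = set()
    while True:
        # magic_ancestor(z) <- magic_ancestor(x), parent(x, z)
        new_magic = {z for (x, z) in parent if x in magic}
        # ancestor(x, y) <- magic_ancestor(x), parent(x, y)
        new_anc = {(x, y) for (x, y) in parent if x in magic}
        # ancestor(x, y) <- magic_ancestor(x), parent(x, z), ancestor(z, y)
        new_anc |= {(x, y) for (x, z) in parent if x in magic
                    for (z2, y) in ancestor if z2 == z}
        if new_magic <= magic and new_anc <= ancestor:
            return magic, ancestor  # fixpoint: nothing new was derived
        magic |= new_magic
        ancestor |= new_anc


magic, ancestor = eval_magic(parent, "John")
print(sorted(magic))
print(sorted(ancestor))
```

The magic predicate acts as a filter: ancestor tuples are derived only for first arguments reachable from John, so nothing is computed for Ann.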

Magic-Set technique - Example
Generalized Supplementary Magic Sets (denoted by P^gmg):
  sup_magic^2_2(x, z) ← magic_ancestor^bf(x), parent(x, z)
  ancestor^bf(x, y) ← magic_ancestor^bf(x), parent(x, y)
  ancestor^bf(x, y) ← sup_magic^2_2(x, z), ancestor^bf(z, y)
  magic_ancestor^bf(z) ← sup_magic^2_2(x, z)
  magic_ancestor^bf(John).

Magic-Set technique - Example
Applying the improved semi-naive evaluation method to P^gmg:
- iteration 1: sup_magic^2_2(John, Ruth), sup_magic^2_2(John, Lois), magic_ancestor^bf(Ruth), magic_ancestor^bf(Lois) added.
- iteration 2: sup_magic^2_2(Lois, Andy), sup_magic^2_2(Lois, Mark), magic_ancestor^bf(Andy), magic_ancestor^bf(Mark) added.
- iteration 3: ancestor^bf(John, Lois), ancestor^bf(John, Ruth), ancestor^bf(Lois, Andy), ancestor^bf(Lois, Mark), ancestor^bf(John, Andy), ancestor^bf(John, Mark) added.
- iteration 4: fixpoint (no more tuples were added).


Query-Subquery Nets
Build a QSQ-net structure from a program P and use it as a flow control network to choose the order in which data are processed and transferred.
The intention is to increase the efficiency of query processing by:
- eliminating redundant computation,
- increasing flexibility,
- reducing the number of accesses to the secondary storage.
The framework forms a generic evaluation method called QSQN. It has the following properties:
- the approach is goal-directed,
- each subquery is processed only once,
- each supplement tuple, if desired, is transferred only once,
- operations are done set-at-a-time,
- any control strategy can be used.

Query-Subquery Nets
Definition 1 (QSQ-net structure). A QSQ-net structure of a positive logic program P is a tuple (V, E, T) such that:
- V is a set of nodes,
- E is a set of edges,
- T is a function, called the memorizing type of the net structure.
We call the pair (V, E) the QSQ-net topological structure of P.
Definition 2 (QSQ-net). A QSQ-net of P is a tuple N = (V, E, T, C) such that:
- (V, E, T) is a QSQ-net structure of P,
- C is a mapping that associates each node v ∈ V with a structure called the content of v.


Query-Subquery Nets
A subquery is a pair of the form (t, δ), where t is a generalized tuple and δ is an idempotent substitution such that dom(δ) ∩ Vars(t) = ∅.
Subqueries are transferred through edges and processed at nodes. Formally, the processing of a subquery has the following properties:
- every subquery / input tuple / answer tuple subsumed by another one is ignored;
- every subquery / input tuple / answer tuple with term-depth greater than a fixed bound L is ignored;
- the processing is divided into smaller steps, which can be delayed to maximize flexibility and allow various control strategies;
- the processing is done set-at-a-time (e.g., for all the unprocessed subqueries accumulated in a given node).
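
The two conditions in the definition of a subquery can be checked mechanically. The sketch below rests on an assumed representation that is not taken from the slides: variables are strings starting with a lowercase letter, constants are other strings, compound terms are tuples (functor, arg1, ..., argn), and a substitution is a dict without trivial bindings such as x to x. It uses the standard fact that such a substitution is idempotent exactly when no variable of its domain occurs in its range. The function names are hypothetical.

```python
def variables(term):
    """Return the set of variables occurring in a term."""
    if isinstance(term, tuple):  # compound term: (functor, arg1, ..., argn)
        return set().union(*(variables(a) for a in term[1:]))
    return {term} if term[0].islower() else set()


def is_idempotent(delta):
    # Idempotent iff no variable of the domain occurs in the range.
    range_vars = set().union(*(variables(t) for t in delta.values()))
    return not (set(delta) & range_vars)


def is_subquery(t, delta):
    """Check that delta is idempotent and that dom(delta) and Vars(t) are disjoint."""
    tuple_vars = set().union(*(variables(a) for a in t))
    return is_idempotent(delta) and not (set(delta) & tuple_vars)


ok = is_subquery(("x", "John"), {"z": "Lois"})
clash = is_subquery(("x",), {"x": "John"})
cyclic = is_idempotent({"x": ("f", "x")})
print(ok, clash, cyclic)
```

In the second call the substitution binds a variable that still occurs in the tuple, and in the third the bound variable reappears in its own image, so both are rejected.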

QSQN - Example
Reconsider the program P:
  ancestor(x, y) ← parent(x, y)
  ancestor(x, y) ← parent(x, z), ancestor(z, y).
The QSQ-net topological structure of P is constructed as follows:

[Figures: QSQN - Example, slides 46-53 (not reproduced in this transcript)]

Query-Subquery Nets
The following control strategies are used:
- Disk Access Reduction (DAR), which tries to reduce the number of accesses to the secondary storage;
- Depth-First Search (DFS), which gives priority to the order of clauses in the positive logic program defining intensional predicates and thus allows the user to control the evaluation to a certain extent;
- Improved Depth-First Control Strategy (IDFS), an improved version of DFS that aims to accumulate as many tuples or subqueries as possible at each node of the QSQ-net before processing it.

QSQN - Example
Reconsider the program P:
  ancestor(x, y) ← parent(x, y)
  ancestor(x, y) ← parent(x, z), ancestor(z, y).
The query: ancestor(John, y)?

[Figures: QSQN - Example, slides 56-63 (not reproduced in this transcript)]

QSQN - Example
At this point, some edges are still active, but they no longer affect the ans_ancestor relation. When the attributes unprocessed, unprocessed_subqueries, unprocessed_subqueries_2 and unprocessed_tuples of all nodes in the net are empty sets, the algorithm terminates and returns the set tuples(ans_ancestor) = {(John, Lois), (John, Ruth), (John, Mark), (John, Andy)} for the query ancestor(John, y)?