Module 11: Optimization of Recursive Queries

Module Outline
11.1 Formalisms for recursive queries
11.2 Computing recursive queries
11.3 Partial transitive closures

[Figure: query processing overview. Labels: User Query; Transformation & Optimization; Internal Representation; Optimization (algebraic, non-algebraic); Decomposition into Simple Parts; Access Path Selection; Iterative Program; DB Access at run time; Logical & Physical DB Schema; DB Catalog (Statistics, Cost Parameters).]

11.1 Formalisms for recursive queries

Examples of problems requiring recursion:

- Ancestors: given the relation par(p, c), find all the ancestors of person X. (Reachability in, or transitive closure of, a digraph.)
- Parts explosion: given the relation compound(super, sub, count), compute the complete bill of materials needed to produce one part P. (Ditto, with labelled edges and computation along edges/paths.)
- Path queries: given the relation edge(from, to, distance), compute the shortest path from A to B. (Ditto, with an optimization problem.)

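11.1.1 Datalog

One option for expressing recursive queries is to use DATALOG as the query language. DATALOG is a subset of first-order predicate logic (1PL), restricted to Horn clauses.

Horn clauses are CNF formulae with at most one positive literal per clause, e.g.

$$F = (A \lor \lnot B) \land (\lnot C \lor \lnot A \lor D) \land (\lnot A \lor \lnot B) \land D$$

... rewritten as implications:

$$F \equiv (B \Rightarrow A) \land (C \land A \Rightarrow D) \land (A \land B \Rightarrow 0) \land (1 \Rightarrow D)$$

... and finally, in DATALOG notation (cf. PROLOG):

$$A \leftarrow B, \qquad D \leftarrow C \land A, \qquad \leftarrow A \land B, \qquad D \leftarrow$$

or, actually:

    A :- B.
    D :- C, A.
    :- A, B.
    D.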

The ancestor problem in DATALOG (retrieve all the ancestors of "john"):

    anc(A, C) :- par(A, C).
    anc(A, C) :- anc(A, P), par(P, C).
    query(X) :- anc(X, john).

The same generation problem in DATALOG (retrieve all persons in the same generation as "john"):

    sg(X, Y) :- par(XP, X), sg(YP, XP), par(YP, Y).
    sg(X, X).
    query(X) :- sg(john, X).

11.1.2 Relational algebra with fixed-point operator

Relational algebra can be extended by a least-fixed-point operator $LFP$ that computes the least fixed point of a recursive (algebraic) equation of the general form $x = f(x)$.

N.B. The algebra operators used in a recursive equation are restricted so that the expressions are monotonic, which ensures that a (unique) least fixed point exists. (For example: no differences or, in terms of DATALOG, no negation.)

Example: the ancestor problem

Using the binary operator $\circ$ as shorthand for a join followed by a projection, e.g.,

$$\mathit{anc}(A, C) \circ \mathit{par}(P, C) = \pi_{\mathit{anc}.A,\ \mathit{par}.C}\left(\mathit{anc} \bowtie_{\mathit{anc}.C = \mathit{par}.P} \mathit{par}\right),$$

we can express recursive queries over the anc relation as follows (retrieve all ancestors of "john"):

$$\mathit{query}(X) = \sigma_{C = \mathit{john}}\left(LFP(\mathit{anc} = \mathit{anc} \circ \mathit{par} \cup \mathit{par})\right),$$

where $\mathit{anc} = \mathit{anc} \circ \mathit{par} \cup \mathit{par}$ is the recursive equation whose LFP is anc.

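Informal definition of the $LFP$ operator: iterate the following computation until no more new tuples are found.

$$\mathit{anc} := \mathit{par} \cup (\mathit{par} \circ \mathit{par}) \cup ((\mathit{par} \circ \mathit{par}) \circ \mathit{par}) \cup (((\mathit{par} \circ \mathit{par}) \circ \mathit{par}) \circ \mathit{par}) \cup \dots$$

Here $\circ$ denotes the join-and-project operator of the ancestor example. Since $\circ$ and $\cup$ are monotonic and par is finite, termination is guaranteed. We see that

$$\mathit{anc} = \mathit{par}^{+} = \bigcup_{i=1}^{\infty} \mathit{par}^{i}$$

or

$$\mathit{anc} = LFP(x = x \circ \mathit{par} \cup \mathit{par}).$$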
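The iteration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: relations are modelled as sets of pairs, and the names `compose` and `lfp_ancestors` as well as the sample data are hypothetical, not part of the lecture material.

```python
def compose(r, s):
    """Join r and s on r.second = s.first, then project to (r.first, s.second)."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def lfp_ancestors(par):
    """Naive fixed-point iteration for the equation anc = anc o par U par."""
    anc = set()
    while True:
        new_anc = compose(anc, par) | par
        if new_anc == anc:          # no new tuples: least fixed point reached
            return anc
        anc = new_anc


# Hypothetical sample data: par(P, C) means P is a parent of C.
par = {("ann", "bob"), ("bob", "john"), ("john", "sue")}
anc = lfp_ancestors(par)

# Selection C = john applied after the fixed point has been computed.
ancestors_of_john = {a for (a, c) in anc if c == "john"}
```

Every round recomputes the whole join, which is why this scheme is called naive; each round can only add tuples, so the loop stops once the finite set of derivable pairs is exhausted.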

11.1.3 Recursive SQL queries

Since the 1999 version, the SQL standard has included recursive unions as a means of expressing recursion in SQL. The idea essentially follows the DATALOG approach, where one clause defines the initialization (anc(A, C) :- par(A, C).) and another one the recursive step (anc(A, C) :- anc(A, P), par(P, C).).

Recursive union in SQL:1999

    with <recursive table> (attr_1, ..., attr_n) as (
        SFW-Statement-1        /* initialization */
        union all
        SFW-Statement-2        /* recursive step */
    )
    select [distinct] ...
    from <recursive table>
    [where ...]
    [group by ... [having ...]]
    [order by ...]

Example: the ancestor problem (retrieve all the ancestors of "john")

    with anc (A, C) as (
        select P, C from par
        union all
        select anc.A, par.C
        from anc, par
        where anc.C = par.P
    )
    select A from anc where C = 'john'

Remark: While LFP-algebra and DATALOG (due to their lack of arithmetic) can express reachability queries only, SQL can also express path queries with computation and some optimization.
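To try the recursive union, the query can be run against an in-memory SQLite database from Python. This is a sketch under assumptions: the helper name `ancestors_of` and the sample rows are hypothetical, the query is written with the keyword `recursive` (which SQLite accepts, while the slide's example omits it), and `distinct` plus `order by` are added only to make the result deterministic. With `union all`, the data must be acyclic for the recursion to terminate.

```python
import sqlite3


def ancestors_of(person, par_rows):
    """Return the ancestors of `person`, given rows (P, C) of the par relation."""
    con = sqlite3.connect(":memory:")
    con.execute("create table par (P text, C text)")
    con.executemany("insert into par values (?, ?)", par_rows)
    rows = con.execute(
        """
        with recursive anc (A, C) as (
            select P, C from par                 /* initialization */
            union all
            select anc.A, par.C                  /* recursive step */
            from anc, par
            where anc.C = par.P
        )
        select distinct A from anc where C = ? order by A
        """,
        (person,),
    ).fetchall()
    con.close()
    return [row[0] for row in rows]


# Hypothetical sample data: ann -> bob -> john -> sue.
result = ancestors_of("john", [("ann", "bob"), ("bob", "john"), ("john", "sue")])
```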

11.2 Computing recursive queries

11.2.1 Top-down evaluation (depth-first search)

A DATALOG program can be evaluated just like any other PROLOG program¹, following the top-down, left-to-right search strategy with backtracking. Horn clauses are considered ordered (from top to bottom), as are the conjuncts in their right-hand sides (left to right):

- Given a goal (such as: find A such that anc(A, john)), the evaluation tries to match the left-hand sides of all clauses (starting from the top) against the goal ("unification"), and then tries to satisfy the conjuncts of matching right-hand sides (sub-goals), proceeding left to right.
- If successful, display the variable bindings found (as one result tuple) and wait for the user to ask for more.
- If unsuccessful (or if the user asks for more), use backtracking to reverse the search and try the next possibility, starting with the last decision.

¹ Syntactically, DATALOG is a subset of PROLOG.

Visualization using Warren's Abstract Machine (WAM)

Use boxes to represent the evaluation of sub-goals.

[Figure: a sub-goal box with the four ports CALL, EXIT, FAIL, and REDO.]

- CALL: the first invocation of the sub-goal, to search for satisfying variable bindings.
- REDO: subsequent calls, to find alternative bindings.
- EXIT: this exit returns successful bindings.
- FAIL: this exit signals an unsuccessful search.

The overall search strategy can be visualized by connecting those boxes according to the initial goal queried and the clauses used. Consider the following example ("geschwister" = siblings, "elternteil" = parent):

    sibling(X, Y) :- par(P, X), par(P, Y), X ≠ Y.

    ?- geschwister(erika, X).

    geschwister(K1, K2) :- elternteil(X, K1), elternteil(X, K2), K1 \= K2.

[Figure: box-model trace of the query, connecting the sub-goals elternteil(X1, erika), elternteil(X1, K2), and erika \= K2. Facts matched in the trace: elternteil(jens, erika), elternteil(anna, erika), elternteil(jens, elke), elternteil(jens, helga), elternteil(anna, elke). The tests erika \= elke and erika \= helga succeed; erika \= erika fails.]

Answers, in the order found:

    X = elke ;
    X = helga ;
    X = elke ;

Observation: processing proceeds one record at a time. However, only those records that really contribute to the result are considered.
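The record-at-a-time behaviour of the trace can be imitated with Python generators, where asking a generator for its next value plays the role of REDO. This is a minimal sketch, not a PROLOG engine: the fact base and its order are assumptions chosen to match the facts visible in the trace, and the function names simply mirror the predicates.

```python
# Assumed fact base (parent, child); list order models PROLOG's clause order.
ELTERNTEIL = [
    ("jens", "elke"), ("jens", "helga"), ("jens", "erika"),
    ("anna", "elke"), ("anna", "erika"),
]


def elternteil(parent=None, child=None):
    """Yield the facts matching the bound arguments, one at a time."""
    for p, c in ELTERNTEIL:
        if (parent is None or parent == p) and (child is None or child == c):
            yield p, c


def geschwister(k1):
    """geschwister(K1, K2) :- elternteil(X, K1), elternteil(X, K2), K1 \\= K2."""
    for x, _ in elternteil(child=k1):          # first sub-goal binds X
        for _, k2 in elternteil(parent=x):     # second sub-goal binds K2
            if k1 != k2:                       # third sub-goal: K1 \= K2
                yield k2                       # EXIT with one answer


answers = list(geschwister("erika"))
```

As in the trace, elke is reported twice, once through each parent, because the search enumerates derivations rather than distinct tuples.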

11.2.2 Bottom-up evaluation (breadth-first search)

In a database context, evaluation using an iteration scheme similar to the informal definition of the LFP operator shown above can be promising, since it exploits the set-oriented capabilities of relational query processors (within each iteration step, a join is computed). This evaluation strategy is known as (semi-)naive evaluation or (delta-)iteration in the literature.

Semi-naive iteration (for the whole ancestor relation), where ∘ denotes the join-and-project operator from 11.1.2:

    anc_0 := ∅; Δ_1 := par; i := 1;
    repeat
        anc_i := anc_{i-1} ∪ Δ_i;
        Δ_{i+1} := (Δ_i ∘ par) \ anc_i;
        i := i + 1
    until Δ_i = ∅

The second statement in the loop exploits set-oriented relational processing (esp. joins).
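The semi-naive loop translates almost directly into Python sets. The following is a sketch under assumptions: relations are sets of pairs, and `compose`, `semi_naive_closure`, and the sample data are illustrative names and values, not part of the lecture.

```python
def compose(r, s):
    """Join r and s on r.second = s.first, then project to (r.first, s.second)."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def semi_naive_closure(par):
    """Compute the whole anc relation; also report the number of iterations."""
    anc = set()
    delta = set(par)                          # Delta_1 := par
    rounds = 0
    while delta:                              # until the delta is empty
        anc |= delta                          # anc_i := anc_{i-1} U Delta_i
        delta = compose(delta, par) - anc     # join only the new tuples with par
        rounds += 1
    return anc, rounds


# Hypothetical sample data: a chain 1 -> 2 -> 3 -> 4.
chain_anc, chain_rounds = semi_naive_closure({(1, 2), (2, 3), (3, 4)})

# Subtracting anc also makes the loop terminate on cyclic data.
cycle_anc, _ = semi_naive_closure({(1, 2), (2, 1)})
```

Compared with the naive scheme, each round joins only the tuples discovered in the previous round, so no derivation step is repeated.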

11.3 Partial transitive closures

With the least-fixed-point algebra as well as the bottom-up evaluation strategy, we run into trouble once we take into account that queries typically ask for the ancestors of one (or possibly a few) person(s), not for the complete anc relation. Recall the LFP-algebra expression mentioned above:

$$\mathit{query}(X) = \sigma_{C = \mathit{john}}\left(LFP(\mathit{anc} = \mathit{anc} \circ \mathit{par} \cup \mathit{par})\right).$$

Problem: to move the selection $\sigma_{C = \mathit{john}}$ inside the LFP operator, we need new algebraic equivalence rules involving $\sigma$, $LFP$, and $\circ$. This is indeed possible and yields

$$\mathit{query}(X) = LFP(E = E \circ \mathit{par} \cup \sigma_{C = \mathit{john}}\, \mathit{par}), \quad \text{with } E = \sigma_{C = \mathit{john}}\, \mathit{anc},$$

which, in terms of DATALOG, corresponds to:

    anc(john, C) :- par(john, C).
    anc(john, C) :- anc(john, P), par(P, C).
    query(X) :- anc(john, X).

N.B. The selection condition has been moved into all iteration steps.
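The effect of moving the selection into the iteration can be checked on a small example. The sketch below follows the DATALOG rules above, which bind the first argument of anc to john; relations are sets of pairs, and all function names and the sample data are assumptions made for illustration.

```python
def compose(r, s):
    """Join r and s on r.second = s.first, then project to (r.first, s.second)."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def closure(seed, par):
    """Semi-naive least fixed point of E = E o par U seed."""
    result, delta = set(), set(seed)
    while delta:
        result |= delta
        delta = compose(delta, par) - result
    return result


def select_after(par, person):
    """Compute the complete anc relation first, then select the first argument."""
    return {(a, c) for (a, c) in closure(par, par) if a == person}


def select_pushed(par, person):
    """Start from the selected par tuples only, as in the rewritten rules."""
    seed = {(a, c) for (a, c) in par if a == person}
    return closure(seed, par)


# Hypothetical sample data with a part that is irrelevant for john.
par = {("john", "sue"), ("sue", "tim"), ("ann", "bob"), ("bob", "cid")}
full_size = len(closure(par, par))
pushed = select_pushed(par, "john")
```

Both strategies return the same tuples, but the pushed variant never derives tuples about ann, bob, or cid.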

11.3.1 Magic Set rewriting

Goal: devise a (DATALOG) query rewriting method that propagates selections through recursion.

Problem: the selection predicate changes with each step of the recursion (or iteration), to reflect newly obtained "interesting" values (the "front" of the search).

Idea: introduce new predicates and rules that collect the relevant variable bindings for free variables. These are called Magic Predicates.

Example: reconsider the same generation problem mentioned above:

    sg(X, Y) :- par(XP, X), sg(YP, XP), par(YP, Y).
    sg(X, X).
    query(X) :- sg(john, X).

Starting from the given query, mark predicates according to which of their variables are free/bound, and add magic predicates to propagate bindings.

Restriction: Magic Set rewriting only works for linear recursion!

Definitions

1. Adornment of a predicate: a string of f and b characters attached to the predicate symbol. The length of that string equals the number of parameters of the predicate.
2. Distinguished argument of a predicate:
   a) a constant,
   b) a variable marked b (in the head predicate² of a rule), or
   c) a variable occurring in a base predicate³ that has a distinguished argument.
3. Adorned rule system for a given DATALOG program and query: for each rule and each distinct adornment of its head predicate, generate a new adorned rule, using
   a) b for distinguished (bound) arguments,
   b) f for other (free) arguments, and
   c) marking the body predicates⁴ of each rule with the corresponding adornments.

² Head predicate: the predicate on the left-hand side of a rule.
³ Base predicate: a predicate that does not occur on the left-hand side of any rule.
⁴ Body predicate: the predicates on the right-hand side of a rule.

Example

For the query rule and the recursive sg rule above, we obtain:

    sg^bf(X, Y) :- par(XP, X), sg^fb(YP, XP), par(YP, Y).
    query^f(X) :- sg^bf(john, X).

If we apply the rewriting to all rules with all adornments obtained, we end up with:

    sg^bf(X, Y) :- par(XP, X), sg^fb(YP, XP), par(YP, Y).
    sg^fb(X, Y) :- par(XP, X), sg^bf(YP, XP), par(YP, Y).
    sg^bf(X, X).
    sg^fb(X, X).
    query^f(X) :- sg^bf(john, X).

In general, the DATALOG program grows exponentially, since a predicate with n arguments has up to 2^n adornments! However, since we distinguish predicates with different adornments, only those really needed are used when evaluating a query.
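The adornments in this example can be derived mechanically from the definition of distinguished arguments. The following Python sketch is one possible reading of that definition, not the lecture's algorithm: atoms are modelled as (predicate, arguments) pairs, names starting with an upper-case letter are treated as variables (PROLOG convention), and `adorn_body` is a hypothetical helper.

```python
def is_variable(term):
    """PROLOG convention: variables start with an upper-case letter."""
    return term[0].isupper()


def adorn_body(head_args, head_adornment, body, base_preds):
    """Return (predicate, adornment) for each derived predicate in the body."""
    # b) variables marked b in the head are distinguished
    bound = {v for v, mark in zip(head_args, head_adornment) if mark == "b"}

    def distinguished(term):
        return (not is_variable(term)) or term in bound   # a) constants

    # c) variables of a base predicate that has a distinguished argument
    changed = True
    while changed:
        changed = False
        for pred, args in body:
            if pred in base_preds and any(distinguished(a) for a in args):
                for a in args:
                    if is_variable(a) and a not in bound:
                        bound.add(a)
                        changed = True

    return [
        (pred, "".join("b" if distinguished(a) else "f" for a in args))
        for pred, args in body
        if pred not in base_preds
    ]


sg_body = [("par", ("XP", "X")), ("sg", ("YP", "XP")), ("par", ("YP", "Y"))]
from_bf = adorn_body(("X", "Y"), "bf", sg_body, {"par"})
from_fb = adorn_body(("X", "Y"), "fb", sg_body, {"par"})
from_query = adorn_body(("X",), "f", [("sg", ("john", "X"))], {"par"})
```

Starting from the query rule, the adornment bf of sg leads to fb in the body of the recursive rule, and fb leads back to bf, so exactly the two adorned versions listed above are needed.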

Magic Set rewriting continued...

After generating the adorned rule system:

- For each occurrence of a derived predicate⁵ in the body of an adorned rule: generate a magic rule.
- For each adorned rule: generate a modified rule.

⁵ Derived predicate: a predicate that occurs on the left-hand side of some rule.

Generating Magic Rules

1. Select an adorned predicate P from the body of the rule.
2. Delete all other derived predicates from the body of the rule.
3. Rename P^a to magic_P^a (a is the adornment of P) and delete all free variables from its parameter list.
4. Delete all unbound base predicates from the body.
5. Delete all free variables in the head predicate and rename the head predicate P0^a0 to magic_P0^a0 (a0 is the adornment of the head).
6. Exchange magic_P^a and magic_P0^a0.

Example: the adorned rule

    sg^bf(X, Y) :- par(XP, X), sg^fb(YP, XP), par(YP, Y).

yields the magic rule

    magic_sg^fb(XP) :- par(XP, X), magic_sg^bf(X).

Generating Modified Rules

For each rule with head predicate P^a: add a magic predicate magic_P^a(Z) to its body, where Z is the list of bound variables in P^a.

Example: the adorned rule

    sg^bf(X, Y) :- par(XP, X), sg^fb(YP, XP), par(YP, Y).

becomes

    sg^bf(X, Y) :- magic_sg^bf(X), par(XP, X), sg^fb(YP, XP), par(YP, Y).

Notice how magic_sg^bf(X) works as a filter that restricts attention to relevant X-values, and how bindings are passed between the body predicates. This has been called "sideways information passing" in the literature.
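To see the filter at work, the rewritten same generation program can be evaluated bottom-up on a small family tree. The sketch below rests on several assumptions that are not spelled out on the slides: the seed fact magic_sg^bf(john) is taken from the query constant, the second magic rule is obtained by applying the six steps to the sg^fb rule, the reflexive rule of the plain program ranges over all persons occurring in par, and the sample data is invented.

```python
def same_generation_magic(par, person):
    """Bottom-up evaluation of the magic-rewritten same generation program."""
    magic_bf, magic_fb = {person}, set()      # assumed seed: magic_sg^bf(person)
    sg_bf, sg_fb = set(), set()
    while True:
        sizes = (len(magic_bf), len(magic_fb), len(sg_bf), len(sg_fb))
        # magic_sg^fb(XP) :- par(XP, X), magic_sg^bf(X).
        magic_fb |= {xp for (xp, x) in par if x in magic_bf}
        # magic_sg^bf(YP) :- par(YP, Y), magic_sg^fb(Y).
        magic_bf |= {yp for (yp, y) in par if y in magic_fb}
        # sg^bf(X, X) :- magic_sg^bf(X).    sg^fb(X, X) :- magic_sg^fb(X).
        sg_bf |= {(x, x) for x in magic_bf}
        sg_fb |= {(x, x) for x in magic_fb}
        # sg^bf(X, Y) :- magic_sg^bf(X), par(XP, X), sg^fb(YP, XP), par(YP, Y).
        sg_bf |= {(x, y) for (xp, x) in par if x in magic_bf
                  for (yp, xp2) in sg_fb if xp2 == xp
                  for (yp2, y) in par if yp2 == yp}
        # sg^fb(X, Y) :- magic_sg^fb(Y), par(XP, X), sg^bf(YP, XP), par(YP, Y).
        sg_fb |= {(x, y) for (yp, y) in par if y in magic_fb
                  for (yp2, xp) in sg_bf if yp2 == yp
                  for (xp2, x) in par if xp2 == xp}
        if sizes == (len(magic_bf), len(magic_fb), len(sg_bf), len(sg_fb)):
            break
    return {y for (x, y) in sg_bf if x == person}, len(sg_bf) + len(sg_fb)


def same_generation_plain(par, person):
    """Bottom-up evaluation of the original program, selection applied last."""
    people = {p for pair in par for p in pair}
    sg = {(x, x) for x in people}             # sg(X, X), over all known persons
    while True:
        new_sg = sg | {(x, y) for (xp, x) in par
                       for (yp, xp2) in sg if xp2 == xp
                       for (yp2, y) in par if yp2 == yp}
        if new_sg == sg:
            break
        sg = new_sg
    return {y for (x, y) in sg if x == person}, len(sg)


# Hypothetical data: bob -> ann -> {john, mary}, bob -> carl -> dave, zed -> yan.
par = {("bob", "ann"), ("bob", "carl"), ("ann", "john"), ("ann", "mary"),
       ("carl", "dave"), ("zed", "yan")}
magic_answers, magic_tuples = same_generation_magic(par, "john")
plain_answers, plain_tuples = same_generation_plain(par, "john")
```

Both evaluations return the same answers for john, while the magic version derives fewer sg tuples because it never considers zed, yan, or pairs that cannot contribute to the query.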

Properties of the Magic Set approach

- Rather complicated rule rewriting. In general, a much larger rule set.
- Magic Predicates act as filters, so that only relevant data is used in the recursion/iteration.
- Bottom-up evaluation (e.g., by semi-naive iteration) is typically much faster after this transformation:
  - In parallel to the query (or sg) predicate, the magic predicates are evaluated in each step.
  - Magic predicates are used in the selections of the next iterative step.

11.3.2 The Counting method

Idea: a variation of the Magic Set approach.

- Magic predicates collect the data relevant for the next iterative step.
- Counting predicates collect the relevant data and their distance to the starting point of the iteration.

Example: Generalized Same Generation problem

    p(X, Y) :- flat(X, Y).
    p(X, Y) :- up(X, XU), p(YU, XU), down(YU, Y).
    query(X) :- p(a, X).

The magic predicate magic_up would collect all "up"s of a. The counting predicate additionally counts the levels above a.

Counting transformation for the example

Rules akin to those for the Magic Set approach lead to:

    counting(a, 0).
    counting(X, I) :- counting(Y, J), up(Y, X), I = J + 1.
    p'(Y, I) :- counting(X, I), flat(X, Y).
    p'(Y, I) :- p'(YU, J), down(YU, Y), I = J - 1, J > 0.
    query(X) :- p'(X, 0).

... each iterative step computes only the next level of relevant data.

N.B. Collecting X-values for predicate p' is not necessary (cf. stack)! The variant that keeps the X-values reads:

    counting(a, 0).
    counting(X, I) :- counting(Y, J), up(Y, X), I = J + 1.
    p'(X, Y, I) :- counting(X, I), flat(X, Y).
    p'(X, Y, I) :- counting(X, I), up(X, XU), p'(YU, XU, J), down(YU, Y), I = J - 1.
    query(X) :- p'(a, X, 0).

... for this to work, the data must not contain cycles!
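The first set of counting rules can be evaluated level by level, as the following Python sketch illustrates. It implements exactly those five rules, so its answers are the values reached from a by n up-steps, one flat-step, and n down-steps. The function name `counting_query` and the sample relations are assumptions, and, as noted above, the up relation must be acyclic for the first loop to terminate.

```python
def counting_query(a, up, flat, down):
    """Evaluate the counting rules for query(X) :- p'(X, 0)."""
    # counting(a, 0).
    # counting(X, I) :- counting(Y, J), up(Y, X), I = J + 1.
    counting = {(a, 0)}
    frontier = {(a, 0)}
    while frontier:                           # requires acyclic up data
        frontier = {(x, j + 1)
                    for (y, j) in frontier
                    for (y2, x) in up if y2 == y} - counting
        counting |= frontier

    # p'(Y, I) :- counting(X, I), flat(X, Y).
    p = {(y, i) for (x, i) in counting for (x2, y) in flat if x2 == x}

    # p'(Y, I) :- p'(YU, J), down(YU, Y), I = J - 1, J > 0.
    frontier = set(p)
    while frontier:                           # levels only decrease, down to 0
        frontier = {(y, j - 1)
                    for (yu, j) in frontier if j > 0
                    for (yu2, y) in down if yu2 == yu} - p
        p |= frontier

    # query(X) :- p'(X, 0).
    return {y for (y, i) in p if i == 0}


# Hypothetical sample data: two levels above a, with flat and down edges.
up = {("a", "b"), ("b", "c")}
flat = {("a", "a1"), ("b", "b1"), ("c", "c1")}
down = {("b1", "d"), ("c1", "e"), ("e", "f")}
answers = counting_query("a", up, flat, down)
```

The level counter is what ensures that e, which sits one level above the starting point, is not reported as an answer.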

11.3.3 Concluding remarks

- Quite a few other variants have been proposed in the literature (e.g., Reverse Counting, Magic Counting, ...).
- Extensions can be added to deal with general Horn clauses (including function symbols), e.g., Generalized Magic Sets.
- Also, more efficient top-down evaluation strategies have been developed (e.g., Query/Subquery).
- "Intelligent" top-down evaluation and clever transformation plus set-oriented bottom-up evaluation can achieve similar performance.
