Datalog Evaluation. Linh Anh Nguyen. Institute of Informatics University of Warsaw

Outline
- Simple Evaluation Methods
- Query-Subquery Recursive
- Magic-Set Technique
- Query-Subquery Nets

Simple Evaluation Algorithms
Methods to evaluate a Datalog program P on a database instance I, derived from the different equivalent definitions of the semantics:
- Model-theoretic definition: enumerate all subsets J ⊆ B(P, I), check modelhood, and pick the smallest such J.
- Fixpoint definition: augment I using the operator T_P until a fixpoint is reached.
- Proof-theoretic definition: use SLD-resolution (bottom-up or top-down).
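
The model-theoretic method can be illustrated by brute force on a tiny instance. The sketch below is hard-wired to the ancestor/parent program used later in these slides; the function names and the constants a, b, c are assumptions made for illustration only. It enumerates candidate ancestor relations by increasing size and returns the first one that satisfies both rules.

```python
from itertools import combinations, product

def is_model(parent, ancestor):
    """Check that `ancestor` satisfies both rules of the ancestor program."""
    # ancestor(x, y) <- parent(x, y)
    rule1 = all(fact in ancestor for fact in parent)
    # ancestor(x, y) <- parent(x, z), ancestor(z, y)
    rule2 = all((x, y) in ancestor
                for (x, z) in parent
                for (z2, y) in ancestor if z == z2)
    return rule1 and rule2

def minimal_model(parent):
    """Enumerate subsets of the candidate atoms by size; the first model found is the smallest."""
    constants = sorted({c for fact in parent for c in fact})
    atoms = list(product(constants, repeat=2))  # all possible ancestor atoms
    for size in range(len(atoms) + 1):
        for subset in combinations(atoms, size):
            if is_model(parent, set(subset)):
                return set(subset)

# Hypothetical toy instance (not from the slides).
model = minimal_model({("a", "b"), ("b", "c")})
```

The search space is exponential in the number of candidate atoms, which is why this definition is not used as a practical evaluation method.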

Classes of Datalog Evaluation
Two major classes of evaluation approaches:
- Bottom-Up, Forward Chaining:
  - proceed in the proof tree from the leaves to the root,
  - apply the Datalog rules from body to head (forward).
- Top-Down, Backward Chaining:
  - proceed in the proof tree from the root to the leaves,
  - apply the Datalog rules from head to body (backward).

Naive Evaluation
Follow the bottom-up approach: compute the least fixpoint of T_P containing I (= T_P^ω(I)).
Given a Datalog program P and a database instance I:
1. Start by assuming all idb relations are empty.
2. Repeatedly evaluate the rules using the edb and the previous idb to get a new idb.
3. End when there is no change to the idb.
Disadvantages:
- Relations are always recomputed from scratch.
- Relations must be copied.
- Every iteration covers all relations, even those that do not recursively depend on each other.
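
The three steps above can be sketched for the ancestor/parent program used in the later examples. This is a minimal sketch: the parent facts are an assumption (the slide text does not list them; they are inferred from the answers shown in the Magic-Set and QSQN examples), and the function names are illustrative.

```python
# Assumed edb: inferred from the answers listed in the later examples.
PARENT = {("John", "Ruth"), ("John", "Lois"), ("Lois", "Andy"), ("Lois", "Mark")}

def apply_rules(parent, ancestor):
    """One evaluation of both rules using the edb and the previous idb."""
    derived = set(parent)  # ancestor(x, y) <- parent(x, y)
    # ancestor(x, y) <- parent(x, z), ancestor(z, y)
    derived |= {(x, y) for (x, z) in parent for (z2, y) in ancestor if z == z2}
    return derived

def naive_eval(parent):
    ancestor = set()  # step 1: the idb relation starts empty
    rounds = 0
    while True:
        new_ancestor = apply_rules(parent, ancestor)  # step 2: recompute from scratch
        rounds += 1
        if new_ancestor == ancestor:  # step 3: stop when nothing changed
            return ancestor, rounds
        ancestor = new_ancestor

ancestor, rounds = naive_eval(PARENT)
```

Each round rederives every fact found in earlier rounds, which is the first disadvantage listed above.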

Semi-naive Evaluation
Since the edb never changes, each round can produce new idb tuples only from rule instances that use at least one idb tuple obtained in the previous round. This saves work by avoiding the rediscovery of most known facts.
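
A minimal sketch of this idea for the ancestor/parent program of the later examples: the recursive rule is joined only against the tuples that were new in the previous round (called delta here). The parent facts and all names are assumptions for illustration; the facts are inferred from the answers shown in later slides.

```python
# Assumed edb: inferred from the answers listed in the later examples.
PARENT = {("John", "Ruth"), ("John", "Lois"), ("Lois", "Andy"), ("Lois", "Mark")}

def semi_naive_eval(parent):
    # The non-recursive rule uses only the edb, so it is evaluated once.
    ancestor = set(parent)
    delta = set(parent)  # tuples that were new in the previous round
    while delta:
        # ancestor(x, y) <- parent(x, z), ancestor(z, y), with ancestor restricted to delta
        candidates = {(x, y) for (x, z) in parent for (z2, y) in delta if z == z2}
        delta = candidates - ancestor  # keep only facts not known before
        ancestor |= delta
    return ancestor

ancestor = semi_naive_eval(PARENT)
```

The loop ends as soon as a round contributes no new tuple, which yields the same relation as naive evaluation.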

Adornment
An adornment for an m-ary predicate p is a string α of length m made up of b (bound) and f (free); p^α denotes the predicate p adorned by α.
The general algorithm for adorning a rule:
(i) All occurrences of each bound variable in the rule head are bound;
(ii) All occurrences of constants are bound;
(iii) If a variable X occurs in a literal of the rule body, then all occurrences of X in subsequent literals are bound;
(iv) The remaining occurrences of variables are free.
A different ordering of the rule body would yield different adornments.
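
Rules (i) to (iv) can be sketched as a left-to-right pass over the rule body. The representation below is an assumption made for illustration: an atom is a pair (predicate, argument tuple), and the set of variable names is passed explicitly. Unlike the example slides, the sketch reports a pattern for every body atom, including extensional ones.

```python
def adorn_body(head_args, head_adornment, body, variables):
    """Return the adornment string of each body atom, processed left to right."""
    # (i) variables marked b in the head are bound everywhere in the rule
    bound = {a for a, flag in zip(head_args, head_adornment)
             if flag == "b" and a in variables}
    adorned = []
    for pred, args in body:
        # (ii) constants are bound; (iii) variables seen earlier are bound; (iv) the rest are free
        pattern = "".join("b" if (a not in variables or a in bound) else "f"
                          for a in args)
        adorned.append((pred, pattern))
        # after this literal, all of its variables count as bound for later literals
        bound |= {a for a in args if a in variables}
    return adorned

VARIABLES = {"x", "y", "z"}
body = [("parent", ("x", "z")), ("ancestor", ("z", "y"))]
original_order = adorn_body(("x", "y"), "bf", body, VARIABLES)
reversed_order = adorn_body(("x", "y"), "bf", list(reversed(body)), VARIABLES)
```

With the body in its original order the recursive atom gets the adornment bf; with the two atoms swapped it gets ff, which illustrates why the ordering of the body matters.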

Adornment - Example
Consider the following positive logic program P:
r1: ancestor(x, y) ← parent(x, y)
r2: ancestor(x, y) ← parent(x, z), ancestor(z, y)
where
- x, y, z are variables,
- parent is an extensional predicate,
- ancestor is an intensional predicate,
- parent(x, y) means x is a parent of y,
- ancestor(x, y) means x is an ancestor of y.
Let the query be ancestor(john, y)?, asking "John is an ancestor of whom?". The task is to find all the descendants of John.

Adornment - Example
Consider the following positive logic program P:
r1: ancestor(x, y) ← parent(x, y)
r2: ancestor(x, y) ← parent(x, z), ancestor(z, y)
The adorned version (denoted by P^ad) of the program P and the query ancestor(john, y)?:
r1: ancestor^bf(x, y) ← parent(x, y)
r2: ancestor^bf(x, y) ← parent(x, z), ancestor^bf(z, y)
query^f(y) ← ancestor^bf(John, y)

Query-Sub-Query Recursive (QSQR)
- Top-down, direct evaluation.
- Avoids computing tuples that are not used for deriving answers.
- Begins with the constants in the query and pushes them from goals to subgoals.
- Uses sideways information passing to pass constant binding information from one atom to the next in subgoals.

QSQR - Example
Reconsider the following adorned program P^ad:
r1: ancestor^bf(x, y) ← parent(x, y)
r2: ancestor^bf(x, y) ← parent(x, z), ancestor^bf(z, y)
query^f(y) ← ancestor^bf(John, y)
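
The flavour of the top-down evaluation can be sketched for this one program. This is a simplified illustration specialized to the two ancestor rules, not the general QSQR algorithm: the names inputs, answers and process are chosen here, and the parent facts are an assumption inferred from the answers shown in later slides. Bound values are pushed from the goal to the recursive subgoal, and a subgoal is explored only if its binding has not been asked before.

```python
# Assumed edb: inferred from the answers listed in the later examples.
PARENT = {("John", "Ruth"), ("John", "Lois"), ("Lois", "Andy"), ("Lois", "Mark")}

def qsqr_ancestor(parent, constant):
    inputs = set()   # bindings of the first argument that have been asked
    answers = set()  # ancestor tuples derived so far

    def process(bindings):
        """Evaluate both rules for a set of bound values of x."""
        # r1: ancestor(x, y) <- parent(x, y)
        answers.update({(x, y) for (x, y) in parent if x in bindings})
        # r2: join the bindings with parent to obtain bindings for z
        supplement = {(x, z) for (x, z) in parent if x in bindings}
        subgoals = {z for (_, z) in supplement} - inputs  # only new subqueries
        if subgoals:
            inputs.update(subgoals)
            process(subgoals)  # solve the subgoal ancestor(z, y) first
        # finish r2 with the answers collected for the subgoals
        joined = {(x, y) for (x, z) in supplement
                  for (z2, y) in answers if z == z2}
        answers.update(joined)

    inputs.add(constant)
    while True:  # repeat until neither relation grows
        before = (len(inputs), len(answers))
        process(set(inputs))
        if (len(inputs), len(answers)) == before:
            break
    return {y for (x, y) in answers if x == constant}

descendants = qsqr_ancestor(PARENT, "John")
```

Only bindings reachable from the query constant are ever placed in inputs, so facts about unrelated individuals are not explored.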

QSQR - Example
[Figures: step-by-step QSQR evaluation of the query on the adorned program]

Magic-Set technique
The Magic-Set technique is a rule-rewriting method that generates from a given set of rules a new set of rules that is equivalent to the original set w.r.t. the original query.
After rewriting, the new program can be evaluated by a simple bottom-up algorithm, usually the (improved) semi-naive evaluation method.
The method reduces the number of irrelevant facts that are derived and restricts the search space. It combines the advantages of the top-down and bottom-up methods.
The Generalized Supplementary Magic Sets algorithm uses special predicates, called supplementary magic predicates, to eliminate duplicate work during processing.

Magic-Set technique - Example
The Magic-Set rule rewriting corresponding to the adorned program P^ad (denoted by P^mg):
magic_ancestor^bf(z) ← magic_ancestor^bf(x), parent(x, z)
ancestor^bf(x, y) ← magic_ancestor^bf(x), parent(x, y)
ancestor^bf(x, y) ← magic_ancestor^bf(x), parent(x, z), ancestor^bf(z, y)
magic_ancestor^bf(John).

Magic-Set technique - Example
Applying the improved semi-naive evaluation method to P^mg:
- iteration 1: magic_ancestor^bf(Ruth), magic_ancestor^bf(Lois) added.
- iteration 2: magic_ancestor^bf(Andy), magic_ancestor^bf(Mark) added.
- iteration 3: ancestor^bf(John, Lois), ancestor^bf(John, Ruth), ancestor^bf(Lois, Andy), ancestor^bf(Lois, Mark), ancestor^bf(John, Andy), ancestor^bf(John, Mark) added.
- iteration 4: fixpoint (no more tuples were added).
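
The effect of the rewritten program can be checked with a small bottom-up evaluator. The sketch below hard-codes the four rules of the magic program and uses a plain naive fixpoint loop, so it does not reproduce the iteration numbering of the improved semi-naive method. The parent facts are an assumption inferred from the tuples listed above, and the fact parent(Zoe, Amy) is a hypothetical addition that shows irrelevant data being ignored.

```python
# Assumed edb: inferred from the derived tuples; ("Zoe", "Amy") is hypothetical.
PARENT = {("John", "Ruth"), ("John", "Lois"), ("Lois", "Andy"),
          ("Lois", "Mark"), ("Zoe", "Amy")}

def eval_magic(parent, seed):
    magic = {seed}    # magic_ancestor^bf(seed).
    ancestor = set()  # ancestor^bf
    while True:
        # magic_ancestor^bf(z) <- magic_ancestor^bf(x), parent(x, z)
        new_magic = magic | {z for (x, z) in parent if x in magic}
        # ancestor^bf(x, y) <- magic_ancestor^bf(x), parent(x, y)
        new_ancestor = ancestor | {(x, y) for (x, y) in parent if x in magic}
        # ancestor^bf(x, y) <- magic_ancestor^bf(x), parent(x, z), ancestor^bf(z, y)
        new_ancestor |= {(x, y) for (x, z) in parent if x in magic
                         for (z2, y) in ancestor if z == z2}
        if new_magic == magic and new_ancestor == ancestor:
            return magic, ancestor  # fixpoint reached
        magic, ancestor = new_magic, new_ancestor

magic, ancestor = eval_magic(PARENT, "John")
```

Because every rule is guarded by a magic_ancestor^bf atom, no tuple about Zoe or Amy is derived, whereas an unrewritten bottom-up evaluation would compute ancestor(Zoe, Amy).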

Magic-Set technique - Example
Generalized Supplementary Magic Sets (denoted by P^gmg):
sup_magic^2_2(x, z) ← magic_ancestor^bf(x), parent(x, z)
ancestor^bf(x, y) ← magic_ancestor^bf(x), parent(x, y)
ancestor^bf(x, y) ← sup_magic^2_2(x, z), ancestor^bf(z, y)
magic_ancestor^bf(z) ← sup_magic^2_2(x, z)
magic_ancestor^bf(John).

Magic-Set technique - Example
Applying the improved semi-naive evaluation method to P^gmg:
- iteration 1: sup_magic^2_2(John, Ruth), sup_magic^2_2(John, Lois), magic_ancestor^bf(Ruth), magic_ancestor^bf(Lois) added.
- iteration 2: sup_magic^2_2(Lois, Andy), sup_magic^2_2(Lois, Mark), magic_ancestor^bf(Andy), magic_ancestor^bf(Mark) added.
- iteration 3: ancestor^bf(John, Lois), ancestor^bf(John, Ruth), ancestor^bf(Lois, Andy), ancestor^bf(Lois, Mark), ancestor^bf(John, Andy), ancestor^bf(John, Mark) added.
- iteration 4: fixpoint (no more tuples were added).
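
The supplementary variant can be sketched in the same style. The point of the supplementary predicate is that the join of magic_ancestor^bf with parent is stored once and then reused by two rules. As before, this is a naive fixpoint loop rather than the improved semi-naive method, the function and variable names are illustrative, and the parent facts are an assumption inferred from the tuples listed above.

```python
# Assumed edb: inferred from the derived tuples listed in the example.
PARENT = {("John", "Ruth"), ("John", "Lois"), ("Lois", "Andy"), ("Lois", "Mark")}

def eval_supplementary(parent, seed):
    magic = {seed}    # magic_ancestor^bf(seed).
    sup = set()       # sup_magic^2_2
    ancestor = set()  # ancestor^bf
    while True:
        # sup_magic^2_2(x, z) <- magic_ancestor^bf(x), parent(x, z)
        new_sup = sup | {(x, z) for (x, z) in parent if x in magic}
        # magic_ancestor^bf(z) <- sup_magic^2_2(x, z)
        new_magic = magic | {z for (_, z) in sup}
        # ancestor^bf(x, y) <- magic_ancestor^bf(x), parent(x, y)
        new_ancestor = ancestor | {(x, y) for (x, y) in parent if x in magic}
        # ancestor^bf(x, y) <- sup_magic^2_2(x, z), ancestor^bf(z, y)
        new_ancestor |= {(x, y) for (x, z) in sup
                         for (z2, y) in ancestor if z == z2}
        if (new_sup, new_magic, new_ancestor) == (sup, magic, ancestor):
            return sup, magic, ancestor  # fixpoint reached
        sup, magic, ancestor = new_sup, new_magic, new_ancestor

sup, magic, ancestor = eval_supplementary(PARENT, "John")
```

The resulting ancestor^bf relation is the same as for the plain magic program; only the intermediate join is materialized differently.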

Query-Subquery Nets
Make a QSQ-net structure from a program P and use it as a flow control network to choose an efficient order for transferring and processing data. The intention is to increase the efficiency of query processing by:
- eliminating redundant computation,
- increasing flexibility,
- reducing the number of accesses to secondary storage.
The framework forms a generic evaluation method called QSQN. It has the following properties:
- the approach is goal-directed,
- each subquery is processed only once,
- each supplement tuple, if desired, is transferred only once,
- operations are done set-at-a-time,
- any control strategy can be used.

Query-Subquery Nets
Definition 1. QSQ-net Structure
A QSQ-net structure of a positive logic program P is a tuple (V, E, T) such that:
- V is a set of nodes,
- E is a set of edges,
- T is a function, called the memorizing type of the net structure.
We call the pair (V, E) the QSQ-net topological structure of P.
Definition 2. QSQ-net
A QSQ-net of P is a tuple N = (V, E, T, C) such that:
- (V, E, T) is a QSQ-net structure of P,
- C is a mapping that associates each node v ∈ V with a structure called the content of v.

Query-Subquery Nets
A subquery is a pair of the form (t, δ), where t is a generalized tuple and δ is an idempotent substitution such that dom(δ) ∩ Vars(t) = ∅.
Subqueries are transferred through edges and processed at nodes. Formally, the processing of a subquery has the following properties:
- every subquery / input tuple / answer tuple subsumed by another one is ignored;
- every subquery / input tuple / answer tuple with term-depth greater than a fixed bound L is ignored;
- the processing is divided into smaller steps which can be delayed to maximize flexibility and allow various control strategies;
- the processing is done set-at-a-time (e.g., for all the unprocessed subqueries accumulated in a given node).
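
The two conditions on a subquery (t, δ) can be checked mechanically. The sketch below makes simplifying assumptions that are not in the slides: terms are flat (a variable name or a constant, no function symbols), a tuple is a Python tuple of terms, a substitution is a dict from variables to terms, and the set of variable names is passed explicitly.

```python
def is_idempotent(delta, variables):
    """With flat terms, delta is idempotent iff no domain variable occurs in its range."""
    range_vars = {term for term in delta.values() if term in variables}
    return not (set(delta) & range_vars)

def is_subquery(t, delta, variables):
    """Check that delta is idempotent and that dom(delta) and Vars(t) are disjoint."""
    tuple_vars = {term for term in t if term in variables}
    return is_idempotent(delta, variables) and not (set(delta) & tuple_vars)

VARIABLES = {"x", "y", "z"}
ok = is_subquery(("John", "y"), {"x": "John"}, VARIABLES)
shared_variable = is_subquery(("x", "y"), {"x": "John"}, VARIABLES)
```

In the first call δ binds only x, which does not occur in t, so the pair qualifies; in the second call x occurs both in t and in dom(δ), so it does not.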

QSQN - Example
Reconsider the program P:
ancestor(x, y) ← parent(x, y)
ancestor(x, y) ← parent(x, z), ancestor(z, y).
The QSQ-net topological structure of P is constructed as follows:

QSQN - Example
[Figures: step-by-step construction of the QSQ-net topological structure of P]

Query-Subquery Nets
The following control strategies are used:
- Disk Access Reduction (DAR), which tries to reduce the number of accesses to secondary storage;
- Depth-First Search (DFS), which gives priority to the order of clauses in the positive logic program defining intensional predicates and thus allows the user to control the evaluation to a certain extent;
- Improved Depth-First Control Strategy (IDFS), an improved version of DFS that aims to accumulate as many tuples or subqueries as possible at each node of the QSQ-net before processing it.

QSQN - Example
Reconsider the program P:
ancestor(x, y) ← parent(x, y)
ancestor(x, y) ← parent(x, z), ancestor(z, y).
The query: ancestor(john, y)?

QSQN - Example
[Figures: step-by-step QSQN evaluation of the query ancestor(john, y)?]

QSQN - Example
At this point, some edges are active without affecting the ans_ancestor relation. When the attributes unprocessed, unprocessed_subqueries, unprocessed_subqueries_2 and unprocessed_tuples of all the nodes in the net are empty sets, the algorithm terminates and returns, for the query ancestor(john, y)?, the set
tuples(ans_ancestor) = {(John, Lois), (John, Ruth), (John, Mark), (John, Andy)}.