a standard database system

Similar documents
a standard database system

Foundations of Relational Query Languages. Advanced Topics in Foundations of Databases, University of Edinburgh, 2017/18

On the Parallel Complexity of Structural CSP Decomposition Methods

A Comparison of Structural CSP Decomposition Methods

From Hypertree Width to Submodular Width and Data-dependent Structural Decompositions

Graph Databases. Advanced Topics in Foundations of Databases, University of Edinburgh, 2017/18

The Complexity of Relational Queries: A Personal Perspective

PART 1 GRAPHICAL STRUCTURE

Lecture 22 Acyclic Joins and Worst Case Join Results Instructor: Sudeepa Roy

On the Hardness of Counting the Solutions of SPARQL Queries

Structural characterizations of schema mapping languages

Semantic Acyclicity on Graph Databases

the rest of this course

Approximation Algorithms for Computing Certain Answers over Incomplete Databases

Reducing redundancy in the hypertree decomposition scheme

Join Algorithms. Qiu, Yuan and Zhang, Haoqian

Necessary edges in k-chordalizations of graphs

Composing Schema Mapping

D2.P2: A Query Planner Based on Quantitative and Structural Information. Francesco Scarcello Alfredo Mazzitelli Università della Calabria

Construction of a GAI Tree with Hypergraph Decompositions

Finite Model Theory and Its Applications

An Extension of Complexity Bounds and Dynamic Heuristics for Tree-Decompositions of CSP

Datalog and its Extensions for Semantic Web Databases

On the Complexity of Enumerating the Answers to Well-designed Pattern Trees

Approximation Schemes for Planar Graph Problems (1983, 1994; Baker)

ANDREAS PIERIS JOURNAL PAPERS

Declarative programming. Logic programming is a declarative style of programming.

Introduction to Parameterized Complexity

Logic and Databases. Phokion G. Kolaitis. UC Santa Cruz & IBM Research - Almaden

Constraint Solving by Composition

Database Constraints and Homomorphism Dualities

Pure Nash Equilibria: Complete Characterization of Hard and Easy Graphical Games

The Resolution Width Problem is EXPTIME-Complete

Acyclic Network. Tree Based Clustering. Tree Decomposition Methods

22/06/2010. Outline of the Tutorial. (NP-hard) Problems. Identification of Easy Classes. Beyond Tree Decompositions

Advanced Query Processing

The Dominating Set Problem in Intersection Graphs

SEMANTIC ACYCLICITY ON GRAPH DATABASES

9. Parameter Treewidth

Principles of Guarded Structural Indexing

A Retrospective on Datalog 1.0

W[1]-hardness. Dániel Marx. Recent Advances in Parameterized Complexity Tel Aviv, Israel, December 3, 2017

Processing Regular Path Queries Using Views or What Do We Need for Integrating Semistructured Data?

Rewriting Ontology-Mediated Queries. Carsten Lutz University of Bremen

W[1]-hardness. Dániel Marx 1. Hungarian Academy of Sciences (MTA SZTAKI) Budapest, Hungary

Distance d-domination Games

Vertex Cover is Fixed-Parameter Tractable

PLANAR GRAPH BIPARTIZATION IN LINEAR TIME

Logic in Algorithmic Graph Structure Theory

arxiv:cs/ v1 [cs.db] 5 Apr 2002

Parameterized Reductions

A Game-Theoretic Approach to Constraint Satisfaction

Exact Algorithms and Applications for Tree-Like Weighted Set Cover 1

Foundations of SPARQL Query Optimization

Foundations and Applications of Schema Mappings

Common Induced Subgraph Isomorphism Structural Parameterizations and Exact Algorithms

Schema Mappings and Data Exchange

Exact Algorithms for Graph Homomorphisms

Master Thesis Presentation

REU Problems of the Czech Group

Summary. Acyclic Networks Join Tree Clustering. Tree Decomposition Methods. Acyclic Network. Tree Based Clustering. Tree Decomposition.

Querying Graph Databases 1/ 83

Important separators and parameterized algorithms

Algorithmic graph structure theory

Size and Treewidth Bounds for Conjunctive Queries

PROBLEM SOLVING AND SEARCH IN ARTIFICIAL INTELLIGENCE

Acyclic Network. Tree Based Clustering. Tree Decomposition Methods

Treewidth. Kai Wang, Zheng Lu, and John Hicks. April 1 st, 2015

NP-Completeness. Algorithms

CSE 417 Network Flows (pt 3) Modeling with Min Cuts

Publications of Pablo Barceló by type

10. EXTENDING TRACTABILITY

Monadic Datalog Containment on Trees

Schema Design for Uncertain Databases

Chordal deletion is fixed-parameter tractable

How Much Of A Hypertree Can Be Captured By Windmills?

FOUNDATIONS OF DATABASES AND QUERY LANGUAGES

Exponential time algorithms for the minimum dominating set problem on some graph classes

The Dynamic Yannakakis Algorithm: Compact and Efficient Query Processing Under Updates

On the Complexity of Query Answering over Incomplete XML Documents

Regular Expressions for Data Words

Isomorphism of Graphs Which are k-separable*

Foundations of Artificial Intelligence

28.1 Decomposition Methods

Dynamic heuristics for branch and bound search on tree-decomposition of Weighted CSPs

1 Inference for Boolean theories

ATFD: material for presentations, projects, and essays

Finding Equivalent Rewritings in the Presence of Arithmetic Comparisons

Multicut is FPT. Lyon Cedex 07, France

FEDOR V. FOMIN. Lectures on treewidth. The Parameterized Complexity Summer School 1-3 September 2017 Vienna, Austria

FIRST-ORDER QUERY EVALUATION ON STRUCTURES OF BOUNDED DEGREE

Lecture 1: Conjunctive Queries

Computing Full Disjunctions

Perfect Matching Conjectures and Their Relationship to P NP

Bidimensionality Theory and Algorithmic Graph Minor Theory

Complexity Results on Graphs with Few Cliques

Graph Crossing Number and Isomorphism SPUR Final Paper, Summer 2012

Foundations of Databases

Parameterized coloring problems on chordal graphs

CS364A: Algorithmic Game Theory Lecture #19: Pure Nash Equilibria and PLS-Completeness

Transcription:

user queries (RA, SQL, etc.) relational database Flight origin destination airline Airport code city VIE LHR BA VIE Vienna LHR EDI BA LHR London LGW GLA U2 LGW London LCA VIE OS LCA Larnaca a standard database system

user queries (RA, SQL, etc.) Flight origin destination airline Airport code city VIE LHR BA VIE Vienna LHR EDI BA LHR London LGW GLA U2 LGW London LCA VIE OS LCA Larnaca but, we live in the era of big data

Volume size does mattes (thousands of TBs of data) Veracity data is often incomplete/inconsistent Variety many data formats (structured, semi-structured, etc.) Velocity data often arrives at fast speed (updates are frequent)

the rest of this course Volume size does mattes (thousands of TBs of data) Veracity data is often incomplete/inconsistent Variety many data formats (structured, semi-structured, etc.) Velocity data often arrives at fast speed (updates are frequent)

Approximation of Conjunctive Queries Advanced Topics in Foundations of Databases, University of Edinburgh, 2017/18

A Plausible Approach to address the challenges raised by the volume of big data replace the query with one that is much faster to execute!!!

Minimizing Conjunctive Queries Database theory has developed principled methods for optimizing CQs: Find an equivalent CQ with minimal number of atoms (the core) Provides a notion of true optimality Q(x) :- R(x,y), R(x,b), R(a,b), R(u,c), R(u,v), S(a,c,d) {y! b} Q(x) :- R(x,y), R(x,b), R(a,b), R(u,c), R(u,v), S(a,c,d) {v! c} Q(x) :- R(x,y), R(x,b), R(a,b), R(u,c), R(u,v), S(a,c,d) minimal query

Minimizing Conjunctive Queries But, a minimal equivalent CQ might not be easier to evaluate query evaluation remains NP-hard However, we know good classes of CQs for which query evaluation is tractable (in combined complexity): Graph-based Hypergraph-based

(Hyper)graph of Conjunctive Queries Q :- R(x,y,z), R(z,u,v), R(v,w,x) graph of Q - G(Q) hypergraph of Q - H(Q) x x y z y z u v u v w w

Good Classes of Conjunctive Queries Graph-based measures how close a graph is to a tree CQs of bounded treewidth their graph has bounded treewidth measures how close a hypergraph is to an acyclic one Hypergraph-based: CQs of bounded hypertree width their hypergraph has bounded hypertree width Acyclic CQs their hypegraph has hypertree width 1

Treewidth of a Graph A tree decomposition of a graph G = (V,E) is a labeled tree T = (N,F,λ), where λ : N! 2 V such that: 1. For each node u 2 V of G, there exists n 2 N such that u 2 λ(n) 2. For each edge (u,v) 2 E, there exists n 2 N such that {u,v} µ λ(n) 3. For each node u 2 V of G, the set {n 2 N u 2 λ(n)} induces a connected subtree of T 1 6 7 3 4 8 5 {4,6} {4,5} {3,4,6} {4,6,8} {2,5} {1,3,6} {6,7,8} 2

Treewidth of a Graph A tree decomposition of a graph G = (V,E) is a labeled tree T = (N,F,λ), where λ : N! 2 V such that: 1. For each node u 2 V of G, there exists n 2 N such that u 2 λ(n) 2. For each edge (u,v) 2 E, there exists n 2 N such that {u,v} µ λ(n) 3. For each node u 2 V of G, the set {n 2 N u 2 λ(n)} induces a connected subtree of T 1 6 7 3 4 8 5 {4,6} {4,5} {3,4,6} {4,6,8} {2,5} {1,3,6} {6,7,8} 2

Treewidth of a Graph A tree decomposition of a graph G = (V,E) is a labeled tree T = (N,F,λ), where λ : N! 2 V such that: 1. For each node u 2 V of G, there exists n 2 N such that u 2 λ(n) 2. For each edge (u,v) 2 E, there exists n 2 N such that {u,v} µ λ(n) 3. For each node u 2 V of G, the set {n 2 N u 2 λ(n)} induces a connected subtree of T 1 6 7 3 4 8 5 {4,6} {4,5} {3,4,6} {4,6,8} {2,5} {1,3,6} {6,7,8} 2

Treewidth of a Graph A tree decomposition of a graph G = (V,E) is a labeled tree T = (N,F,λ), where λ : N! 2 V such that: 1. For each node u 2 V of G, there exists n 2 N such that u 2 λ(n) 2. For each edge (u,v) 2 E, there exists n 2 N such that {u,v} µ λ(n) 3. For each node u 2 V of G, the set {n 2 N u 2 λ(n)} induces a connected subtree of T The width of a tree decomposition T = (N,F,λ) is max n 2 N { λ(n) - 1} -1 so that the treewidth of a tree is 1 The treewidth of G is the minimum width over all tree decompositions of G

CQs of Bounded Treewidth Theorem: For a fixed k 0, BQE(CQTW k ) is in PTIME {Q 2 CQ the treewidth of G(Q) is at most k} Actually, if G(Q) has treewidth k 0, then Q can be evaluated in time O( D k ) + time to compute a tree decomposition for G(Q) of optimal width, which is feasible in linear time

Good Classes of Conjunctive Queries Graph-based CQs of bounded treewidth their graph has bounded treewidth Evaluation is feasible in polynomial time Hypergraph-based: CQs of bounded hypertree width their hypergraph has bounded hypertree width Acyclic CQs their hypegraph has hypertree width 1

Acyclic Hypergraphs A join tree of a hypergraph H = (V,E) is a labeled tree T = (N,F,λ), where λ : N! E such that: 1. For each hyperedge e 2 E of H, there exists n 2 N such that e = λ(n) 2. For each node u 2 V of H, the set {n 2 N u 2 λ(n)} induces a connected subtree of T 1 2 3 8 4 7 5 6 {8,9} {1,3,4,5,6,7,8} {9,10,11} 13 9 {1,2,3} {11,12} 12 11 10 {12,13}

Acyclic Hypergraphs A join tree of a hypergraph H = (V,E) is a labeled tree T = (N,F,λ), where λ : N! E such that: 1. For each hyperedge e 2 E of H, there exists n 2 N such that e = λ(n) 2. For each node u 2 V of H, the set {n 2 N u 2 λ(n)} induces a connected subtree of T Definition: A hypergraph is acyclic if it has a join tree 2 1 3 prime example of a cyclic hypergraph

Acyclic Hypergraphs A join tree of a hypergraph H = (V,E) is a labeled tree T = (N,F,λ), where λ : N! E such that: 1. For each hyperedge e 2 E of H, there exists n 2 N such that e = λ(n) 2. For each node u 2 V of H, the set {n 2 N u 2 λ(n)} induces a connected subtree of T Definition: A hypergraph is acyclic if it has a join tree 2 1 3 but this is acyclic

Acyclic CQs Theorem: BQE(ACQ) is in PTIME {Q 2 CQ H(Q) is acyclic} Actually, if H(Q) is acyclic, then Q can be evaluated in time O( D Q ), i.e., linear time in the size of D and Q

Good Classes of Conjunctive Queries: Recap Graph-based CQs of bounded treewidth their graph has bounded treewidth Evaluation is feasible in polynomial time Hypergraph-based: CQs of bounded hypertree width their hypergraph has bounded hypertree width Evaluation is feasible in polynomial time Acyclic CQs their hypegraph has hypertree width 1 Evaluation is feasible in linear time CQHTW 1 = ACQ CQHTW k CQTW 1 CQTW k ACQ

Back to Our Goal Replace a given CQ with one that is much faster to execute or Replace a given CQ with one that falls in good class of CQs preferably, with an acyclic CQ since evaluation is in linear time

Semantic Acyclicity Definition: A CQ Q is semantically acyclic if there exists an acyclic CQ Q such that Q Q Q(x,z) :- R(x,y), R(y,z), R(x,w), R(w,z) w x y z {w! y, z! y} Q(x,z) :- R(x,y), R(y,z) x y z

Semantic Acyclicity Theorem: A CQ Q is semantically acyclic iff its core is acyclic Theorem: Deciding whether a CQ Q is semantically acyclic is NP-complete Proof idea (upper bound): We can show the following: if Q is semantically acyclic, then there exists an acyclic CQ Q such that Q Q and Q Q Then, we can guess in polynomial time: An acyclic CQ Q such that Q Q A mapping h 1 : terms(q)! terms(q ) A mapping h 2 : terms(q )! terms(q) And verify in polynomial time that h 1 is a query homomorphism from Q to Q (i.e., Q µ Q), and h 2 is a query homomorphism from Q to Q (i.e., Q µ Q )

Semantic Acyclicity Theorem: A CQ Q is semantically acyclic iff its core is acyclic Theorem: Deciding whether a CQ Q is semantically acyclic is NP-complete But, semantic acyclicity is rather weak: Not many CQs are semantically acyclic ) consider acyclic approximations of CQs Semantic acyclicity is not an improvement over usual optimization both approaches are based on the core ) exploit semantic information in the form of constraints

Acyclic Approximations of CQs

Acyclic Approximations If our CQ Q is not semantically acyclic, we may target a CQ that is: 1. Easy to evaluate acyclic 2. Provides sound answers contained in Q 3. As informative as possible maximally contained in Q Definition: A CQ Q is an acyclic approximation of Q if: 1. Q is acyclic 2. Q µ Q 3. There is no acyclic CQ Q such that Q ½ Q µ Q

Do Acyclic Approximations Exist? The cyclic CQ Q :- R(x,y,z), R(z,u,v), R(v,w,x) has several acyclic approximations Q 1 :- R(x,y,z), R(z,u,y), R(y,v,x) Q 2 :- R(x,y,z), R(z,u,v), R(v,w,x), R(x,z,v) Q 3 :- R(x,y,x)

Existence, Size and Computation Theorem: Consider a CQ Q. Then: 1. Q has an acyclic approximation 2. Each acyclic approximation of Q has size polynomial in Q 3. An acyclic approximation of Q can be found in time 2O( Q log Q ) 4. Q has at most exponentially many (non-equivalent) acyclic approximations

Evaluating Acyclic Approximations Recall that evaluating Q over D takes time D O( Q ) Evaluating an acyclic approximation Q of Q over D takes time 2 O( Q log Q ) + D Q k time for computing Q time for evaluating Q - Q Q k - Evaluation of an acyclic CQ Q A is feasible in time O( D Q A ) Observe that 2 O( Q log Q ) + D Q k is dominated by D 2O( Q log Q ) ) fixed-parameter tractable

Poor Approximations Q :- E(x,y), E(y,z), E(z,x) y x z has only one acyclic approximation, that is, Q :- E(x,x) Proposition: Consider a Boolean CQ Q that contains a single binary relation E(.,.). If G(Q) is not bipartite, then the only acyclic approximation of Q is Q :- E(x,x)

Acyclic Approximations: Recap Acyclic approximations are useful when the CQ is not semantically acyclic Always exist, but are not unique Have polynomial size, and can be computed in exponential time Can be evaluated efficiently (fixed-parameter tractability) In some cases, acyclic approximations are not very informative

Back to Semantic Acyclicity But, semantic acyclicity is rather weak: Not many CQs are semantically acyclic ) consider acyclic approximations of CQs Semantic acyclicity is not an improvement over usual optimization both approaches are based on the core ) exploit semantic information in the form of constraints

Associated Papers Pablo Barceló, Leonid Libkin, Miguel Romero: Efficient Approximations of Conjunctive Queries. SIAM J. Comput. 43(3): 1085-1130 (2014) Eligible topics include static analysis of approximations Pablo Barceló, Miguel Romero, Moshe Y. Vardi: Semantic Acyclicity on Graph Databases. SIAM J. Comput. 45(4): 1339-1376 (2016) Semantic acyclicity for CQs Hubie Chen, Víctor Dalmau: Beyond Hypertree Width: Decomposition Methods Without Decompositions. CP 2005: 167-181 Complexity of semantic acyclicity for CQs (in a different context) Víctor Dalmau, Phokion G. Kolaitis, Moshe Y. Vardi: Constraint Satisfaction, Bounded Treewidth, and Finite-Variable Logics. CP 2002: 310-326 Evaluation of semantically acyclic CQ (in a different context)

Associated Papers Joerg Flum, Martin Grohe: Fixed-Parameter Tractability, Definability, and Model- Checking. SIAM J. Comput. 31(1): 113-145 (2001) A different way of measuring complexity, and its full analysis Joerg Flum, Markus Frick, Martin Grohe: Query evaluation via tree- decompositions. Journal of the ACM 49(6): 716-752 (2002) Using tree decompositions to get faster query evaluation Markus Frick, Martin Grohe: Deciding first-order properties of locally treedecomposable structures. Journal of the ACM 48(6): 1184-1206 (2001) How to improve performance of relational queries on databases with special properties

Associated Papers Georg Gottlob, Nicola Leone, Francesco Scarcello: The complexity of acyclic conjunctive queries. Journal of the ACM 48(3):431-498 (2001) An in-depth study of acyclicity Georg Gottlob, Nicola Leone, Francesco Scarcello: Hypertree Decompositions and Tractable Queries. J. Comput. Syst. Sci. 64(3):579-627 (2002) A hierarchy of classes of efficient CQs, the bottom level of which is acyclic queries Martin Grohe, Thomas Schwentick, Luc Segoufin: When is the evaluation of conjunctive queries tractable? STOC 2001: 657-666 Characterizing efficiency of CQs via the notion of bounded treewidth Mihalis Yannakakis: Algorithms for Acyclic Database Schemes. VLDB 1981: 82-94 Notion of acyclicity of CQs and fast evaluation scheme based on it