DATABASE THEORY. Lecture 18: Dependencies. TU Dresden, 3rd July Markus Krötzsch Knowledge-Based Systems

DATABASE THEORY Lecture 18: Dependencies Markus Krötzsch Knowledge-Based Systems TU Dresden, 3rd July 2018

Review: Databases and their schemas

Lines:
Line  Type
85    bus
3     tram
F1    ferry
...   ...

Stops:
SID  Stop                 Accessible
17   Hauptbahnhof         true
42   Helmholtzstr.        true
57   Stadtgutstr.         true
123  Gustav-Freytag-Str.  false
...  ...                  ...

Connect:
From  To   Line
57    42   85
17    789  3
...   ...  ...

Every table has a schema:
Lines[Line:string, Type:string]
Stops[SID:int, Stop:string, Accessible:bool]
Connect[From:int, To:int, Line:string]

Adding constraints

Observation: Even with datatypes, the schema information in databases so far is very limited.

Example 18.1: In the public transport example, one would assume that, e.g.,
- SID is a key in the Stops table, i.e., no two rows refer to the same stop ID,
- every Line mentioned in the Connect table also occurs as a Line in the Lines table,
and many more.

Can we express such schema-level information? This is the purpose of dependencies.

Functional dependencies and keys

A common type of simple dependency is the so-called functional dependency.

Definition 18.2: A functional dependency (fd) is an expression $\mathit{Table} : A \to B$, where Table is a relation name, and $A$ and $B$ are sets of attributes of Table. A key is an fd where $B$ is the set of all attributes of Table.

A database instance $I$ satisfies an fd as above if all tuples in $\mathit{Table}^{I}$ that agree on the values of all attributes in $A$ also agree on the values of all attributes in $B$.

Example 18.3: The key in the previous example corresponds to the fd $\mathit{Stops} : \mathit{SID} \to \mathit{SID}, \mathit{Stop}, \mathit{Accessible}$ (one usually omits set braces when writing fds).
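The satisfaction condition of Definition 18.2 can be checked directly on a small instance. The following is a minimal sketch, assuming rows are represented as Python dictionaries; the function name `satisfies_fd` is made up for illustration, and the rows are taken from the Stops table of the running example.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if the rows (dicts) satisfy the fd lhs -> rhs."""
    seen = {}  # maps each A-value combination to the B-values first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[b] for b in rhs)
        # setdefault stores val for a new key, else returns the stored value
        if seen.setdefault(key, val) != val:
            return False  # two tuples agree on A but differ on B
    return True


stops = [
    {"SID": 17, "Stop": "Hauptbahnhof", "Accessible": True},
    {"SID": 42, "Stop": "Helmholtzstr.", "Accessible": True},
    {"SID": 57, "Stop": "Stadtgutstr.", "Accessible": True},
    {"SID": 123, "Stop": "Gustav-Freytag-Str.", "Accessible": False},
]

# The key of Example 18.3: SID determines all attributes.
key_holds = satisfies_fd(stops, ["SID"], ["SID", "Stop", "Accessible"])
# A dependency that should fail on this data: Accessible does not determine Stop.
other_holds = satisfies_fd(stops, ["Accessible"], ["Stop"])
```

A single pass with a dictionary suffices because an fd only compares tuples that agree on the left-hand side.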

Inclusion dependencies

Inclusion dependencies can establish relationships between two tables:

Definition 18.4: An inclusion dependency (ind) is an expression $\mathit{Table1}[A] \subseteq \mathit{Table2}[B]$, where Table1 and Table2 are relation names, and $A$ and $B$ are lists of attributes of Table1 and Table2, respectively, that are of equal length.

A database instance $I$ satisfies an ind as above if, for each tuple $\tau \in \mathit{Table1}^{I}$, there is a tuple $\tau' \in \mathit{Table2}^{I}$ such that $\tau[a_i] = \tau'[b_i]$ for all attributes $a_i \in A$ and $b_i \in B$ that occur in corresponding positions in both lists.

Example 18.5: The inclusion in the previous example corresponds to the ind $\mathit{Connect}[\mathit{Line}] \subseteq \mathit{Lines}[\mathit{Line}]$.
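Definition 18.4 can likewise be tested on concrete tables. This is a small sketch under the assumption that rows are dictionaries; `satisfies_ind` is a hypothetical helper name, and the data is the visible part of the Lines and Connect tables from the review slide.

```python
def satisfies_ind(rows1, attrs1, rows2, attrs2):
    """Return True if Table1[attrs1] is included in Table2[attrs2]."""
    if len(attrs1) != len(attrs2):
        raise ValueError("attribute lists must have equal length")
    # Project Table2 onto B once, then look up every projected Table1 tuple.
    projected = {tuple(row[b] for b in attrs2) for row in rows2}
    return all(tuple(row[a] for a in attrs1) in projected for row in rows1)


lines = [
    {"Line": "85", "Type": "bus"},
    {"Line": "3", "Type": "tram"},
    {"Line": "F1", "Type": "ferry"},
]
connect = [
    {"From": 57, "To": 42, "Line": "85"},
    {"From": 17, "To": 789, "Line": "3"},
]

# Example 18.5: every line used in Connect is listed in Lines.
forward = satisfies_ind(connect, ["Line"], lines, ["Line"])
# The converse inclusion fails here, since line F1 has no connection.
backward = satisfies_ind(lines, ["Line"], connect, ["Line"])
```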

Why dependencies?

Dependencies have many possible uses:
- Express constraints that a DBMS must guarantee (updates violating constraints fail)
- Improve physical data storage and indexing
- Optimise the DB schema, e.g., by normalising tables
  Example 18.6: Consider a table Customers[Name,Street,ZIP,City] with key Name and another functional dependency $\mathit{Customers} : \mathit{ZIP} \to \mathit{City}$, which suggests normalising the table into two tables CustomersNew[Name,Street,ZIP] and Cities[ZIP,City].
- Optimise queries for the special type of databases that satisfy some constraints
- Take background knowledge into account to compare queries (containment, equivalence)
- Query answering under constraints: compute answers under the assumption that the database has been repaired to satisfy all constraints
- Construct views over databases for query answering, especially in data integration scenarios

In each case, it can also be helpful to infer additional dependencies.
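The normalisation suggested in Example 18.6 can be sketched as two projections followed by a join. The customer rows below are invented purely for illustration; the point is only that, when ZIP determines City, joining the two smaller tables on ZIP gives back the original table.

```python
# Hypothetical Customers[Name, Street, ZIP, City] rows (not from the lecture).
customers = [
    ("Ada", "Main St 1", "01187", "Dresden"),
    ("Bob", "Park Rd 7", "01069", "Dresden"),
    ("Cleo", "Ring 10", "04109", "Leipzig"),
    ("Dan", "Hill Ln 3", "01187", "Dresden"),
]

# Projection onto the two schemas proposed in Example 18.6.
customers_new = sorted({(name, street, zip_) for name, street, zip_, _ in customers})
cities = sorted({(zip_, city) for _, _, zip_, city in customers})

# Because of the fd ZIP -> City, each ZIP has exactly one city,
# so the Cities table can be used as a lookup dictionary.
city_of = dict(cities)

# Natural join on ZIP reconstructs the original table.
rejoined = sorted((n, s, z, city_of[z]) for n, s, z in customers_new)
```

Each ZIP/City pair is now stored once instead of once per customer, which is the redundancy that the normalisation removes.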

Generalisation

Observation: Both kinds of dependencies have a logical if-then structure, so we can vastly generalise these dependencies.

For the following definition, we consider again the unnamed perspective, which is more common for logical expressions:

Definition 18.7: A dependency is a formula of the form
$\forall \vec{x}, \vec{y}.\ \varphi[\vec{x}, \vec{y}] \to \exists \vec{z}.\ \psi[\vec{x}, \vec{z}]$,
where $\vec{x}$, $\vec{y}$, $\vec{z}$ are disjoint lists of variables, and $\varphi$ (the body) and $\psi$ (the head) are conjunctions of atoms using variables $\vec{x} \cup \vec{y}$ and $\vec{x} \cup \vec{z}$, respectively. We allow equality atoms $s \approx t$ to occur in dependencies. The variables $\vec{x}$, which occur in body and head, are known as the frontier.

It is common to omit the universal quantifiers when writing dependencies.

Semantically, we will consider dependencies as formulae of first-order logic (with equality).

Equality-generating dependencies generalise fds

An important special type of dependency is the following:

Definition 18.8: Equality-generating dependencies (egds) are dependencies with heads of the form $s \approx t$, where $s$, $t$ are terms, and which do not contain existential quantifiers.

We therefore find fds and keys to be special egds:

Observation 18.9: Every fd of the form $\mathit{Table} : A \to B$ can be decomposed into fds of the form $\mathit{Table} : A \to \{b\}$ for each $b \in B$. Such fds can be written as egds.

Example 18.10: Assuming the order of attributes to be as written, the earlier fd $\mathit{Customers} : \mathit{ZIP} \to \mathit{City}$ can be expressed as
$\mathit{Customers}(x, y, z, v) \wedge \mathit{Customers}(x', y', z, v') \to v \approx v'$.
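As a further worked instance of Observation 18.9, the key of Example 18.3 can be decomposed in the same way. The following is a sketch that assumes the attribute order Stops[SID, Stop, Accessible] from the review slide; the variable names are chosen here for illustration. The trivial fd from SID to SID needs no egd, so two egds remain, one per non-key attribute.

```latex
% Key  Stops : SID -> SID, Stop, Accessible  written as egds.
% Both body atoms share the first position (SID), here the variable x.
\begin{align*}
  \mathit{Stops}(x, y, z) \wedge \mathit{Stops}(x, y', z') &\to y \approx y' \\
  \mathit{Stops}(x, y, z) \wedge \mathit{Stops}(x, y', z') &\to z \approx z'
\end{align*}
```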

Tuple-generating dependencies generalise inds

Another important kind of dependency does not use equality:

Definition 18.11: Tuple-generating dependencies (tgds) are dependencies without equality atoms.

We therefore find inds to be special tgds:

Observation 18.12: Every ind can be written as a tgd.

Example 18.13: The ind $\mathit{Connect}[\mathit{Line}] \subseteq \mathit{Lines}[\mathit{Line}]$ can be expressed as
$\mathit{Connect}(x, y, z) \to \exists v.\ \mathit{Lines}(z, v)$.

Full dependencies

Tgds without existential variables are called full tgds; tgds that are not full are sometimes called embedded (note that this terminology is intuitive when considering inds).

Proposition 18.14: Full tgds can be expressed using several full tgds with only a single head atom.

Proof: Just write a full tgd $\varphi \to \psi$ as a set of tgds $\{\varphi \to H \mid H \in \psi\}$.

Example 18.15: Splitting heads into several dependencies is not generally admissible in tgds. For example, the tgd
$\mathit{uncle}(x, y) \to \exists z.\ \mathit{child}(x, z) \wedge \mathit{brother}(z, y)$
entails the set
$\{\mathit{uncle}(x, y) \to \exists z.\ \mathit{child}(x, z),\ \ \mathit{uncle}(x, y) \to \exists z.\ \mathit{brother}(z, y)\}$,
but not vice versa.

Remark: In general, heads of tgds can still be decomposed to some extent, as long as each existential quantifier (and all occurrences of the bound variable) remains in one rule (sets of atoms connected by sharing existential variables have been called pieces).
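The "not vice versa" part of Example 18.15 can be witnessed by a concrete instance. The sketch below uses made-up constants and two hand-written checkers (hypothetical names) that are specific to these rules: the instance satisfies both split tgds, because some child and some brother exist, but it violates the original tgd, because no single z plays both roles.

```python
def satisfies_original(uncle, child, brother):
    """uncle(x,y) -> exists z. child(x,z) and brother(z,y), with one shared z."""
    return all(
        any((z, y) in brother for (x2, z) in child if x2 == x)
        for (x, y) in uncle
    )


def satisfies_split(uncle, child, brother):
    """Both split tgds: x has some child, and y has some brother."""
    return all(
        any(x2 == x for (x2, _) in child) and any(y2 == y for (_, y2) in brother)
        for (x, y) in uncle
    )


# Hypothetical counterexample instance.
uncle = {("alice", "bob")}
child = {("alice", "carol")}   # alice has a child, carol
brother = {("dave", "bob")}    # bob is the brother of dave, not of carol

split_ok = satisfies_split(uncle, child, brother)
original_ok = satisfies_original(uncle, child, brother)

# Repairing the instance so that the same z is used satisfies the original tgd.
repaired_ok = satisfies_original(uncle, child, {("carol", "bob")})
```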

Reasoning with dependencies

Reasoning tasks (1)

Some of the main reasoning tasks for working with dependencies are as follows:

Dependency implication:
- Input: sets of dependencies $\Sigma$ and $\{\sigma\}$
- Output: "yes" if $\Sigma \models \sigma$; "no" otherwise

CQ containment under constraints:
- Input: set of dependencies $\Sigma$ and CQs $q_1$ and $q_2$
- Output: "yes" if every answer to $q_1$ is also an answer to $q_2$ in every database $I$ with $I \models \Sigma$; "no" otherwise

For both reasoning tasks we consider only database instances that satisfy the given constraints.

Note: We may consider reasoning problems to refer only to finite databases, or to generalised (possibly infinite) databases. In general, this does not lead to the same results, so it has to be defined. Unless otherwise stated, we always allow for infinite models/databases here.

Reasoning tasks (2)

BCQ entailment under constraints:
- Input: set of dependencies $\Sigma$, database instance $I$, and Boolean CQ $q$
- Output: "yes" if every extension $I' \supseteq I$ with $I' \models \Sigma$ also satisfies $I' \models q$; "no" otherwise

Alternatively, we can also view $I$ as a (syntactic) set of ground facts and ask if $I \cup \Sigma \models q$ (a first-order logic entailment problem).

As usual, BCQ entailment is the decision-problem version of CQ answering.

Note: Again, we allow for the extension $I'$ to be infinite in general.

Reducing reasoning tasks

Theorem 18.16: Dependency implication, CQ containment under constraints, and BCQ entailment under constraints are equivalent problems.

Proof: Consider a set $\Sigma$ of dependencies.
- For CQs $q_1$ and $q_2$, containment under constraints corresponds to the implication of a dependency $q_1 \to q_2$ by $\Sigma$.
- For a dependency $\sigma = \varphi[\vec{x}, \vec{y}] \to \exists \vec{z}.\ \psi[\vec{x}, \vec{z}]$, let $I_\varphi$ be the database instance corresponding to the CQ $\varphi$, obtained using a substitution $\theta$ that replaces each variable $v$ in $\vec{x} \cup \vec{y}$ by a fresh constant $c_v$; then $\sigma$ is entailed by $\Sigma$ iff $I_\varphi, \Sigma \models \exists \vec{z}.\ \psi\theta$.
- A BCQ $q$ is entailed by a database instance $I$ under constraints $\Sigma$ iff the query $t()$ is contained in $q$ under constraints $\Sigma \cup \{t()\} \cup \{p(c_1, \ldots, c_n) \mid \langle c_1, \ldots, c_n \rangle \in p^{I}\}$, where $t$ is a fresh nullary predicate.

Note: In the last case, we use rules that contain ground heads but no body to express facts.
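The second step of the proof, turning a dependency body into a database instance, can be sketched in a few lines. The representation is an assumption made here for illustration: an atom is a pair of a predicate name and a tuple of variable names, and the function name `freeze` is hypothetical. The example uses the tgd of Example 18.13.

```python
def freeze(body):
    """Replace each body variable v by a fresh constant c_v.

    Returns the instance I_phi (a set of ground facts) and the substitution theta.
    """
    theta = {}
    facts = set()
    for pred, args in body:
        for v in args:
            theta.setdefault(v, f"c_{v}")
        facts.add((pred, tuple(theta[v] for v in args)))
    return facts, theta


# Body and head of Connect(x, y, z) -> exists v. Lines(z, v).
body = [("Connect", ("x", "y", "z"))]
head = [("Lines", ("z", "v"))]

instance, theta = freeze(body)

# Apply theta to the head: frontier variables become constants, while the
# existential variable v is not in theta and therefore stays a variable.
bcq = [(pred, tuple(theta.get(t, t) for t in args)) for pred, args in head]
```

Deciding whether the tgd follows from a set of dependencies then amounts to asking whether `bcq` is entailed by `instance` under those dependencies.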

Reasoning is undecidable

The following should not come as a surprise:

Theorem 18.17: Query entailment under a set of tgd constraints is undecidable.

Proof (sketch): This can be shown by a direct encoding of a deterministic TM in query entailment, similar to our proof of Trakhtenbrot's Theorem:
- Most axioms used there are already in the form of tgds
- Negative information in consequences of first-order implications is handled by transformation: $\varphi \to \neg H_1 \vee \neg H_2$ becomes $\varphi \wedge H_1 \wedge H_2 \to \mathit{match}()$
- We can avoid equality by axiomatising its effects
- Halting of the TM can be recognised by a rule that implies $\mathit{match}()$ in this case (to make it easy to recognise, we can transform the TM to have a unique halting state)

Checking entailment of the BCQ $\mathit{match}()$ then corresponds to checking if the TM halts.

Datalog and full tgds

Full tgds are closely related to Datalog:
- Syntactically, both types of rules are the same (we use $\leftarrow$ for Datalog instead of $\to$)
- Semantically, Datalog query answers (second-order model checking) correspond to entailments of the corresponding full tgds (first-order entailment)

The boundaries between both perspectives are blurred in modern works, in particular since tgds are increasingly used to define (query) views (an EDB/IDB distinction is sometimes made for tgds as well).

From what we learned about Datalog, we conclude:

Theorem 18.18: For full tgds, dependency implication, as well as CQ containment and BCQ entailment under constraints, are decidable and ExpTime-complete.

Note: We take a strict first-order view here, and use the reduction of Theorem 18.16. BCQ entailment can be decided, e.g., using bottom-up computation.
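The bottom-up computation mentioned in the note can be sketched as a naive fixpoint iteration. This is an illustrative sketch, not an optimised procedure, and it rests on conventions assumed here: terms are strings, variables start with "?", an atom is a pair of a predicate name and a tuple of terms, and a full tgd is a pair of body and head atom lists. All function names and the reach rules are hypothetical; the second Connect fact is invented so that the recursive rule has something to derive.

```python
def match(atom, fact, subst):
    """Extend subst so that atom maps onto fact, or return None."""
    (pred, args), (fpred, fargs) = atom, fact
    if pred != fpred or len(args) != len(fargs):
        return None
    subst = dict(subst)
    for term, value in zip(args, fargs):
        if term.startswith("?"):
            if subst.setdefault(term, value) != value:
                return None  # variable already bound to another constant
        elif term != value:
            return None      # constant mismatch
    return subst


def homomorphisms(atoms, facts, subst=None):
    """Yield all substitutions mapping every atom to some fact."""
    subst = {} if subst is None else subst
    if not atoms:
        yield subst
        return
    for fact in facts:
        extended = match(atoms[0], fact, subst)
        if extended is not None:
            yield from homomorphisms(atoms[1:], facts, extended)


def saturate(facts, rules):
    """Apply full tgds until no new fact is derived (terminates: no new terms)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            # Materialise matches first, so the fact set is not modified mid-iteration.
            for s in list(homomorphisms(body, facts)):
                for pred, args in head:
                    new = (pred, tuple(s.get(t, t) for t in args))
                    if new not in facts:
                        facts.add(new)
                        changed = True
    return facts


def entails_bcq(facts, rules, query):
    return next(homomorphisms(query, saturate(facts, rules)), None) is not None


facts = {("Connect", ("57", "42", "85")), ("Connect", ("42", "17", "3"))}
rules = [
    ([("Connect", ("?x", "?y", "?l"))], [("Reach", ("?x", "?y"))]),
    ([("Reach", ("?x", "?y")), ("Reach", ("?y", "?z"))], [("Reach", ("?x", "?z"))]),
]

reachable = entails_bcq(facts, rules, [("Reach", ("57", "17"))])
unreachable = entails_bcq(facts, rules, [("Reach", ("17", "57"))])
```

Since full tgds have no existential variables, no new constants are created, so only finitely many facts can be derived and the loop must stop.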

Summary and Outlook

Dependencies have many uses in database theory and practice.

Most practical forms of dependencies can be captured in logical form, with the two most common general cases being
- Tuple-generating dependencies (tgds)
- Equality-generating dependencies (egds)

There are several reasoning problems for dependencies, which are all equivalent (and generally undecidable).

Next topics:
- The Chase
- Languages of tgds with decidable entailment problems
- Outlook