Database Theory VU 181.140, SS 2011: Codd's Theorem. Reinhard Pichler

Database Theory
Database Theory VU 181.140, SS 2011
3. Codd's Theorem
Reinhard Pichler
Institut für Informationssysteme, Arbeitsbereich DBAI, Technische Universität Wien
29 March, 2011

Outline

3. Codd's Theorem
3.1 Domain Independence
3.2 Syntactic Criterion for Domain Independence
3.3 Codd's Theorem
    From the Algebra to RR FO
    From RR Queries to the Algebra
3.4 Strengthening of Codd's Theorem

3.1 Domain Independence
What is Codd's Theorem?

- Domain Independence: some queries are unsafe and must be avoided.
- Range Restriction: a syntactic condition for domain independence.
- Codd's Theorem: relational algebra and range-restricted calculus are equally expressive.
- We prove the theorem by showing how to translate back and forth between the two languages. The proof uses relativization: bounding the ranges of quantifiers using finite domains.
- Strengthening of Codd's Theorem: the following languages are equally expressive: domain-independent calculus, relational algebra, non-recursive Datalog with negation, and calculus under the active domain semantics.

Domain Independence

Domain independence: given a query and a database, the query must evaluate to the same result on the database no matter what the domain is assumed to be.
Idea: exclude unsafe queries, in particular queries that may yield an infinite answer.
Let Q_B(A) denote the result of evaluating query Q on database A assuming domain B.

Definition (domain independence). A query Q is domain independent iff there do not exist a database instance A and two sets B, C, each containing all constants that appear in A or in Q (the set of these constants is known as the active domain), such that Q_B(A) ≠ Q_C(A).
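The definition can be tried out on a toy instance. The following is a minimal sketch, assuming unary relations are Python sets and the domain is passed explicitly; the function names and the sample values are illustrative and not taken from the slides.

```python
def eval_not_r(r, domain):
    """Evaluate {x | not R(x)} with x ranging over the given domain."""
    return {x for x in domain if x not in r}


def eval_r(r, domain):
    """Evaluate {x | R(x)} with x ranging over the given domain."""
    return {x for x in domain if x in r}


# Hypothetical instance: R = {0}; both domains contain the active domain {0}.
R = {0}
B = {0, 1}
C = {0, 2}

# The negated query depends on the assumed domain, so it is not domain independent.
unsafe_differs = eval_not_r(R, B) != eval_not_r(R, C)

# The positive query gives the same answer for both domains.
safe_agrees = eval_r(R, B) == eval_r(R, C)
```

Two domains that disagree are enough to refute domain independence; agreement on two domains, as for the positive query here, is only evidence and not a proof.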

Queries Violating Domain Independence

Example (unsafe queries)
- {x | ¬R(x)}: for R = ∅, 1 ∈ B, 1 ∉ C we get 1 ∈ Q_B(R) but 1 ∉ Q_C(R).
- {x | R(x) ∨ R(y)}
- {y | R(x)}
- {x | R(x) ∨ ¬R(x)}

Remark. Over infinite domains, these queries may yield an infinite result.

Undecidability of Domain Independence

- We would like to require domain independence in queries.
- Domain independence is an undecidable property for FO queries.
- Usual solution: syntactic restrictions, e.g. range-restricted queries. Range restriction is a sufficient condition for domain independence.

We shall use the following theorem without a proof:

Theorem (A). For every domain-independent FO query, there is an equivalent range-restricted FO query.

See Abiteboul, Hull, Vianu, Foundations of Databases, Chapter 5.

- We show here that RR FO and relational algebra have the same expressive power.
- For Theorem A, it suffices to show that relational algebra and domain-independent FO also have the same expressive power.
- From relational algebra to domain-independent calculus: Lemma 5.3.11.
- From domain-independent calculus via active domain semantics to relational algebra: Lemma 5.3.12.

3.2 Syntactic Criterion for Domain Independence
Range-restricted Queries

Preprocessing. A formula is turned into safe-range normal form (SRNF) by the following steps:
1. Variable substitution: no distinct pair of quantifiers may employ the same variable, and no variable may occur both bound and free. Example: (∃x ϕ(x)) ∧ (∃x ψ(x)) ↦ (∃x ϕ(x)) ∧ (∃x′ ψ(x′)).
2. Remove universal quantifiers: ∀x ϕ ↦ ¬∃x ¬ϕ.
3. Remove implications: ϕ → ψ ↦ ¬ϕ ∨ ψ.
4. Remove double negation: ¬¬ϕ ↦ ϕ.
5. Flatten and/or, e.g.: (ϕ ∧ ψ) ∧ π ↦ ϕ ∧ ψ ∧ π.
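Steps 2 to 4 are plain rewrite rules and can be sketched as a recursive function. The sketch below assumes formulas are nested tuples with the tags "atom", "not", "and", "or", "implies", "exists" and "forall"; this encoding and the name to_srnf are illustrative choices, and variable renaming (step 1) and flattening (step 5) are deliberately left out.

```python
def to_srnf(f):
    """Apply steps 2-4 (remove forall, remove implication, remove double negation)."""
    op = f[0]
    if op == "atom":
        return f
    if op == "forall":
        # forall x. phi  |->  not exists x. not phi
        return to_srnf(("not", ("exists", f[1], ("not", f[2]))))
    if op == "implies":
        # phi -> psi  |->  (not phi) or psi
        return to_srnf(("or", ("not", f[1]), f[2]))
    if op == "not":
        inner = to_srnf(f[1])
        if inner[0] == "not":
            # not not phi  |->  phi
            return inner[1]
        return ("not", inner)
    if op == "exists":
        return ("exists", f[1], to_srnf(f[2]))
    if op in ("and", "or"):
        return (op, to_srnf(f[1]), to_srnf(f[2]))
    raise ValueError(f"unknown connective: {op}")


# Hypothetical input: forall x. (R(x) -> S(x))
A = ("atom", "R", ("x",))
B = ("atom", "S", ("x",))
result = to_srnf(("forall", "x", ("implies", A, B)))
```

For this input the result is ¬∃x ¬(¬R(x) ∨ S(x)); no universal quantifier, implication or double negation remains.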

Range-restriction Check

Definition. Given: input formula π in SRNF.

rr(x = a) := {x}
rr(R(t_1, ..., t_n)) := Vars({t_1, ..., t_n})
rr(ϕ ∨ ψ) := rr(ϕ) ∩ rr(ψ)
rr(ϕ ∧ ψ) := rr(ϕ) ∪ rr(ψ)
rr(ϕ ∧ x = y) := rr(ϕ) if {x, y} ∩ rr(ϕ) = ∅; rr(ϕ) ∪ {x, y} otherwise
rr(¬ϕ) := ∅
rr(∃x ϕ) := rr(ϕ) − {x} if x ∈ rr(ϕ); fail otherwise

If free(π) ⊆ rr(π), then π is range-restricted (RR).
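The rr definition translates almost line by line into code. The following is a minimal sketch under stated assumptions: formulas are nested tuples, "and" and "or" carry a flattened list of subformulas, variables are strings and constants are non-strings, and "fail" is modelled by an exception. A stand-alone equality between two variables is given rr = ∅, and a failure inside a negation is propagated; both are choices of this sketch, not statements from the slide.

```python
class NotRangeRestricted(Exception):
    """Models the 'fail' case of the rr definition."""


def is_var(t):
    return isinstance(t, str)  # assumption: variables are str, constants are not


def atom(name, *terms):
    return ("atom", name, terms)


def free(f):
    op = f[0]
    if op == "atom":
        return {t for t in f[2] if is_var(t)}
    if op == "eq":
        return {t for t in f[1:] if is_var(t)}
    if op == "not":
        return free(f[1])
    if op in ("and", "or"):
        return set().union(*(free(g) for g in f[1]))
    if op == "exists":
        return free(f[2]) - {f[1]}
    raise ValueError(op)


def rr(f):
    op = f[0]
    if op == "atom":
        return {t for t in f[2] if is_var(t)}
    if op == "eq":
        s, t = f[1], f[2]
        if is_var(s) != is_var(t):          # x = a
            return {s} if is_var(s) else {t}
        return set()
    if op == "not":
        rr(f[1])                            # propagate a failure from inside
        return set()
    if op == "or":
        return set.intersection(*(rr(g) for g in f[1]))
    if op == "and":
        eqs = [g for g in f[1] if g[0] == "eq" and is_var(g[1]) and is_var(g[2])]
        rest = [g for g in f[1] if g not in eqs]
        result = set().union(*(rr(g) for g in rest))
        changed = True
        while changed:                      # x = y extends rr once x or y is covered
            changed = False
            for _, x, y in eqs:
                if {x, y} & result and not {x, y} <= result:
                    result |= {x, y}
                    changed = True
        return result
    if op == "exists":
        inner = rr(f[2])
        if f[1] not in inner:
            raise NotRangeRestricted(f[1])
        return inner - {f[1]}
    raise ValueError(op)


def is_range_restricted(f):
    try:
        return free(f) <= rr(f)
    except NotRangeRestricted:
        return False


# exists z: P(x,y,z) or (R(x,y) and ((S(z) and not T(x,z)) or T(y,z)))
example = ("exists", "z", ("or", [
    atom("P", "x", "y", "z"),
    ("and", [atom("R", "x", "y"),
             ("or", [("and", [atom("S", "z"), ("not", atom("T", "x", "z"))]),
                     atom("T", "y", "z")])])]))
```

Here rr(example) evaluates to {x, y}, which equals free(example), so is_range_restricted(example) holds; for ¬R(x) the check fails because rr is empty.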

Range-restriction Examples

Example (in SRNF)

∃z: P(x, y, z) ∨ (R(x, y) ∧ ((S(z) ∧ ¬T(x, z)) ∨ T(y, z)))

rr of each subformula, from the root down:
rr(∃z: ...) = {x, y}
rr(P(x, y, z) ∨ (R(x, y) ∧ ...)) = {x, y, z}
rr(P(x, y, z)) = {x, y, z}
rr(R(x, y) ∧ ((S(z) ∧ ¬T(x, z)) ∨ T(y, z))) = {x, y, z}
rr(R(x, y)) = {x, y}
rr((S(z) ∧ ¬T(x, z)) ∨ T(y, z)) = {z}
rr(S(z) ∧ ¬T(x, z)) = {z}
rr(S(z)) = {z}
rr(¬T(x, z)) = ∅

For the whole formula, rr(·) = free(·) = {x, y}: range-restricted!

3.3 Codd's Theorem
Main Theorem

Theorem (Codd). A query is definable in relational algebra if and only if it is definable in the range-restricted domain calculus.

Proof (details on the following slides):
1. From the Algebra to RR FO: almost by definition.
2. From RR FO to the Algebra:
   1. Put into SRNF.
   2. Put into Relational-Algebra NF (RANF). RANF: each subformula is range-restricted. (Exception: in a subformula π = ϕ_1 ∧ ⋯ ∧ ϕ_k ∧ ¬ψ, π has to be RR and the ϕ_i and ψ have to be in RANF, but ¬ψ does not have to be RR.)
   3. Translate the RANF formula into relational algebra. This can be done inductively from the leaves to the root of the parse tree of the formula.

Query languages of this expressive power are called relationally complete.

From the Algebra to RR FO

R := {x⃗ | R(x⃗)}
σ_{x=y}({x⃗ | ϕ(x⃗)}) := {x⃗ | ϕ(x⃗) ∧ x = y}
π_{y⃗}({x⃗ | ϕ(x⃗)}) := {y⃗ | ∃z⃗ ϕ(x⃗)}   (where x⃗ = y⃗ z⃗)
{x⃗ | ϕ(x⃗)} × {y⃗ | ψ(y⃗)} := {x⃗ y⃗ | ϕ(x⃗) ∧ ψ(y⃗)}
{x⃗ | ϕ(x⃗)} ∪ {x⃗ | ψ(x⃗)} := {x⃗ | ϕ(x⃗) ∨ ψ(x⃗)}
{x⃗ | ϕ(x⃗)} − {x⃗ | ψ(x⃗)} := {x⃗ | ϕ(x⃗) ∧ ¬ψ(x⃗)}

From SRNF to RANF: Active Domain

The active domain contains precisely the union of all the atomic/field values occurring anywhere in the database (or the query).
A unary active domain relation can be assumed: it is definable in relational algebra.

Example. For schema R(A, B, C), S(B, D), T(E), the active domain relation is
π_A R ∪ π_B R ∪ π_C R ∪ π_B S ∪ π_D S ∪ T.

We assume a single domain here; in the real-world case with values of different types (e.g. strings, integers), we can compute an active domain relation for each type.
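Computing the active domain relation amounts to taking the union of all columns of all relations. A minimal sketch, assuming a database is a dict from relation names to sets of tuples; the function name, the optional query_constants parameter and the sample tuples are illustrative.

```python
def active_domain(db, query_constants=()):
    """Union of all values occurring in any column of any relation, plus query constants."""
    adom = set(query_constants)
    for tuples in db.values():
        for row in tuples:
            adom.update(row)
    return adom


# Hypothetical instance over the schema R(A, B, C), S(B, D), T(E) from the example.
db = {
    "R": {(1, 2, 3)},
    "S": {(2, 4)},
    "T": {(5,)},
}
D = active_domain(db)
```

The loop over all rows plays the role of the union of projections π_A R ∪ π_B R ∪ ... ∪ T.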

From SRNF to RANF

Given a formula in SRNF. The formula is in RANF if each subformula is range-restricted.
Possible obstacles: subformulae of the form ϕ ∨ ψ or ¬ϕ. Only these remove possibly relevant variables from rr.
Solution: relativize using the active domain relation D:

ϕ ∨ ψ ↦ (ϕ ∧ ⋀_{x ∈ free(ψ) − free(ϕ)} D(x)) ∨ (ψ ∧ ⋀_{x ∈ free(ϕ) − free(ψ)} D(x))
        For each of the two disjuncts, rr(·) = free(·) = free(ϕ) ∪ free(ψ).
¬ϕ ↦ (¬ϕ ∧ ⋀_{x ∈ free(ϕ)} D(x))

Shorter (= better) RANF queries can be achieved by rewriting the input formula using equivalences we already know.

From RANF to Relational Algebra

RANF formulae can be translated to relational algebra using the following rules:

Alg(ϕ ∧ ψ) := Alg(ϕ) ⋈ Alg(ψ)
Alg(ϕ ∧ ¬ψ) := Alg(ϕ) − Alg(ψ)   ... ϕ and ψ have the same schema
Alg(∃y⃗ ϕ(x⃗, y⃗)) := π_{x⃗} Alg(ϕ(x⃗, y⃗))
Alg(ϕ ∧ x ϑ t) := σ_{x ϑ t} Alg(ϕ)   ... ϑ is either = or ≠ and t is a term
Alg(R(x_1, ..., x_k)) := ρ_{A_1...A_k → x_1...x_k} R   ... relation R has schema R(A_1, ..., A_k)

From RR Queries to the Algebra

Example. Let D be the active domain relation with schema D(D) and let R have schema R(A, B). The formula

∀x (D(x) → ∀y (D(y) → R(x, y)))

corresponds to the RANF formula

¬(∃x ∃y (D(x) ∧ D(y)) ∧ ¬R(x, y)).

An equivalent relational algebra expression looks as follows.

Alg(∃x ∃y ((D(x) ∧ D(y)) ∧ ¬R(x, y)))
  = π_∅(Alg(D(x) ∧ D(y)) − Alg(R(x, y)))
  = π_∅(ρ_x D × ρ_y D − ρ_xy R)

From RR Queries to the Algebra

Example. In SRNF but not in RANF (i.e., not locally range-restricted):

{(x, y) | ∃z: P(x, y, z) ∨ (R(x, y) ∧ ((S(z) ∧ ¬T(x, z)) ∨ T(y, z)))}

Here rr = {z} for the subformula (S(z) ∧ ¬T(x, z)) ∨ T(y, z), although its free variables are x, y, z.

We transform this formula into RANF using the rewrite rule ϕ ∧ (ψ_1 ∨ ψ_2) ↦ (ϕ ∧ ψ_1) ∨ (ϕ ∧ ψ_2):

{(x, y) | ∃z: P(x, y, z) ∨ (R(x, y) ∧ S(z) ∧ ¬T(x, z)) ∨ (R(x, y) ∧ T(y, z))}

From RR Queries to the Algebra

Example (in RANF)

{(x, y) | ∃z: P(x, y, z) ∨ (R(x, y) ∧ S(z) ∧ ¬T(x, z)) ∨ (R(x, y) ∧ T(y, z))}

Subexpressions:
e_1 = ρ_xyz P                                  for P(x, y, z)
e_21 = ρ_xz T                                  for T(x, z)
e_2 = (ρ_xy R ⋈ ρ_z S) − ((ρ_y D) ⋈ e_21)      for R(x, y) ∧ S(z) ∧ ¬T(x, z)
e_3 = ρ_xy R ⋈ ρ_yz T                          for R(x, y) ∧ T(y, z)

Equivalent relational algebra expression: π_xy(e_1 ∪ e_2 ∪ e_3).

From RR Queries to the Algebra

Example. Select all professors (P) who only give lectures (L) in the field of computer science (C). The schemata are P(P), L(P, C), C(C) and the active domain is given by a relation D with schema D(D).

Query:   {x | P(x) ∧ ∀y: L(x, y) → C(y)}
to SRNF: {x | P(x) ∧ ¬∃y: L(x, y) ∧ ¬C(y)}
to RANF: {x | P(x) ∧ ¬∃y: L(x, y) ∧ (D(y) ∧ ¬C(y))}

Translation, from the innermost subformula outwards:
D(y) ∧ ¬C(y):                      (ρ_C D) − C
L(x, y) ∧ (D(y) ∧ ¬C(y)):          L ⋈ ((ρ_C D) − C)
∃y: L(x, y) ∧ (D(y) ∧ ¬C(y)):      π_P(L ⋈ ((ρ_C D) − C))
whole query:                       P − π_P(L ⋈ ((ρ_C D) − C))
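The final algebra expression can be checked against the original calculus query on a small instance. A minimal sketch, assuming relations are Python sets of tuples; the function names and the sample data (ann, bob, db, art) are hypothetical and only serve to exercise the two evaluation routes.

```python
def algebra_answer(P, L, C, D):
    """Evaluate P - pi_P(L join ((rho_C D) - C)) step by step."""
    non_cs = {(v,) for v in D} - C                        # (rho_C D) - C
    joined = {(p, c) for (p, c) in L if (c,) in non_cs}   # L join ((rho_C D) - C)
    offenders = {(p,) for (p, _) in joined}               # pi_P(...)
    return P - offenders


def calculus_answer(P, L, C, D):
    """Evaluate {x | P(x) and forall y: L(x, y) -> C(y)} with y ranging over D."""
    return {
        (x,) for (x,) in P
        if all(((x, y) not in L) or ((y,) in C) for y in D)
    }


# Hypothetical instance.
P = {("ann",), ("bob",)}
L = {("ann", "db"), ("bob", "db"), ("bob", "art")}
C = {("db",)}
D = {v for rel in (P, L, C) for row in rel for v in row}  # active domain

via_algebra = algebra_answer(P, L, C, D)
via_calculus = calculus_answer(P, L, C, D)
```

On this instance both routes return the single tuple for ann: bob is removed because one of his lectures is outside computer science.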

From RR Queries to the Algebra

Example. The schemata are L(P, C), C(C) and the domain is given by D(D).

{⟨x, y⟩ | L(x, y) ∧ ¬C(y)} ↦ {⟨x, y⟩ | L(x, y) ∧ (D(y) ∧ ¬C(y))}, translated as L ⋈ ((ρ_C D) − C)

D (attribute D): 1, 2, 3, 4
L (attributes P, C): (1, 2), (3, 4)
L ⋈ ((ρ_C D) − C) (attributes P, C): (3, 4)

{⟨x, y⟩ | L(x, y) ∧ (D(x) ∧ ¬C(y))}, translated as L ⋈ ((ρ_C D) − C)

C (attribute C): 2
(ρ_C D) − C (attribute C): 1, 3, 4
(ρ_P D) × C (attributes P, C): (1, 2), (2, 2), (3, 2), (4, 2)

From RR Queries to the Algebra

Example. Schema R(AB), S(C), T(AC); active domain D(D).

{⟨x, y, z⟩ | R(x, y) ∧ (S(z) ∧ ¬T(x, z))} ↦ {⟨x, y, z⟩ | R(x, y) ∧ ((D(x) ∧ S(z)) ∧ ¬T(x, z))}

Translation, from the innermost subformula outwards:
D(x) ∧ S(z):                          D × S
(D(x) ∧ S(z)) ∧ ¬T(x, z):             (D × S) − T
whole query:                          R ⋈ ((D × S) − T)

From RR Queries to the Algebra

Example. Schema R(AB), S(AC), T(BC); active domain D(D).

{⟨x, y, z⟩ | R(x, y) ∧ (S(x, z) ∨ T(y, z))} ↦ {⟨x, y, z⟩ | (R(x, y) ∧ S(x, z)) ∨ (R(x, y) ∧ T(y, z))}

The two disjuncts translate to R ⋈ S and R ⋈ T, the whole query to (R ⋈ S) ∪ (R ⋈ T).

or

{⟨x, y, z⟩ | R(x, y) ∧ (S(x, z) ∨ T(y, z))} ↦ {⟨x, y, z⟩ | R(x, y) ∧ ((S(x, z) ∧ D(y)) ∨ (T(y, z) ∧ D(x)))}

which translates to R ⋈ ((S × ρ_B D) ∪ (T × D)).

This is correct because
R(x, y) ∧ (ϕ ∨ ψ) ≡ R(x, y) ∧ D(x) ∧ D(y) ∧ (ϕ ∨ ψ) ≡ R(x, y) ∧ ((D(x) ∧ D(y) ∧ ϕ) ∨ (D(x) ∧ D(y) ∧ ψ)).

3.4 Strengthening of Codd's Theorem
Strengthening of Codd's Theorem

Theorem. Domain-independent calculus and relational algebra are equally expressive.

This is a direct consequence of Theorem (A) and Codd's Theorem. In fact, the name Codd's Theorem usually refers to the above formulation.

By the above theorem and a translation via relational algebra, the following holds:

Theorem. Domain-independent calculus and aggregation-free SQL queries are equally expressive.

Strengthening of Codd's Theorem (2)

Non-recursive Datalog with negation (nr-datalog¬) prohibits cycles in the dependency graph DEP(P) of a program P. Non-recursiveness means that negation is trivially stratified!

Theorem. Domain-independent domain calculus and non-recursive Datalog with negation are equally expressive.

Easiest way to see this: provide a translation between nr-datalog¬ and relational algebra; then apply the theorem from the previous slide.

See Abiteboul, Hull, Vianu, Foundations of Databases, Chapter 5.

Strengthening of Codd's Theorem (3)

Active domain semantics for domain calculus: quantified variables range over the active domain, i.e. over values occurring in the database (or the query).

Example. Assume a database D with R = {⟨1, 1⟩} and dom(R) = {1, 2}. Consider the Boolean query q = ∀x, y. R(x, y):
- q is false in D under the standard semantics, but
- q is true in D under the active domain semantics.

Theorem. Domain-independent domain calculus and domain calculus under the active domain semantics are equally expressive.

See Abiteboul, Hull, Vianu, Foundations of Databases, Chapter 5.
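The example can be replayed directly by letting the quantifiers range over two different sets. A minimal sketch, assuming R is a set of pairs; the function name holds_forall is illustrative.

```python
def holds_forall(r, quantifier_range):
    """Evaluate the Boolean query 'forall x, y: R(x, y)' over the given range."""
    return all((x, y) in r for x in quantifier_range for y in quantifier_range)


R = {(1, 1)}
declared_domain = {1, 2}                           # dom(R) from the example
active_domain = {v for row in R for v in row}      # values occurring in R

standard = holds_forall(R, declared_domain)        # fails, e.g. for x = 1, y = 2
active = holds_forall(R, active_domain)            # only the pair (1, 1) is checked
```

Under the standard semantics the pair (1, 2) is a counterexample; under the active domain semantics the quantifiers never reach the value 2.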

Learning objectives

Understanding:
- the notion of domain independence,
- the active domain,
- the notion of range-restricted queries,
- the formulation and the proof of Codd's Theorem,
- the strengthening of Codd's Theorem.