Lecture 1: Conjunctive Queries

Similar documents
Schema Mappings and Data Exchange

CMPS 277 Principles of Database Systems. Lecture #3

Foundations of Databases

The area of query languages, and more generally providing access to stored data, is

Optimization of logical query plans Eliminating redundant joins

Lecture 9: Datalog with Negation

CMPS 277 Principles of Database Systems. Lecture #4

Logic and Databases. Phokion G. Kolaitis. UC Santa Cruz & IBM Research - Almaden

Database Theory: Beyond FO

Data integration lecture 2

DATABASE THEORY. Lecture 11: Introduction to Datalog. TU Dresden, 12th June Markus Krötzsch Knowledge-Based Systems

Uncertainty in Databases. Lecture 2: Essential Database Foundations

CS 377 Database Systems

Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 6 Outline. Unary Relational Operations: SELECT and

A Retrospective on Datalog 1.0

Foundations of Schema Mapping Management

DATABASE DESIGN II - 1DL400

Foundations of Databases

Towards a Logical Reconstruction of Relational Database Theory

1. The Relational Model

Relative Information Completeness

A Generating Plans from Proofs

Textbook: Chapter 6! CS425 Fall 2013 Boris Glavic! Chapter 3: Formal Relational Query. Relational Algebra! Select Operation Example! Select Operation!

Introductory logic and sets for Computer scientists

Logical Aspects of Massively Parallel and Distributed Systems

DATABASE THEORY. Lecture 18: Dependencies. TU Dresden, 3rd July Markus Krötzsch Knowledge-Based Systems

FOUNDATIONS OF DATABASES AND QUERY LANGUAGES

Announcements. Database Systems CSE 414. Datalog. What is Datalog? Why Do We Learn Datalog? HW2 is due today 11pm. WQ2 is due tomorrow 11pm

Chapter 6: Formal Relational Query Languages

Plan of the lecture. G53RDB: Theory of Relational Databases Lecture 14. Example. Datalog syntax: rules. Datalog query. Meaning of Datalog rules

Overview. CS389L: Automated Logical Reasoning. Lecture 6: First Order Logic Syntax and Semantics. Constants in First-Order Logic.

Database Theory VU , SS Introduction: Relational Query Languages. Reinhard Pichler

Database Theory VU , SS Codd s Theorem. Reinhard Pichler

Range Restriction for General Formulas

DATABASE SYSTEMS. Database Systems 1 L. Libkin

Introduction to Data Management CSE 344. Lecture 14: Datalog (guest lecturer Dan Suciu)

Relational Algebra. Procedural language Six basic operators

Announcements. What is Datalog? Why Do We Learn Datalog? Database Systems CSE 414. Midterm. Datalog. Lecture 13: Datalog (Ch

Chapter 6 Formal Relational Query Languages

Safe Stratified Datalog With Integer Order Does not Have Syntax

Lecture 4: January 12, 2015

Query formalisms for relational model relational calculus

Web Science & Technologies University of Koblenz Landau, Germany. Relational Data Model

DATABASE SYSTEMS. Database Systems 1 L. Libkin

Database Systems CSE 414. Lecture 9-10: Datalog (Ch )

Three Query Language Formalisms

Introduction to Data Management CSE 344. Lecture 14: Datalog

Inverting Schema Mappings: Bridging the Gap between Theory and Practice

CSE20: Discrete Mathematics

Warm-Up Problem. Let L be the language consisting of as constant symbols, as a function symbol and as a predicate symbol. Give an interpretation where

CMPS 277 Principles of Database Systems. Lecture #11

Logic As a Query Language. Datalog. A Logical Rule. Anatomy of a Rule. sub-goals Are Atoms. Anatomy of a Rule

RELATIONAL DATA MODEL: Relational Algebra

Automata Theory for Reasoning about Actions

Database Theory VU , SS Introduction: Relational Query Languages. Reinhard Pichler

Operations of Relational Algebra

Notes. Notes. Introduction. Notes. Propositional Functions. Slides by Christopher M. Bourke Instructor: Berthe Y. Choueiry.

Relational Databases

Relational Model, Relational Algebra, and SQL

Midterm. Database Systems CSE 414. What is Datalog? Datalog. Why Do We Learn Datalog? Why Do We Learn Datalog?

Informationslogistik Unit 4: The Relational Algebra

Positive higher-order queries

CS 186, Fall 2002, Lecture 8 R&G, Chapter 4. Ronald Graham Elements of Ramsey Theory

8. Relational Calculus (Part II)

Introduction Relational Algebra Operations

Ontology and Database Systems: Foundations of Database Systems

Complexity of Answering Queries Using Materialized Views

Chapter 6 The Relational Algebra and Calculus

Computational Logic. Relational Query Languages. Free University of Bozen-Bolzano, Werner Nutt

CSC Discrete Math I, Spring Sets

UNIT 2 RELATIONAL MODEL

Announcements. Relational Model & Algebra. Example. Relational data model. Example. Schema versus instance. Lecture notes

Today s topics. Null Values. Nulls and Views in SQL. Standard Boolean 2-valued logic 9/5/17. 2-valued logic does not work for nulls

Chapter 3: Propositional Languages

CSCI.6962/4962 Software Verification Fundamental Proof Methods in Computer Science (Arkoudas and Musser) Chapter p. 1/27

Datalog. Susan B. Davidson. CIS 700: Advanced Topics in Databases MW 1:30-3 Towne 309

The Relational Model

Constraint Propagation for Efficient Inference in Markov Logic

COMP2411 Lecture 20: Logic Programming Examples. (This material not in the book)

Chapter 8: The Relational Algebra and The Relational Calculus

Access Patterns and Integrity Constraints Revisited

COMPUTATIONAL SEMANTICS WITH FUNCTIONAL PROGRAMMING JAN VAN EIJCK AND CHRISTINA UNGER. lg Cambridge UNIVERSITY PRESS

Ontologies and Databases

The Inverse of a Schema Mapping

SQL s Three-Valued Logic and Certain Answers

Logic and its Applications

Chapter 6 The Relational Algebra and Relational Calculus

Automated Reasoning. Natural Deduction in First-Order Logic

Three Query Language Formalisms

Structural characterizations of schema mapping languages

CSE 344 MAY 7 TH EXAM REVIEW

Chapter 3: Relational Model

CS317 File and Database Systems

Relational Database: The Relational Data Model; Operations on Database Relations

Data Integration: Logic Query Languages

A SQL-Middleware Unifying Why and Why-Not Provenance for First-Order Queries

Logical Aspects of Massively Parallel and Distributed Systems

Uncertain Data Models

Logic and Databases. Lecture 4 - Part 2. Phokion G. Kolaitis. UC Santa Cruz & IBM Research - Almaden

Laconic Schema Mappings: Computing the Core with SQL Queries

Transcription:

CS 784: Foundations of Data Management Spring 2017 Instructor: Paris Koutris Lecture 1: Conjunctive Queries A database schema R is a set of relations: we will typically use the symbols R, S, T,... to denote relations. The arity of a relation R is the number of attributes in the relation. This is the unnamed perspective of a relational schema, since we do not associate a name with each attribute in the relation. Contrast this to defining a relation as R(A, B), where we can refer to each of the attributes as R.A and R.B respectively. The attributes in a relation can take values from the same domain dom, which is a countably infinite set. We can assume that each attribute takes values from a different domain (in this case we would denote the domain as dom(a)), but from a theoretical perspective it almost always suffices to have a single shared domain. A constant is an element of the domain dom. Let R be a relation of arity m. A fact over R is an expression of the form R(a 1,..., a m ), where a i dom for every i = 1,..., m. An instance of the relation R is a finite set of facts over R. It is important that we define an instance as a set, and not as a bag (set semantics vs bag semantics). A database instance I over a database schema R is a union of relational instances over the relations R R. In this case, we denote R I the instance of relation R R. Given a database instance I, we define the active domain of I, denoted adom(i) as the set of all constants occurring in I. 1.1 Basics of Conjunctive Queries Conjunctive queries are the simplest form of queries that can be expressed over a database, but as we will see they have many interesting properties and a deep theory behind them. There are many ways to define a conjunctive query, and we will do it from a logical perspective first, using Datalog notation. Syntactically, a conjunctive query q (or simply CQ) is an expression of the form q(x 1,..., x k ) : R 1 ( y 1 ),..., R n ( y n ) (1.1) where n 0, R i R for every i = 1,..., n and q is a fresh relation name. The expressions x, y 1,..., y n are called free tuples, and contain either variables or constants. We will typically name the variables x, y, z,..., and the constants a, b, c,... There are two syntactic restrictions on how a conjunctive query is formed: 1. The tuples y i must match the arities of the corresponding relation. 2. Every variable in x = x 1,..., x k must appear at least once in y 1,..., y n. 1-1

Lecture 1: Conjunctive Queries 1-2 The expression q(x 1,..., x k ) is called the head of the query, and R 1 ( y 1 ),..., R n ( y n ) is called the body of the query. Each expression R i ( y i ) is called an atom: notice that the atom is different from a relation, since many atoms can correspond to the same relation! The set of variables in the query is denoted var(q). Example 1.1. Below are some examples of conjunctive queries: q 1 (x, y) : R(x, y), S(y, z) q 2 () : R(x, y), S(y, a), T(x) q 3 (x, y, z) : R(x, y), R(y, z), R(z, x) q 4 (x, b) : R(x, x), S(x, y) When writing CQs in Datalog form, we can also choose to use equality (=). For example, we can equivalently write the query q(x) : R(x, a) as q(x) : R(x, y), y = a. However, we are not allowed to use any other predicate symbols, such as <,, >,, =. 1.1.1 Semantics So far we looked at the syntactic definition of a conjunctive query. We now turn our attention to the semantics of CQs. The intuition here is that we will try to match to each variable of the body a value from the domain such that the body is true, and then we can infer a new fact from the head of the query. Definition 1.2. A valuation (or substitution) v over a set of variables V is a total function 1 from V to the domain dom. For example, for query q 1, the function v, where v(x) = a, v(y) = b, v(z) = c is a valuation. We extend the valuation to be the identity from dom to dom, and then extend it naturally to map free tuples to tuples over dom. For example, v((x, y)) = (v(x), v(y)) = (a, b), and v((x, y, c)) = (v(x), v(y), v(c)) = (a, b, c). We can now formally define the semantics for CQs. Let I be a database instance over the schema R. Then, for the conjunctive query q, as given in (1.1), the result q(i) of executing query q over the database instance I is: q(i) = {v( x) v is a valuation over var(q) and i = 1,..., n : v( y i ) R I i } The query q returns a new relational instance over a new schema; the arity of the instance is equal to the arity of the head of q. Example 1.3. We will execute query q 1 (x, y) : R(x, y), S(y, z) over the relational instance I = {R(a, a), R(a, b), R(b, c), R(c, a), S(b, c), S(b, b), T(a)}. 1 A total function is a function defined for all possible input values.

Lecture 1: Conjunctive Queries 1-3 There are two valuations over the set of variables {x, y, z} that will result in an output tuple. The first valuation is v(x) = a, v(y) = b, v(z) = c. Since v((x, y)) = (a, b) R I and v(y, z) = (b, c) S I, this valuation will give the output tuple v(x, y) = (a, b). The second valuation is v(x) = a, v(y) = b, v(z) = b; this valuation gives the same output tuple (a, b). There is no other valuation that gives an output tuple. The final result is q 1 (I) = {(a, b)}. Exercise 1.4. Evaluate the queries q 2, q 3, q 4 over the database instance I = {R(a, a), R(a, b), R(b, c), R(c, a), S(b, c), S(b, b), T(a)}. In Datalog terminology, the expressions R i ( y i ) are called subgoals. The relations R 1,..., R n are called extensional relations, since they are provided as input to the query. The relation q is called intensional relation, since its content is only given by "intension" or "definition" through the query. 1.1.2 Equivalent Formalisms In the formalism of relational calculus, the query q from (1.1) can be written as follows: {x 1,..., x k z 1,..., z m (R 1 ( y 1 ) R n ( y n ))} where z 1,..., z m are the variables that appear in the body, but not in the head of the query q. Notice that this is a first-order logical formula that consists of only existential quantification, followed by a conjunction of atoms: this is the reason why this class of queries is called conjunctive queries. As an example, q 1 would be expressed in relational calculus as follows: {x, y z(r(x, y) S(y, z))} The above formalism in relational calculus is equivalent to the Datalog definition of CQs. The other formalism that is equivalent is the class of SPJ queries in relational algebra: these are queries that contain only selections (S) with equality, projections (P) and joins (J). Notice that the relational algebra formalism is procedural, in contrast to the other formalisms that are declarative; in other words, they specify what the result of a query is instead of specifying how to compute it. Exercise 1.5. Express the queries q 1,..., q 4 in relational algebra and relational calculus. In SQL, conjunctive queries correspond to SELECT FROM WHERE queries, where the WHERE conditions contain only equalities. 1.1.3 More Definitions Definition 1.6 (Monotonicity). A query q is monotone if for instances I J, it holds that q(i) q(j). An equivalent way of expressing monotonicity is that q(i) q(i {t}); in other words, adding a tuple in an instance will never cause the output result of q to decrease in size.

Lecture 1: Conjunctive Queries 1-4 Proposition 1.7. Every Conjunctive Query is monotone. Proof. Consider some tuple t q(i). Then, there exists a valuation v over var(q) such that t = v( x), and for every R i, we have v( y i ) Ri I. Since I J, we have that v( y i) R J i, and thus t q(j) as well. The above proposition immediately tells us that there are queries that cannot be expressed as a Conjunctive Query. Indeed, any query that expresses a non-monotone property cannot be written as a CQ. For example, suppose that we have a binary relation R, and we express the following query: return all tuples of the form (a, b) such that (b, a) is not in R. Definition 1.8 (Satisfiability). A query q is satisfiable if there exists an instance I such that q(i) =. Proposition 1.9. Every Conjunctive Query is satisfiable. When the head of a CQ is of the form q(), then we say that q is a boolean CQ. For example, the query q 2 is boolean. The answer to a boolean CQ is essentially a yes or no, depending on whether the answer is the set containing a single tuple with no attributes { }, or it is the empty set {} respectively. When the head of the CQ contains all the variables in the body, then we say it is a full CQ; this is equivalent to a query in relational algebra that has no projections. 1.2 Extensions of Conjunctive Queries We can extend the class of Conjunctive Queries to obtain classes of queries that are strictly more expressive than CQs. CQ =. The query class CQ = is obtained by adding inequality ( =) to conjunctive queries. For example, we can now express the following query: "Return the endpoints for paths of length 3 that start and end in different vertices. q(x, w) : R(x, y), R(y, z), R(z, w), x = w. The above query cannot be expressed as a standard CQ. CQ <.The class CQ < is obtained by adding the operators <,, >, to conjunctive queries. In this case, we also have to assume a total order on the values of the domain dom. For example, we can express the following query: "Return the endpoints for paths of length 3 with strictly increasing value of vertices." q(x, w) : R(x, y), R(y, z), R(z, w), x < y, y < z, z < w. UCQ. The class UCQ (Union of Conjunctive Queries) is obtained by adding union to conjunctive queries. A UCQ is a query of the form q 1 q 2... q m, where each q i is a conjunctive query. For example, the query q = q 1 q 2, where q 1 (x, y) = R(x, z), R(z, y), and q 2 (x, y) = R(x, z), R(z, w), R(w, y)

Lecture 1: Conjunctive Queries 1-5 is a UCQ that returns the endpoints of paths of length 2 or 3. We can also write the above UCQ as two Datalog rules with the same head: q(x, y) : R(x, z), R(z, y). q(x, y) : R(x, z), R(z, w), R(w, y). The class of UCQs corresponds to the fragment of relational algebra that uses Selection, Projection, Joins and Union (SPJU), and to relational calculus queries that uses,, (so no universal quantifier or negation). Proposition 1.10. The query languages CQ =, CQ <, UCQ are all monotone. CQ. The class CQ is obtained by adding negation ( ) to conjunctive queries. For example, we can now express the following query: return all tuples of the form (a, b) such that (b, a) is not in R. q(x, y) : R(x, y), R(y, x).