CMPS 277 Principles of Database Systems. Lecture #4

CMPS 277 Principles of Database Systems
http://www.soe.classes.edu/cmps277/winter10
Lecture #4

First-Order Logic
Question: What is First-Order Logic?
Answer: Informally, First-Order Logic = Propositional Logic + (∃ and ∀), where the quantifiers ∃ and ∀ range over the possible values occurring in relations.

Relational Calculus (First-Order Logic for Databases)
First-order variables: x, y, z, …, x_1, …, x_k, …
    They range over values that may occur in tables.
Relation symbols: R, S, T, … of specified arities (names of relations).
Atomic (basic) formulas:
    R(x_1,…,x_k), where R is a k-ary relation symbol (alternatively, (x_1,…,x_k) ∈ R; the variables need not be distinct)
    (x op y), where op is one of =, ≠, <, >, ≤, ≥
    (x op c), where c is a constant and op is one of =, ≠, <, >, ≤, ≥.
Relational calculus formulas:
    Every atomic formula is a relational calculus formula.
    If ϕ and ψ are relational calculus formulas, then so are:
        (ϕ ∧ ψ), (ϕ ∨ ψ), ¬ψ, (ϕ → ψ) (propositional connectives)
        (∃x ϕ) (existential quantification)
        (∀x ϕ) (universal quantification).

Free and Bound Variables
A sentence is a first-order formula ψ with no free variables. Examples:
    (∀x) E(x,x)
    (∀x)(∀y)(∃z)(E(x,z) ∧ E(z,y))
    (∀x)(∀y)(x < y → ∃z (x < z ∧ z < y))
On every relational database I, a sentence ψ is either true or false: either I ⊨ ψ or I ⊭ ψ.
If a first-order formula has at least one free variable, then it makes no sense to ask whether it is true or false on a relational database I. Instead, we must also assign values to its free variables:
    I, 3, 5 ⊨ ∃z (x < z ∧ z < y)
    I, 3, 4 ⊭ ∃z (x < z ∧ z < y),
where I is the linear order < on the natural numbers 1, 2, 3, … and the listed values are assigned to x and y, respectively.

Queries
Definition: Let S be a relational database schema. A k-ary query on S is a function q defined on the relational database instances over S such that if I is a relational database instance over S, then q(I) is a k-ary relation (i.e., a set of k-tuples).
Note: All queries that we have expressed in relational algebra and/or in SQL thus far are queries in the above formal sense.
    Find the salaries of department chairs (binary query)
    Find the students who are enrolled in every course taught by Victor Vianu (unary query)
    The natural join R ⋈ S of R(A,B,C) and S(B,C,D) (4-ary query)

Relational Calculus as a Database Query Language
Definition: A relational calculus expression is an expression of the form {(x_1,…,x_k): ϕ(x_1,…,x_k)}, where ϕ(x_1,…,x_k) is a relational calculus formula with x_1,…,x_k as its free variables.
When applied to a relational database I, this relational calculus expression returns the k-ary relation that consists of all k-tuples (a_1,…,a_k) that make the formula true on I. Thus, every relational calculus expression as above defines a k-ary query.
Example: The relational calculus expression {(x,y): ∃z (E(x,z) ∧ E(z,y))} returns the set P of all pairs of nodes (a,b) that are connected via a path of length 2.
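The path-of-length-2 example can be tried out on a small graph. The following is a minimal sketch, assuming the edge relation E is stored as a Python set of pairs; the sample edges and the function name paths_of_length_two are made up for illustration.

```python
# Hypothetical edge relation E, stored as a set of pairs (not from the lecture).
E = {(1, 2), (2, 3), (3, 1), (3, 4)}


def paths_of_length_two(edges):
    """Evaluate {(x, y): exists z (E(x, z) and E(z, y))} on the given edge set."""
    # The existential quantifier becomes a search for a shared middle node z.
    return {(x, y) for (x, z) in edges for (z2, y) in edges if z == z2}


P = paths_of_length_two(E)
print(sorted(P))
```

The set comprehension plays the role of the set-builder notation: the free variables x and y form the output tuple, and the quantified variable z is only used inside the condition.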

Relational Calculus as a Database Query Language
Example: FACULTY(name, dpt, salary)
Find the names of the highest paid faculty in CS.
{x: ϕ(x)}, where ϕ(x) is the formula:
    ∃y,z (FACULTY(x,y,z) ∧ y = 'CS' ∧ (∀u,v,w (FACULTY(u,v,w) ∧ v = 'CS' → z ≥ w)))
Exercise: Express this query in relational algebra and in SQL.
Abbreviation:
    ∃x_1,…,x_k stands for ∃x_1 … ∃x_k
    ∀x_1,…,x_k stands for ∀x_1 … ∀x_k
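As a sanity check of the formula, here is a minimal sketch that evaluates it directly over a small FACULTY instance. The tuples and the function name highest_paid are hypothetical; the existential part becomes the iteration over FACULTY tuples and the universal part becomes Python's all().

```python
# Hypothetical FACULTY(name, dpt, salary) instance, for illustration only.
FACULTY = {
    ("Ada", "CS", 200),
    ("Bob", "CS", 150),
    ("Dee", "CS", 200),
    ("Cy", "Math", 250),
}


def highest_paid(faculty, dept):
    """Names x such that some tuple (x, dept, z) exists and z >= w for every salary w in dept."""
    return {
        x
        for (x, y, z) in faculty
        if y == dept
        # Universal quantifier: every faculty tuple in the department earns at most z.
        and all(z >= w for (_u, v, w) in faculty if v == dept)
    }


print(sorted(highest_paid(FACULTY, "CS")))
```

Ties are returned together, which matches the formula: it asks for z ≥ w, not z > w.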

Natural Join in Relational Calculus
Example: Let R(A,B,C) and S(B,C,D) be two ternary relation schemas.
Recall that, in relational algebra, the natural join R ⋈ S is given by
    π_{R.A,R.B,R.C,S.D} (σ_{R.B = S.B ∧ R.C = S.C} (R × S)).
Give a relational calculus expression for R ⋈ S:
    {(x_1,x_2,x_3,x_4): R(x_1,x_2,x_3) ∧ S(x_2,x_3,x_4)}
Note: The natural join is expressible by a quantifier-free formula of relational calculus.
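The quantifier-free expression for the natural join translates almost literally into a set comprehension. This is a sketch under the assumption that R and S are Python sets of triples; the sample tuples are invented.

```python
# Hypothetical instances of R(A, B, C) and S(B, C, D).
R = {(1, 2, 3), (4, 2, 3), (5, 6, 7)}
S = {(2, 3, 9), (2, 3, 8), (6, 0, 1)}


def natural_join(r, s):
    """Evaluate {(x1, x2, x3, x4): R(x1, x2, x3) and S(x2, x3, x4)}."""
    # Reusing x2 and x3 in both atoms is expressed as an equality test on (B, C).
    return {
        (x1, x2, x3, x4)
        for (x1, x2, x3) in r
        for (b, c, x4) in s
        if (b, c) == (x2, x3)
    }


print(sorted(natural_join(R, S)))
```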

Quotient in Relational Calculus
Recall that the quotient (or division) R ÷ S of two relations R (of arity r) and S (of arity s) is the relation of arity r − s consisting of all tuples (a_1,…,a_{r−s}) such that for every tuple (b_1,…,b_s) in S, we have that (a_1,…,a_{r−s}, b_1,…,b_s) is in R.
Assume that R has arity 5 and S has arity 2. Express R ÷ S in relational calculus:
    {(x_1,x_2,x_3): (∀x_4)(∀x_5) (S(x_4,x_5) → R(x_1,x_2,x_3,x_4,x_5))}
Much simpler than the relational algebra expression for R ÷ S.
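The quotient can be sketched the same way, with the universal quantifier becoming all(). One assumption is made explicit here: candidate triples are drawn from the first three columns of R, as in the relational algebra definition, since the formula alone does not say which values x_1, x_2, x_3 range over. The two readings agree whenever S is non-empty. The sample data is hypothetical.

```python
# Hypothetical instances: R has arity 5, S has arity 2.
R = {
    (1, 1, 1, "a", "b"),
    (1, 1, 1, "c", "d"),
    (2, 2, 2, "a", "b"),
}
S = {("a", "b"), ("c", "d")}


def quotient(r, s):
    """Triples a such that, for every pair b in S, the tuple a + b is in R."""
    # Assumption: candidates come from the 3-column prefixes of R.
    candidates = {t[:3] for t in r}
    return {a for a in candidates if all(a + b in r for b in s)}


print(sorted(quotient(R, S)))
```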

The need for more formal semantics for Relational Calculus
The semantics of the relational calculus expressions considered thus far have been unambiguous (and consistent with our intuition). However, consider the following relational calculus expressions:
    {(x_1,…,x_k): ¬R(x_1,…,x_k)}
    {(x,y): ∃z (CHAIR(x,z) ∧ y ≠ z)}, where CHAIR(dpt,name)
    {x: ∀y,z ENROLLS(x,y,z)}, with ENROLLS(s-name,course,term)
Question: What is the semantics of each of these expressions?

The need for more formal semantics for Relational Calculus
Fact: To evaluate {(x_1,…,x_k): ¬R(x_1,…,x_k)} we need to know what the possible values for the variables x_1,…,x_k are.
If the variables x_1,…,x_k range over a domain D, then {(x_1,…,x_k): ¬R(x_1,…,x_k)} = D^k − R.
Note: Intuitively, the relational calculus expression {(x_1,…,x_k): ¬R(x_1,…,x_k)} is not domain independent. In contrast, the relational calculus expression {(x_1,…,x_k): S(x_1,…,x_k) ∧ ¬R(x_1,…,x_k)} is domain independent.

Active Domain
Definition: The active domain adom(ϕ) of a relational calculus formula ϕ is the set of all constants that occur in ϕ.
    If ϕ is R(x,y), then adom(ϕ) = ∅.
    If ϕ is ∃y (R(x,y) ∧ y > 3), then adom(ϕ) = {3}.
    If ϕ is ∀y (P(x,2,y) → R(x,y)), then adom(ϕ) = {2}.
The active domain adom(I) of a relational database instance I is the set of all values that occur in the relations of I.

Active Domain and Relative Interpretations
Definition: Let ϕ(x_1,…,x_k) be a relational calculus formula and let I be a relational database instance.
If D is a domain such that adom(ϕ) ∪ adom(I) ⊆ D, then ϕ^D(I) is the result of evaluating ϕ(x_1,…,x_k) over D and I, that is, all variables and quantifiers are assumed to range over D, and the relation symbols in ϕ are interpreted by the relations in I.
ϕ^adom(I) is ϕ^D(I), where D = adom(ϕ) ∪ adom(I).
Note: adom(ϕ) ∪ adom(I) is the smallest domain on which it makes sense to evaluate ϕ.

Active Domain and Relative Interpretation
Example: Let ϕ be ¬R(x,y) and let I be the instance with R = {(1,2)}.
    ϕ^adom(I) = {(2,1), (1,1), (2,2)}
    If D = {1,2,3}, then ϕ^D(I) = {(2,1), (1,1), (2,2), (3,3), (1,3), (3,1), (2,3), (3,2)}
Note: This example shows that, in general, ϕ^adom(I) ≠ ϕ^D(I).
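The two result sets in this example can be recomputed mechanically. The sketch below takes ϕ to be ¬R(x,y), the reading consistent with the listed results, and evaluates it over a given domain; the function name not_R is chosen here for illustration.

```python
from itertools import product

# The instance from the example: R contains the single pair (1, 2).
R = {(1, 2)}


def not_R(domain, r):
    """Evaluate the negation of R(x, y) with x and y ranging over `domain`."""
    return {(x, y) for x, y in product(domain, repeat=2) if (x, y) not in r}


# adom(I) is the set of all values occurring in R; the formula has no constants.
adom = {value for pair in R for value in pair}

over_adom = not_R(adom, R)
over_D = not_R({1, 2, 3}, R)

print(sorted(over_adom))
print(sorted(over_D))
```

Enlarging the domain from adom(I) to D adds five pairs that involve the value 3, which is exactly why the two evaluations differ.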

Domain Independence
Definition: A relational calculus formula ϕ is domain independent if for every relational instance I and every domain D such that adom(ϕ) ∪ adom(I) ⊆ D, we have that ϕ^D(I) = ϕ^adom(I).
Examples:
    ¬R(x_1,…,x_k) is not domain independent.
    ∃y R(x,y) is domain independent. (Why?)
    ∀y R(x,y) is not domain independent. (Why?)
    P(x) ∧ ∀y (R(x,y) → y > 5) is domain independent. (Why?)
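A small experiment can make the difference between the existential and the universal example concrete. This is only a spot check on one hypothetical instance: a single instance can refute domain independence, but it cannot prove it. The formulas are taken to be ∃y R(x,y) and ∀y R(x,y).

```python
# Hypothetical binary relation R; its active domain is {1, 2}.
R = {(1, 1), (1, 2), (2, 2)}
adom = {value for pair in R for value in pair}
D = adom | {3}  # a strictly larger domain


def exists_y(domain, r):
    """Evaluate: there is a y in `domain` with R(x, y)."""
    return {x for x in domain if any((x, y) in r for y in domain)}


def forall_y(domain, r):
    """Evaluate: for every y in `domain`, R(x, y)."""
    return {x for x in domain if all((x, y) in r for y in domain)}


# The existential formula gives the same answer on both domains.
print(exists_y(adom, R), exists_y(D, R))
# The universal formula does not: the extra value 3 falsifies it for every x.
print(forall_y(adom, R), forall_y(D, R))
```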

Domain Independence
Examples: The following relational calculus expressions are not domain independent:
    {x: (∀y)(∀z) R(x,y,z)}
    {(x,y): ∃z (CHAIR(x,z) ∧ y ≠ z)}, where CHAIR(dpt,name)
    {x: ∀y,z ENROLLS(x,y,z)}, where ENROLLS(s-name,course,term)

Equivalence of Relational Algebra and Relational Calculus
Theorem: The following are equivalent for a k-ary query q:
    1. There is a relational algebra expression E such that q(I) = E(I), for every database instance I (in other words, q is expressible in relational algebra).
    2. There is a domain independent relational calculus formula ϕ such that q(I) = ϕ^adom(I), for every database instance I (in other words, q is expressible in domain independent relational calculus).
    3. There is a relational calculus formula ψ such that q(I) = ψ^adom(I), for every database instance I (in other words, q is expressible in relational calculus under the active domain interpretation).

From Relational Algebra to Relational Calculus
Theorem: For every relational algebra expression E, there is an equivalent relational calculus expression {(x_1,…,x_k): ϕ(x_1,…,x_k)}.
Proof: By induction on the construction of relational algebra expressions.
If E is a relation R of arity k, then we take {(x_1,…,x_k): R(x_1,…,x_k)}.
Assume E_1 and E_2 are expressible by {(x_1,…,x_k): ϕ_1(x_1,…,x_k)} and {(x_1,…,x_k): ϕ_2(x_1,…,x_k)}. Then
    E_1 ∪ E_2 is expressible by {(x_1,…,x_k): ϕ_1(x_1,…,x_k) ∨ ϕ_2(x_1,…,x_k)}.
    E_1 − E_2 is expressible by {(x_1,…,x_k): ϕ_1(x_1,…,x_k) ∧ ¬ϕ_2(x_1,…,x_k)}.
    E_1 × E_2 is expressible by {(x_1,…,x_k,y_1,…,y_m): ϕ_1(x_1,…,x_k) ∧ ϕ_2(y_1,…,y_m)}, where E_2 is now allowed to have a different arity m and is expressed using fresh variables y_1,…,y_m.

From Relational Algebra to Relational Calculus
Theorem: For every relational algebra expression E, there is an equivalent relational calculus expression {(x_1,…,x_k): ϕ(x_1,…,x_k)}.
Proof (continued): Assume that E is expressible by {(x_1,…,x_k): ϕ(x_1,…,x_k)}. Then
    π_{1,3}(E) is expressible by {(x_1,x_3): (∃x_2)(∃x_4)…(∃x_k) ϕ(x_1,…,x_k)}.
    σ_Θ(E) is expressible by {(x_1,…,x_k): Θ* ∧ ϕ(x_1,…,x_k)}, where Θ* is the rewriting of Θ as a formula of relational calculus.
Corollary: Relational Calculus is relationally complete.

From Relational Algebra to Relational Calculus
Example: R(A,B), S(C,D)
Translate π_{1,4}(σ_{R.B=S.C}(R × S)) to relational calculus:
    1. R translates to R(x,y)
    2. S translates to S(z,w)
    3. R × S translates to R(x,y) ∧ S(z,w)
    4. σ_{R.B=S.C}(R × S) translates to (y = z) ∧ R(x,y) ∧ S(z,w)
    5. π_{1,4}(σ_{R.B=S.C}(R × S)) translates to ∃y ∃z ((y = z) ∧ R(x,y) ∧ S(z,w)) or, simply, to ∃y (R(x,y) ∧ S(y,w))
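The five translation steps can be checked on a concrete instance by evaluating the algebra expression operator by operator and comparing it with the simplified calculus formula ∃y (R(x,y) ∧ S(y,w)). The relations and function names below are hypothetical.

```python
# Hypothetical instances of R(A, B) and S(C, D).
R = {(1, 2), (3, 4), (5, 2)}
S = {(2, 7), (4, 8), (9, 9)}


def algebra(r, s):
    """Project columns 1 and 4 of the selection R.B = S.C over the product of R and S."""
    cross = {(a, b, c, d) for (a, b) in r for (c, d) in s}  # cartesian product
    selected = {t for t in cross if t[1] == t[2]}           # selection R.B = S.C
    return {(t[0], t[3]) for t in selected}                 # projection on columns 1, 4


def calculus(r, s):
    """Evaluate {(x, w): exists y (R(x, y) and S(y, w))}."""
    return {(x, w) for (x, y) in r for (y2, w) in s if y == y2}


print(sorted(algebra(R, S)))
print(sorted(calculus(R, S)))
```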

Equivalence of Relational Algebra and Relational Calculus
Proof (Sketch):
1 ⇒ 2. We also need to show that the resulting formula is domain independent. Show by induction that this translation of relational algebra to relational calculus is actually a translation of relational algebra to domain independent relational calculus.
2 ⇒ 3. This implication is obvious.
3 ⇒ 1. Show first that for every relational database schema S, there is a relational algebra expression E such that for every database instance I, we have that adom(I) = E(I). Use induction on the construction of relational calculus formulas and the above fact to obtain a translation of relational calculus under the active domain interpretation to relational algebra.

Equivalence of Relational Algebra and Relational Calculus
In this translation, the most interesting part is the simulation of the universal quantifier in relational algebra. It uses the logical equivalence ∀y ψ ≡ ¬∃y ¬ψ.
As an illustration, consider ∀y R(x,y):
    ∀y R(x,y) ≡ ¬∃y ¬R(x,y)
    adom(I) = π_1(R) ∪ π_2(R)
Rel. calc. formula ϕ    Relational algebra expression for ϕ^adom
¬R(x,y)                 (π_1(R) ∪ π_2(R)) × (π_1(R) ∪ π_2(R)) − R
∃y ¬R(x,y)              π_1((π_1(R) ∪ π_2(R)) × (π_1(R) ∪ π_2(R)) − R)
¬∃y ¬R(x,y)             (π_1(R) ∪ π_2(R)) − π_1((π_1(R) ∪ π_2(R)) × (π_1(R) ∪ π_2(R)) − R)
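The three rows of the table can be computed step by step with set operations and compared with a direct evaluation of ∀y R(x,y) over the active domain. This is a sketch on a hypothetical relation R; the variable names are chosen here for readability.

```python
from itertools import product

# Hypothetical binary relation R.
R = {(1, 1), (1, 2), (2, 1)}

# adom(I) is the union of the two projections of R.
adom = {a for (a, _b) in R} | {b for (_a, b) in R}

# Row 1: the complement of R inside adom x adom.
not_R = set(product(adom, adom)) - R
# Row 2: project the complement on its first column.
exists_not_R = {x for (x, _y) in not_R}
# Row 3: subtract from adom to obtain the x that are related to every y.
forall_R = adom - exists_not_R

# Direct evaluation of the universal formula over the active domain.
direct = {x for x in adom if all((x, y) in R for y in adom)}

print(forall_R, direct)
```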

Equivalence of Relational Algebra and Relational Calculus
Remarks: The Equivalence Theorem is effective. Specifically, the proof of this theorem yields two algorithms:
    an algorithm for translating from relational algebra to domain independent relational calculus, and
    an algorithm for translating from domain independent relational calculus to relational algebra.
Each of these two algorithms runs in linear time.

Domain Independent Relational Calculus
Note: A desirable feature of a logical formalism is that there is an (efficient) algorithm for determining whether or not an expression is a formula of that formalism. Both relational algebra and relational calculus have this property.
Question: Does domain independent relational calculus have this property? In other words, is there an algorithm such that, given a relational calculus formula ϕ, the algorithm tells whether or not ϕ is domain independent?

Domain Independent Relational Calculus
Bad News: Theorem (Di Paola 1969): Determining domain independence is an undecidable problem, i.e., there is no algorithm such that, given a relational calculus formula ϕ, the algorithm tells whether or not ϕ is domain independent.
Some Good News: Theorem: Domain independent relational calculus has an effective syntax, i.e., there is a class F of relational calculus formulas such that:
    There is an (efficient) algorithm for testing membership in F.
    Every formula in F is domain independent.
    Every domain independent relational calculus formula is logically equivalent to a formula in F.

Domain Independent Relational Calculus
For much more on domain independence:
    Read Sections 5.3 and 5.4 of Foundations of Databases.
    Read the papers:
        "The recursive unsolvability of the decision problem for the class of definite formulas" by Robert A. Di Paola, JACM, Vol. 16, 1969, pages 324-327 (available at the class webpages)
        "Safety and translation of relational calculus" by Allen van Gelder and Rodney Topor, ACM Transactions on Database Systems, Vol. 16, 1991, pages 235-278 (available at the class webpages).

Relational Calculus and SQL
Relational calculus has influenced the design of SQL. In particular, existential and universal quantification are used in two different forms in SQL.
Both these forms occur in the allowable conditions in the WHERE clause of the SELECT … FROM … WHERE construct.
In addition, sets/multisets are allowed as operands in the WHERE clause (this is what makes existential and universal quantification meaningful).

Sets as Operands in SQL
Sets are allowed as operands in the WHERE clause. Sets are defined
    by listing their elements (boring feature), or
    as the result of a SELECT … FROM … WHERE construct nested inside the WHERE clause of an outer SELECT … FROM … WHERE (interesting feature).
This is what makes SQL a structured language, i.e., we have queries inside queries up to any finite depth of nesting.
When sets are used as operands in a comparison clause:
    We must use one of the keywords IN, NOT IN, SOME, ALL.
    SOME and ALL must be preceded by one of the comparison operators =, ≠, ≤, ≥, >, < (written =, <>, <=, >=, >, < in SQL).
    The use of SOME and ALL is the first form of existential and universal quantification in SQL.

Sets as Operands in SQL
Example: FACULTY(name,dpt,salary)
Find the names of faculty who are in a department in which no member earns more than $175,000.
    SELECT name
    FROM FACULTY
    WHERE dpt NOT IN (SELECT dpt
                      FROM FACULTY
                      WHERE salary > 175000)
Exercise: Express this query without using an SQL subquery.
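The NOT IN query can be run as written against an in-memory SQLite database through Python's standard sqlite3 module. The rows below are hypothetical, and the salary literal is written without a thousands separator, since SQL numeric literals do not allow one.

```python
import sqlite3

# Hypothetical FACULTY rows, for illustration only.
rows = [
    ("Ada", "CS", 200000),
    ("Bob", "CS", 150000),
    ("Cy", "Math", 120000),
    ("Dee", "Math", 170000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FACULTY (name TEXT, dpt TEXT, salary INTEGER)")
conn.executemany("INSERT INTO FACULTY VALUES (?, ?, ?)", rows)

query = """
    SELECT name
    FROM FACULTY
    WHERE dpt NOT IN (SELECT dpt
                      FROM FACULTY
                      WHERE salary > 175000)
"""
# The subquery returns the departments with at least one high earner (here: CS).
names = sorted(row[0] for row in conn.execute(query))
print(names)
```

Bob is excluded even though his own salary is below the threshold, because the condition is about his department, not about him.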

SOME and ALL in SQL
Syntax: In the WHERE clause, we can have subclauses of the form
    <attribute name> op SOME T
    <attribute name> op ALL T,
where op is one of the comparison operators =, ≠, ≤, ≥, >, < and T is the result of a nested SELECT … FROM … WHERE clause.
Semantics:
    <attribute name> op SOME T means: (∃x)(x ∈ T ∧ <attribute name> op x)
    <attribute name> op ALL T means: (∀x)(x ∈ T → <attribute name> op x).
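The quantifier semantics of SOME and ALL can be mirrored with Python's any() and all(). This is a sketch of the logical reading only: it assumes T contains no NULLs, so SQL's three-valued logic is not modelled, and the helper names op_some and op_all are made up.

```python
import operator


def op_some(value, op, T):
    """value op SOME T: there is an x in T with (value op x)."""
    return any(op(value, x) for x in T)


def op_all(value, op, T):
    """value op ALL T: for every x in T, (value op x)."""
    return all(op(value, x) for x in T)


print(op_some(5, operator.gt, [1, 9]))  # 5 > 1 holds for one element
print(op_all(5, operator.gt, [1, 9]))   # 5 > 9 fails
print(op_all(5, operator.gt, []))       # vacuously true on the empty set
print(op_some(5, operator.eq, []))      # no witness in the empty set
```

The empty-set cases follow directly from the two formulas: an implication with a false premise is true for every x, while an existential statement needs a witness.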

SOME and ALL in SQL
Note:
    <attribute name> = SOME T is the same as <attribute name> IN T
    <attribute name> <> ALL T is the same as <attribute name> NOT IN T
Note: Earlier versions of SQL used ANY in place of SOME. The use of ANY can be quite confusing and can lead to errors. Even if the system supports ANY, it is better to avoid using it.

SOME and ALL in SQL
Example: FACULTY(name,dpt,salary)
Find the highest paid faculty in CS.
    SELECT name
    FROM FACULTY
    WHERE dpt = 'CS' AND salary >= ALL (SELECT salary
                                        FROM FACULTY
                                        WHERE dpt = 'CS')
Question: What is the result of the following SQL query?
    SELECT name
    FROM FACULTY
    WHERE dpt = 'CS' AND salary > ALL (SELECT salary
                                       FROM FACULTY
                                       WHERE dpt = 'CS')

SOME and ALL in SQL
Question: What are the results of the following two SQL queries?
    SELECT name
    FROM FACULTY
    WHERE dpt = 'CS' AND salary > SOME (SELECT salary
                                        FROM FACULTY
                                        WHERE dpt = 'CS')

    SELECT name
    FROM FACULTY
    WHERE dpt = 'CS' AND salary >= SOME (SELECT salary
                                         FROM FACULTY
                                         WHERE dpt = 'CS')
Answer: The first returns all CS faculty who are not the lowest paid ones. The second returns all CS faculty.
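Some systems (SQLite, for instance) do not accept SOME and ALL, so the sketch below emulates the four comparisons against the CS salaries in plain Python rather than running the SQL. The FACULTY rows and the helper cs_names are hypothetical, and NULL salaries are assumed not to occur.

```python
# Hypothetical FACULTY(name, dpt, salary) rows.
FACULTY = [
    ("Ada", "CS", 200),
    ("Bob", "CS", 150),
    ("Eve", "CS", 180),
    ("Cy", "Math", 250),
]

# Result of the nested query: SELECT salary FROM FACULTY WHERE dpt = 'CS'.
cs_salaries = [s for (_n, d, s) in FACULTY if d == "CS"]


def cs_names(keep):
    """Names of CS faculty whose salary satisfies the predicate `keep`."""
    return sorted(n for (n, d, s) in FACULTY if d == "CS" and keep(s))


ge_all = cs_names(lambda s: all(s >= t for t in cs_salaries))   # salary >= ALL
gt_all = cs_names(lambda s: all(s > t for t in cs_salaries))    # salary > ALL
gt_some = cs_names(lambda s: any(s > t for t in cs_salaries))   # salary > SOME
ge_some = cs_names(lambda s: any(s >= t for t in cs_salaries))  # salary >= SOME

print(ge_all, gt_all, gt_some, ge_some)
```

On this instance the strict comparison with ALL returns nothing, because every CS salary is itself in the nested result and no value is strictly greater than itself.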

EXISTS and NOT EXISTS in SQL
Syntax: SELECT … FROM … WHERE EXISTS (SELECT … FROM … WHERE …)
Semantics: The subquery (SELECT … FROM … WHERE …) is evaluated and the resulting set is tested for emptiness: if it is non-empty, then the condition in WHERE evaluates to true; otherwise, it evaluates to false.
Syntax: SELECT … FROM … WHERE NOT EXISTS (SELECT … FROM … WHERE …)
Semantics: The subquery (SELECT … FROM … WHERE …) is evaluated and the resulting set is tested for emptiness: if it is empty, then the condition in WHERE evaluates to true; otherwise, it evaluates to false.

EXISTS and NOT EXISTS in SQL
Example: FACULTY(name,dpt,salary)
Find the faculty in the CS dpt who are not the lowest paid ones.
    SELECT R.name
    FROM FACULTY AS R
    WHERE R.dpt = 'CS' AND EXISTS (SELECT *
                                   FROM FACULTY AS T
                                   WHERE T.dpt = 'CS' AND R.salary > T.salary)
Note: This is an example of a correlated subquery: the subquery has to be evaluated separately for each tuple in the FROM list of the outer query. The tuple is kept or removed depending on the result of the EXISTS test. The inner condition must restrict T (not R) to CS; otherwise R's salary would be compared against faculty in every department.

EXISTS and NOT EXISTS in SQL
Example: FACULTY(name,dpt,salary)
Find the highest paid faculty in CS.
    SELECT R.name
    FROM FACULTY AS R
    WHERE R.dpt = 'CS' AND NOT EXISTS (SELECT *
                                       FROM FACULTY AS T
                                       WHERE T.dpt = 'CS' AND R.salary < T.salary)
Note: SQL queries with SOME and ALL can be transformed to SQL queries with EXISTS and NOT EXISTS. SOME, ALL, EXISTS, NOT EXISTS are imported directly from relational calculus.
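Both correlated subqueries can be tried against an in-memory SQLite database with Python's standard sqlite3 module. The rows are hypothetical. The inner WHERE clauses below restrict the inner tuple T to CS, which is an assumption about the intended query: without that restriction, the Math rows would change both answers on this data.

```python
import sqlite3

# Hypothetical FACULTY rows; the Math rows are there to show why T must be restricted to CS.
rows = [
    ("Ada", "CS", 200),
    ("Bob", "CS", 150),
    ("Eve", "CS", 180),
    ("Cy", "Math", 100),
    ("Zed", "Math", 250),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FACULTY (name TEXT, dpt TEXT, salary INTEGER)")
conn.executemany("INSERT INTO FACULTY VALUES (?, ?, ?)", rows)

# CS faculty who are not the lowest paid: some CS colleague earns strictly less.
not_lowest_sql = """
    SELECT R.name FROM FACULTY AS R
    WHERE R.dpt = 'CS' AND EXISTS (SELECT * FROM FACULTY AS T
                                   WHERE T.dpt = 'CS' AND R.salary > T.salary)
"""
# Highest paid CS faculty: no CS colleague earns strictly more.
highest_sql = """
    SELECT R.name FROM FACULTY AS R
    WHERE R.dpt = 'CS' AND NOT EXISTS (SELECT * FROM FACULTY AS T
                                       WHERE T.dpt = 'CS' AND R.salary < T.salary)
"""

not_lowest = sorted(row[0] for row in conn.execute(not_lowest_sql))
highest = sorted(row[0] for row in conn.execute(highest_sql))
print(not_lowest, highest)
```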