CMPS 277 Principles of Database Systems
http://www.soe.classes.edu/cmps277/winter10
Lecture #4
First-Order Logic
Question: What is First-Order Logic?
Answer: Informally, First-Order Logic = Propositional Logic + (∃ and ∀), where ∃ and ∀ range over the possible values occurring in relations.
Relational Calculus (First-Order Logic for Databases)
First-order variables: x, y, z, …, x₁, …, xₖ, … They range over values that may occur in tables.
Relation symbols: R, S, T, … of specified arities (names of relations).
Atomic (basic) formulas:
  R(x₁,…,xₖ), where R is a k-ary relation symbol (alternatively, (x₁,…,xₖ) ∈ R; the variables need not be distinct)
  (x op y), where op is one of =, ≠, <, >, ≤, ≥
  (x op c), where c is a constant and op is one of =, ≠, <, >, ≤, ≥.
Relational calculus formulas:
  Every atomic formula is a relational calculus formula.
  If ϕ and ψ are relational calculus formulas, then so are:
    (ϕ ∧ ψ), (ϕ ∨ ψ), ¬ψ, (ϕ → ψ) (propositional connectives)
    (∃x ϕ) (existential quantification)
    (∀x ϕ) (universal quantification).
Free and Bound Variables
An occurrence of a variable x in a formula is bound if it lies within the scope of a quantifier ∃x or ∀x; otherwise it is free.
A sentence is a first-order formula ψ with no free variables. Examples:
  (∃x)E(x,x)
  (∀x)(∀y)(∃z)(E(x,z) ∧ E(z,y))
  (∀x)(∀y)(x < y → ∃z(x < z ∧ z < y))
On every relational database I, a sentence ψ is either true or false: either I ⊨ ψ or I ⊭ ψ.
If a first-order formula has at least one free variable, then it makes no sense to ask whether it is true or false on a relational database I. Instead, we must also assign values to its free variables:
  I, 3, 5 ⊨ ∃z(x < z ∧ z < y)
  I, 3, 4 ⊭ ∃z(x < z ∧ z < y),
where I is the linear order < on the natural numbers 1, 2, 3, …, and the pairs 3, 5 and 3, 4 are the values assigned to x and y.
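To make the free/bound distinction concrete, here is a minimal Python sketch. The nested-tuple encoding of formulas and the names free_vars and holds are hypothetical, chosen only for illustration. The infinite order on the natural numbers is also truncated to the finite range 1..10. That truncation is an assumption, though it does not change the truth values of these particular formulas.

```python
# Hypothetical encoding of formulas over the order <:
#   ("lt", a, b)                  atomic formula a < b (a, b: variable names or ints)
#   ("and", f, g), ("implies", f, g), ("not", f)
#   ("exists", v, f), ("forall", v, f)

def free_vars(f):
    """Set of variables that occur free in formula f."""
    tag = f[0]
    if tag == "lt":
        return {t for t in f[1:] if isinstance(t, str)}
    if tag == "not":
        return free_vars(f[1])
    if tag in ("and", "implies"):
        return free_vars(f[1]) | free_vars(f[2])
    if tag in ("exists", "forall"):
        return free_vars(f[2]) - {f[1]}   # the quantified variable becomes bound
    raise ValueError(f"unknown connective {tag!r}")

def holds(f, domain, env):
    """Does (domain, <) satisfy f when the free variables get the values in env?"""
    tag = f[0]
    if tag == "lt":
        def val(t):
            return env[t] if isinstance(t, str) else t
        return val(f[1]) < val(f[2])
    if tag == "not":
        return not holds(f[1], domain, env)
    if tag == "and":
        return holds(f[1], domain, env) and holds(f[2], domain, env)
    if tag == "implies":
        return (not holds(f[1], domain, env)) or holds(f[2], domain, env)
    if tag == "exists":
        return any(holds(f[2], domain, {**env, f[1]: d}) for d in domain)
    if tag == "forall":
        return all(holds(f[2], domain, {**env, f[1]: d}) for d in domain)
    raise ValueError(f"unknown connective {tag!r}")

domain = range(1, 11)  # finite slice of the natural numbers (assumption)

# ∃z (x < z ∧ z < y): x and y are free, z is bound.
between = ("exists", "z", ("and", ("lt", "x", "z"), ("lt", "z", "y")))
# (∀x)(∀y)(x < y → ∃z(x < z ∧ z < y)): a sentence, no free variables.
density = ("forall", "x", ("forall", "y", ("implies", ("lt", "x", "y"), between)))

print(sorted(free_vars(between)))                 # ['x', 'y']
print(holds(between, domain, {"x": 3, "y": 5}))   # True: z = 4 is a witness
print(holds(between, domain, {"x": 3, "y": 4}))   # False: nothing lies strictly between
print(free_vars(density), holds(density, domain, {}))  # set() False
```

The sentence can be evaluated with an empty assignment. The formula `between` cannot, because `holds` would fail looking up x and y in env.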
Queries
Definition: Let S be a relational database schema. A k-ary query on S is a function q defined on the relational database instances over S such that if I is a relational database instance over S, then q(I) is a k-ary relation (i.e., a set of k-tuples).
Note: All queries that we have expressed in relational algebra and/or in SQL thus far are queries in the above formal sense. For example:
  Find the salaries of department chairs (binary query)
  Find the students who are enrolled in every course taught by Victor Vianu (unary query)
  The natural join R ⋈ S of R(A,B,C) and S(B,C,D) (4-ary query)
Relational Calculus as a Database Query Language
Definition: A relational calculus expression is an expression of the form {(x₁,…,xₖ): ϕ(x₁,…,xₖ)}, where ϕ(x₁,…,xₖ) is a relational calculus formula with x₁,…,xₖ as its free variables.
When applied to a relational database I, this relational calculus expression returns the k-ary relation consisting of all k-tuples (a₁,…,aₖ) that make the formula true on I.
Thus, every relational calculus expression as above defines a k-ary query.
Example: The relational calculus expression {(x,y): ∃z(E(x,z) ∧ E(z,y))} returns the set P of all pairs of nodes (a,b) that are connected via a path of length 2.
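One way to see what this expression returns is to evaluate it on a small graph. The Python sketch below uses a made-up edge set E. Any witness z must itself occur in E, so it is enough to range over pairs of edges rather than over a whole domain.

```python
# Hypothetical directed graph: E is the binary edge relation.
E = {(1, 2), (2, 3), (3, 4), (2, 5)}

# {(x, y) : ∃z (E(x, z) ∧ E(z, y))}: z only has to be witnessed by two edges
# that meet in the middle; it does not appear in the output tuple.
P = {(x, y) for (x, z1) in E for (z2, y) in E if z1 == z2}

print(sorted(P))  # [(1, 3), (1, 5), (2, 4)]
```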
Relational Calculus as a Database Query Language
Example: FACULTY(name, dpt, salary)
Find the names of the highest paid faculty in CS: {x: ϕ(x)}, where ϕ(x) is the formula
  ∃y,z (FACULTY(x,y,z) ∧ y = "CS" ∧ ∀u,v,w (FACULTY(u,v,w) ∧ v = "CS" → z ≥ w))
Exercise: Express this query in relational algebra and in SQL.
Abbreviation:
  ∃x₁,…,xₖ stands for ∃x₁ … ∃xₖ
  ∀x₁,…,xₖ stands for ∀x₁ … ∀xₖ
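The formula ϕ(x) can be read almost literally as a set comprehension. In the sketch below, the FACULTY instance is hypothetical, and the relation is a Python set of (name, dpt, salary) triples. The ∀u,v,w part only needs to be checked on tuples of FACULTY, because for any other tuple FACULTY(u,v,w) is false and the implication holds vacuously.

```python
# Hypothetical FACULTY(name, dpt, salary) instance.
FACULTY = {
    ("Ada", "CS", 180000),
    ("Bob", "CS", 150000),
    ("Cy", "CS", 180000),
    ("Di", "EE", 200000),
}

def highest_paid_cs(faculty):
    # ∃y,z FACULTY(x,y,z) ∧ y = "CS": iterate over the FACULTY tuples themselves.
    # ∀u,v,w (FACULTY(u,v,w) ∧ v = "CS" → z ≥ w): compare z with every CS salary.
    return {
        x
        for (x, y, z) in faculty
        if y == "CS" and all(z >= w for (_, v, w) in faculty if v == "CS")
    }

print(sorted(highest_paid_cs(FACULTY)))  # ['Ada', 'Cy']
```

Ties are kept: both CS members earning the maximum are returned, and the EE salary plays no role.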
Natural Join in Relational Calculus
Example: Let R(A,B,C) and S(B,C,D) be two ternary relation schemas. Recall that, in relational algebra, the natural join R ⋈ S is given by
  π_{R.A,R.B,R.C,S.D}(σ_{R.B = S.B ∧ R.C = S.C}(R × S)).
Give a relational calculus expression for R ⋈ S:
  {(x₁,x₂,x₃,x₄): R(x₁,x₂,x₃) ∧ S(x₂,x₃,x₄)}
Note: The natural join is expressible by a quantifier-free formula of relational calculus.
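The calculus expression and the algebra expression can be compared on a small hypothetical instance. The sketch below evaluates both in Python. Reusing x₂ and x₃ in both atoms does the work of the two equalities in the selection.

```python
R = {(1, "a", "x"), (2, "b", "y"), (3, "a", "y")}   # R(A, B, C), hypothetical
S = {("a", "x", 10), ("a", "y", 20), ("b", "x", 30)}  # S(B, C, D), hypothetical

# {(x1,x2,x3,x4) : R(x1,x2,x3) ∧ S(x2,x3,x4)}
join = {(x1, x2, x3, x4)
        for (x1, x2, x3) in R
        for (y2, y3, x4) in S
        if (x2, x3) == (y2, y3)}

# π_{R.A,R.B,R.C,S.D}(σ_{R.B = S.B ∧ R.C = S.C}(R × S)) on 6-tuples
# laid out as (R.A, R.B, R.C, S.B, S.C, S.D).
product_RS = {r + s for r in R for s in S}
algebra = {(t[0], t[1], t[2], t[5])
           for t in product_RS if t[1] == t[3] and t[2] == t[4]}

print(sorted(join))  # [(1, 'a', 'x', 10), (3, 'a', 'y', 20)]
print(join == algebra)  # True
```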
Quotient in Relational Calculus
Recall that the quotient (or division) R ÷ S of two relations R and S, of arities r and s with r > s, is the relation of arity r − s consisting of all tuples (a₁,…,aᵣ₋ₛ) such that for every tuple (b₁,…,bₛ) in S, we have that (a₁,…,aᵣ₋ₛ,b₁,…,bₛ) is in R.
Assume that R has arity 5 and S has arity 2. Express R ÷ S in relational calculus:
  {(x₁,x₂,x₃): (∀x₄)(∀x₅)(S(x₄,x₅) → R(x₁,x₂,x₃,x₄,x₅))}
This is much simpler than the relational algebra expression for R ÷ S.
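A Python sketch of the quotient for arities 5 and 2, on hypothetical data. The universally quantified condition becomes a call to `all`. Candidate tuples are drawn from the 3-prefixes of R, on the assumption that S is non-empty. In that case every tuple satisfying the formula must occur as such a prefix, so no candidates are missed.

```python
# Hypothetical relations: R has arity 5, S has arity 2.
R = {
    (1, 1, 1, "a", "b"), (1, 1, 1, "c", "d"),
    (2, 2, 2, "a", "b"),
}
S = {("a", "b"), ("c", "d")}

def quotient(R, S):
    # Candidates (x1, x2, x3): prefixes of R tuples (assumes S is non-empty).
    candidates = {t[:3] for t in R}
    # ∀x4 ∀x5 (S(x4, x5) → R(x1, x2, x3, x4, x5))
    return {a for a in candidates if all(a + b in R for b in S)}

print(quotient(R, S))  # {(1, 1, 1)}
```

The tuple (2, 2, 2) is rejected because (2, 2, 2, "c", "d") is missing from R.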
The need for more formal semantics for Relational Calculus
The semantics of the relational calculus expressions considered thus far has been unambiguous (and consistent with our intuition). However, consider the following relational calculus expressions:
  {(x₁,…,xₖ): ¬R(x₁,…,xₖ)}
  {(x,y): ∃z(CHAIR(x,z) ∧ y ≠ z)}, where CHAIR(dpt,name)
  {x: ∀y,z ENROLLS(x,y,z)}, with ENROLLS(s-name,course,term)
Question: What is the semantics of each of these expressions?
The need for more formal semantics for Relational Calculus
Fact: To evaluate {(x₁,…,xₖ): ¬R(x₁,…,xₖ)}, we need to know what the possible values for the variables x₁,…,xₖ are.
If the variables x₁,…,xₖ range over a domain D, then {(x₁,…,xₖ): ¬R(x₁,…,xₖ)} = Dᵏ − R.
Note: Intuitively, the relational calculus expression {(x₁,…,xₖ): ¬R(x₁,…,xₖ)} is not domain independent. In contrast, the relational calculus expression {(x₁,…,xₖ): S(x₁,…,xₖ) ∧ ¬R(x₁,…,xₖ)} is domain independent.
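The contrast can be made concrete by evaluating both expressions (with k = 1) over two different domains. The sketch below uses hypothetical unary relations R and S. The answer to ¬R grows with D, while the answer to S ∧ ¬R stays the same.

```python
from itertools import product

R = {(1,), (2,)}   # hypothetical unary relation R
S = {(2,), (3,)}   # hypothetical unary relation S

def not_R(D, k=1):
    # {(x1..xk) : ¬R(x1..xk)} over domain D is D^k − R
    return set(product(D, repeat=k)) - R

def S_and_not_R(D, k=1):
    # {(x1..xk) : S(x1..xk) ∧ ¬R(x1..xk)}: every answer must already be in S
    return {t for t in product(D, repeat=k) if t in S and t not in R}

for D in ({1, 2, 3}, {1, 2, 3, 4, 5}):
    print(sorted(not_R(D)), sorted(S_and_not_R(D)))
# [(3,)] [(3,)]
# [(3,), (4,), (5,)] [(3,)]
```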
Active Domain
Definition: The active domain adom(ϕ) of a relational calculus formula ϕ is the set of all constants that occur in ϕ.
  If ϕ is R(x,y), then adom(ϕ) = ∅.
  If ϕ is ∃y(R(x,y) ∧ y > 3), then adom(ϕ) = {3}.
  If ϕ is ∀y(P(x,2,y) → R(x,y)), then adom(ϕ) = {2}.
The active domain adom(I) of a relational database instance I is the set of all values that occur in the relations of I.
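Computing adom(I) is a single pass over every tuple of every relation. In this sketch, an instance is represented as a Python dict from relation names to sets of tuples. That representation and the data are assumptions made for illustration.

```python
# Hypothetical instance over CHAIR(dpt, name) and ENROLLS(s-name, course, term).
I = {
    "CHAIR": {("CS", "Smith"), ("EE", "Jones")},
    "ENROLLS": {("Ann", "CS277", "W10")},
}

def adom(instance):
    """All values occurring in any tuple of any relation of the instance."""
    return {value for rel in instance.values() for tup in rel for value in tup}

print(sorted(adom(I)))  # ['Ann', 'CS', 'CS277', 'EE', 'Jones', 'Smith', 'W10']
```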
Active Domain and Relative Interpretations
Definition: Let ϕ(x₁,…,xₖ) be a relational calculus formula and let I be a relational database instance.
If D is a domain such that adom(ϕ) ∪ adom(I) ⊆ D, then ϕ^D(I) is the result of evaluating ϕ(x₁,…,xₖ) over D and I; that is, all variables and quantifiers are assumed to range over D, and the relation symbols in ϕ are interpreted by the relations in I.
ϕ^adom(I) is ϕ^D(I), where D = adom(ϕ) ∪ adom(I).
Note: adom(ϕ) ∪ adom(I) is the smallest domain on which it makes sense to evaluate ϕ.
Active Domain and Relative Interpretations
Example: Let ϕ be ¬R(x,y) and I = {(1,2)}.
  ϕ^adom(I) = {(2,1), (1,1), (2,2)}
  If D = {1,2,3}, then ϕ^D(I) = {(2,1), (1,1), (2,2), (3,3), (1,3), (3,1), (2,3), (3,2)}
Note: This example shows that, in general, ϕ^adom(I) ≠ ϕ^D(I).
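The example can be reproduced in a few lines of Python. In this sketch, `neg_R` is a hypothetical helper that computes ϕ^D(I) for ϕ = ¬R(x,y) on any domain D.

```python
from itertools import product

R = {(1, 2)}        # the instance I of the example (R is binary)
ADOM_PHI = set()    # ϕ = ¬R(x, y) contains no constants

def neg_R(D):
    """ϕ^D(I) for ϕ = ¬R(x, y): every pair over D that is not in R."""
    return {(x, y) for x, y in product(D, repeat=2) if (x, y) not in R}

adom_I = {v for t in R for v in t}   # {1, 2}
phi_adom = neg_R(ADOM_PHI | adom_I)
phi_D = neg_R({1, 2, 3})

print(sorted(phi_adom))  # [(1, 1), (2, 1), (2, 2)]
print(len(phi_D))        # 8: all 9 pairs over {1, 2, 3} except (1, 2)
```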
Domain Independence
Definition: A relational calculus formula ϕ is domain independent if for every relational database instance I and every domain D such that adom(ϕ) ∪ adom(I) ⊆ D, we have that ϕ^D(I) = ϕ^adom(I).
Examples:
  ¬R(x₁,…,xₖ) is not domain independent.
  ∃yR(x,y) is domain independent. (Why?)
  ∀yR(x,y) is not domain independent. (Why?)
  P(x) ∧ ∀y(R(x,y) → y > 5) is domain independent. (Why?)
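A quick empirical probe of these examples compares ϕ^adom(I) with ϕ^D(I), where D adds one fresh value. The sketch below runs this on a hypothetical instance. A mismatch proves that a formula is not domain independent. A match on one instance and one extra value is only evidence, not a proof.

```python
# Hypothetical instance: R binary, P unary.
R = {(6, 6), (6, 7), (7, 6)}
P = {(6,), (7,)}
adom_I = {v for rel in (R, P) for t in rel for v in t}   # {6, 7}

# name: (adom(ϕ), function computing ϕ^D(I) as a set of x-values for domain D)
formulas = {
    # ∃y R(x,y)
    "exists": (set(), lambda D: {x for x in D if any((x, y) in R for y in D)}),
    # ∀y R(x,y)
    "forall": (set(), lambda D: {x for x in D if all((x, y) in R for y in D)}),
    # P(x) ∧ ∀y(R(x,y) → y > 5); the constant 5 belongs to adom(ϕ)
    "guarded": ({5}, lambda D: {x for x in D
                                if (x,) in P and all((x, y) not in R or y > 5 for y in D)}),
}

FRESH = 0  # assumed not to occur in I or in any of the formulas
unchanged = {}
for name, (consts, phi) in formulas.items():
    base = consts | adom_I                      # adom(ϕ) ∪ adom(I)
    unchanged[name] = phi(base) == phi(base | {FRESH})
    print(name, sorted(phi(base)), sorted(phi(base | {FRESH})))
# exists [6, 7] [6, 7]
# forall [6] []
# guarded [6, 7] [6, 7]
```

Adding the fresh value makes ∀yR(x,y) fail for x = 6, because R(6,0) is false. In the guarded formula, R(x,0) is false, so the implication is vacuously true and the answer does not change.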
Domain Independence
Examples: The following relational calculus expressions are not domain independent:
  {x: (∃y)(∃z) ¬R(x,y,z)}
  {(x,y): ∃z(CHAIR(x,z) ∧ y ≠ z)}, where CHAIR(dpt,name)
  {x: ∀y,z ENROLLS(x,y,z)}, where ENROLLS(s-name,course,term)
Equivalence of Relational Algebra and Relational Calculus
Theorem: The following are equivalent for a k-ary query q:
1. There is a relational algebra expression E such that q(I) = E(I) for every database instance I (in other words, q is expressible in relational algebra).
2. There is a domain independent relational calculus formula ϕ such that q(I) = ϕ^adom(I) for every database instance I (in other words, q is expressible in domain independent relational calculus).
3. There is a relational calculus formula ψ such that q(I) = ψ^adom(I) for every database instance I (in other words, q is expressible in relational calculus under the active domain interpretation).
From Relational Algebra to Relational Calculus
Theorem: For every relational algebra expression E, there is an equivalent relational calculus expression {(x₁,…,xₖ): ϕ(x₁,…,xₖ)}.
Proof: By induction on the construction of relational algebra expressions.
If E is a relation R of arity k, then we take {(x₁,…,xₖ): R(x₁,…,xₖ)}.
Assume E₁ and E₂ are expressible by {(x₁,…,xₖ): ϕ₁(x₁,…,xₖ)} and {(x₁,…,xₖ): ϕ₂(x₁,…,xₖ)}. Then:
  E₁ ∪ E₂ is expressible by {(x₁,…,xₖ): ϕ₁(x₁,…,xₖ) ∨ ϕ₂(x₁,…,xₖ)}.
  E₁ − E₂ is expressible by {(x₁,…,xₖ): ϕ₁(x₁,…,xₖ) ∧ ¬ϕ₂(x₁,…,xₖ)}.
If instead E₂ has arity m and is expressible by {(y₁,…,yₘ): ϕ₂(y₁,…,yₘ)}, then:
  E₁ × E₂ is expressible by {(x₁,…,xₖ,y₁,…,yₘ): ϕ₁(x₁,…,xₖ) ∧ ϕ₂(y₁,…,yₘ)}.
From Relational Algebra to Relational Calculus
Proof (continued): Assume that E is expressible by {(x₁,…,xₖ): ϕ(x₁,…,xₖ)}. Then:
  π_{1,3}(E) is expressible by {(x₁,x₃): (∃x₂)(∃x₄)…(∃xₖ) ϕ(x₁,…,xₖ)}.
  σ_Θ(E) is expressible by {(x₁,…,xₖ): Θ* ∧ ϕ(x₁,…,xₖ)}, where Θ* is the rewriting of Θ as a formula of relational calculus.
Corollary: Relational calculus is relationally complete.
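The inductive proof is effectively a recursive translator. Below is a minimal Python sketch of it that produces formulas as strings. The tuple encoding of algebra expressions is hypothetical. Selection is limited to equality between two columns, and projection positions are assumed to be distinct. Projected-away columns receive fresh variables v1, v2, …, which are then existentially quantified.

```python
import itertools

# Hypothetical encoding of relational algebra expressions (1-based positions):
#   ("rel", name, arity) | ("union", e1, e2) | ("diff", e1, e2)
#   ("product", e1, e2) | ("project", [p1, ..., pm], e) | ("select", i, j, e)  # σ_{$i = $j}

def arity(e):
    tag = e[0]
    if tag == "rel":
        return e[2]
    if tag in ("union", "diff"):
        return arity(e[1])
    if tag == "product":
        return arity(e[1]) + arity(e[2])
    if tag == "project":
        return len(e[1])
    if tag == "select":
        return arity(e[3])
    raise ValueError(f"unknown operator {tag!r}")

def translate(e, xs, fresh):
    """Formula string whose free variables xs range over the tuples of e."""
    tag = e[0]
    if tag == "rel":
        return f"{e[1]}({', '.join(xs)})"
    if tag == "union":
        return f"({translate(e[1], xs, fresh)} ∨ {translate(e[2], xs, fresh)})"
    if tag == "diff":
        return f"({translate(e[1], xs, fresh)} ∧ ¬{translate(e[2], xs, fresh)})"
    if tag == "product":
        k = arity(e[1])
        return f"({translate(e[1], xs[:k], fresh)} ∧ {translate(e[2], xs[k:], fresh)})"
    if tag == "select":
        _, i, j, sub = e
        return f"({xs[i - 1]} = {xs[j - 1]} ∧ {translate(sub, xs, fresh)})"
    if tag == "project":
        _, positions, sub = e
        inner = [None] * arity(sub)
        for var, p in zip(xs, positions):
            inner[p - 1] = var              # kept columns reuse the output variables
        bound = []
        for idx, var in enumerate(inner):
            if var is None:                 # dropped columns get fresh ∃-variables
                inner[idx] = next(fresh)
                bound.append(inner[idx])
        return "".join(f"∃{v} " for v in bound) + translate(sub, inner, fresh)
    raise ValueError(f"unknown operator {tag!r}")

def to_calculus(e):
    xs = [f"x{i}" for i in range(1, arity(e) + 1)]
    fresh = (f"v{i}" for i in itertools.count(1))
    return "{(" + ", ".join(xs) + "): " + translate(e, xs, fresh) + "}"

R, S = ("rel", "R", 2), ("rel", "S", 2)
print(to_calculus(("project", [1, 4], ("select", 2, 3, ("product", R, S)))))
# {(x1, x2): ∃v1 ∃v2 (v1 = v2 ∧ (R(x1, v1) ∧ S(v2, x2)))}
print(to_calculus(("diff", R, S)))
# {(x1, x2): (R(x1, x2) ∧ ¬S(x1, x2))}
```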
From Relational Algebra to Relational Calculus
Example: R(A,B), S(C,D). Translate π_{1,4}(σ_{R.B=S.C}(R × S)) to relational calculus:
1. R translates to R(x,y)
2. S translates to S(z,w)
3. R × S translates to R(x,y) ∧ S(z,w)
4. σ_{R.B=S.C}(R × S) translates to (y = z) ∧ R(x,y) ∧ S(z,w)
5. π_{1,4}(σ_{R.B=S.C}(R × S)) translates to ∃y∃z((y = z) ∧ R(x,y) ∧ S(z,w)), or, simply, to ∃y(R(x,y) ∧ S(y,w))
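As a sanity check, the algebra expression and both forms from step 5 can be evaluated on a hypothetical instance. In the sketch below, the calculus forms are evaluated by brute force, with the free variables and the quantifiers all ranging over the active domain.

```python
from itertools import product

R = {(1, "a"), (2, "b"), (3, "c")}     # R(A, B), hypothetical data
S = {("a", 10), ("b", 20), ("z", 30)}  # S(C, D), hypothetical data

# Relational algebra: π_{1,4}(σ_{R.B = S.C}(R × S))
algebra = {(r[0], s[1]) for r in R for s in S if r[1] == s[0]}

# Active domain of the instance; free variables and quantifiers range over it.
adom = {v for rel in (R, S) for t in rel for v in t}

# Step 5, long form: ∃y ∃z ((y = z) ∧ R(x,y) ∧ S(z,w))
long_form = {
    (x, w) for x, w in product(adom, repeat=2)
    if any(y == z and (x, y) in R and (z, w) in S for y in adom for z in adom)
}

# Step 5, simplified: ∃y (R(x,y) ∧ S(y,w))
short_form = {
    (x, w) for x, w in product(adom, repeat=2)
    if any((x, y) in R and (y, w) in S for y in adom)
}

assert algebra == long_form == short_form
print(sorted(algebra))  # [(1, 10), (2, 20)]
```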
Equivalence of Relational Algebra and Relational Calculus
Proof (Sketch):
1. ⇒ 2. We also need to show that the resulting formula is domain independent. Show by induction that this translation of relational algebra to relational calculus is actually a translation of relational algebra to domain independent relational calculus.
2. ⇒ 3. This implication is obvious.
3. ⇒ 1. Show first that for every relational database schema S, there is a relational algebra expression E such that for every database instance I, we have that adom(I) = E(I). Then use induction on the construction of relational calculus formulas, together with this fact, to obtain a translation of relational calculus under the active domain interpretation to relational algebra.
Equivalence of Relational Algebra and Relational Calculus
In this translation, the most interesting part is the simulation of the universal quantifier in relational algebra. It uses the logical equivalence ∀yψ ≡ ¬∃y¬ψ.
As an illustration, consider ∀yR(x,y):
  ∀yR(x,y) ≡ ¬∃y¬R(x,y)
  adom(I) = π₁(R) ∪ π₂(R)
Rel. calc. formula ϕ | Relational algebra expression for ϕ^adom
¬R(x,y) | ((π₁(R) ∪ π₂(R)) × (π₁(R) ∪ π₂(R))) − R
∃y¬R(x,y) | π₁(((π₁(R) ∪ π₂(R)) × (π₁(R) ∪ π₂(R))) − R)
¬∃y¬R(x,y) | (π₁(R) ∪ π₂(R)) − π₁(((π₁(R) ∪ π₂(R)) × (π₁(R) ∪ π₂(R))) − R)
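The three algebra expressions in the table can be built step by step with Python set operations and then checked against a direct evaluation of ∀yR(x,y) over the active domain. In the sketch below, relations are sets of tuples, and the helpers pi and cross are hypothetical stand-ins for π and ×. Python's set difference plays the role of −.

```python
R = {(1, 1), (1, 2), (2, 1)}   # hypothetical binary relation

def pi(i, rel):
    """π_i (1-based): unary relation of the i-th components."""
    return {(t[i - 1],) for t in rel}

def cross(a, b):
    """Cartesian product ×."""
    return {s + t for s in a for t in b}

adom = pi(1, R) | pi(2, R)              # adom(I) = π1(R) ∪ π2(R)
not_R = cross(adom, adom) - R           # ¬R(x,y)
exists_not_R = pi(1, not_R)             # ∃y ¬R(x,y)
forall_R = adom - exists_not_R          # ¬∃y ¬R(x,y) ≡ ∀y R(x,y)

# Direct active-domain semantics of ∀y R(x,y), for comparison.
direct = {(x,) for (x,) in adom if all((x, y) in R for (y,) in adom)}

assert forall_R == direct
print(sorted(forall_R))  # [(1,)]
```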
Equivalence of Relational Algebra and Relational Calculus
Remarks: The Equivalence Theorem is effective. Specifically, the proof of this theorem yields two algorithms: an algorithm for translating from relational algebra to domain independent relational calculus, and an algorithm for translating from domain independent relational calculus to relational algebra. Each of these two algorithms runs in linear time.
Domain Independent Relational Calculus
Note: A desirable feature of a logical formalism is that there is an (efficient) algorithm for determining whether or not an expression is a formula of that formalism. Both relational algebra and relational calculus have this property.
Question: Does domain independent relational calculus have this property? In other words, is there an algorithm that, given a relational calculus formula ϕ, tells whether or not ϕ is domain independent?
Domain Independent Relational Calculus
Bad News Theorem (Di Paola, 1969): Determining domain independence is an undecidable problem, i.e., there is no algorithm that, given a relational calculus formula ϕ, tells whether or not ϕ is domain independent.
Some Good News:
Theorem: Domain independent relational calculus has an effective syntax, i.e., there is a class F of relational calculus formulas such that:
  There is an (efficient) algorithm for testing membership in F.
  Every formula in F is domain independent.
  Every domain independent relational calculus formula is logically equivalent to a formula in F.
Domain Independent Relational Calculus
For much more on domain independence:
  Read Sections 5.3 and 5.4 of Foundations of Databases.
  Read the papers
    "The recursive unsolvability of the decision problem for the class of definite formulas" by Robert A. Di Paola, JACM, Vol. 16, 1969, pages 324-327 (available at the class webpages).
    "Safety and translation of relational calculus" by Allen van Gelder and Rodney Topor, ACM Transactions on Database Systems, Vol. 16, 1991, pages 235-278 (available at the class webpages).
Relational Calculus and SQL
Relational calculus has influenced the design of SQL. In particular, existential and universal quantification are used in two different forms in SQL.
Both these forms occur in the allowable conditions in the WHERE clause of the SELECT … FROM … WHERE construct.
In addition, sets/multisets are allowed as operands in the WHERE clause (this is what makes existential and universal quantification meaningful).
Sets as Operands in SQL
Sets are allowed as operands in the WHERE clause. Sets are defined
  by listing their elements (boring feature), or
  as the result of a SELECT … FROM … WHERE construct nested inside the WHERE clause of an outer SELECT … FROM … WHERE (interesting feature).
This is what makes SQL a structured language, i.e., we can have queries inside queries up to any finite depth of nesting.
When sets are used as operands in a comparison clause:
  We must use one of the keywords IN, NOT IN, SOME, ALL.
  SOME and ALL must be preceded by one of the comparison operators =, ≠, ≤, ≥, >, <.
The use of SOME and ALL is the first form of existential and universal quantification in SQL.
Sets as Operands in SQL
Example: FACULTY(name,dpt,salary)
Find the names of faculty who are in a department in which no member earns more than $175,000.
  SELECT name
  FROM FACULTY
  WHERE dpt NOT IN (SELECT dpt
                    FROM FACULTY
                    WHERE salary > 175000)
Exercise: Express this query without using an SQL subquery.
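The NOT IN query can be tried directly with Python's built-in sqlite3 module and an in-memory database. The table contents below are hypothetical. An ORDER BY was added only so that the printed result is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FACULTY (name TEXT, dpt TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO FACULTY VALUES (?, ?, ?)",
    [("Ada", "CS", 180000), ("Bob", "CS", 150000),   # CS has a member above 175000
     ("Di", "EE", 170000), ("Ed", "EE", 160000)],    # nobody in EE is above 175000
)

rows = conn.execute("""
    SELECT name
    FROM FACULTY
    WHERE dpt NOT IN (SELECT dpt FROM FACULTY WHERE salary > 175000)
    ORDER BY name
""").fetchall()
print(rows)  # [('Di',), ('Ed',)]
```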
SOME and ALL in SQL
Syntax: In the WHERE clause, we can have subclauses of the form
  <attribute name> op SOME T
  <attribute name> op ALL T,
where op is one of the comparison operators =, ≠, ≤, ≥, >, < (in SQL, ≠ is written <>), and T is the result of a nested SELECT … FROM … WHERE clause.
Semantics:
  <attribute name> op SOME T means (∃x)(x ∈ T ∧ <attribute name> op x)
  <attribute name> op ALL T means (∀x)(x ∈ T → <attribute name> op x).
SOME and ALL in SQL
Note:
  <attribute name> = SOME T is the same as <attribute name> IN T
  <attribute name> ≠ ALL T is the same as <attribute name> NOT IN T
Note: Earlier versions of SQL used ANY in place of SOME. The use of ANY can be quite confusing and can lead to errors. Even if the system supports ANY, it is better to avoid using it.
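The quantifier semantics of SOME and ALL map directly onto Python's `any` and `all`. The sketch below is an illustration under stated assumptions. It ignores NULLs, which SQL handles with three-valued logic. The names op_some and op_all are hypothetical. The sketch checks the two equivalences above on a few probe values and shows the behavior on an empty T.

```python
import operator

def op_some(value, op, T):
    """value op SOME T, read as (∃x)(x ∈ T ∧ value op x); NULLs are ignored."""
    return any(op(value, x) for x in T)

def op_all(value, op, T):
    """value op ALL T, read as (∀x)(x ∈ T → value op x); NULLs are ignored."""
    return all(op(value, x) for x in T)

T = [150000, 180000]   # hypothetical result of a nested SELECT

# "= SOME" agrees with IN, and "≠ ALL" agrees with NOT IN, on every probe value.
for v in (150000, 170000, 180000):
    assert op_some(v, operator.eq, T) == (v in T)
    assert op_all(v, operator.ne, T) == (v not in T)

# Over an empty T, SOME is false and ALL is vacuously true.
print(op_some(1, operator.gt, []), op_all(1, operator.gt, []))  # False True
```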
SOME and ALL in SQL
Example: FACULTY(name,dpt,salary)
Find the highest paid faculty in CS.
  SELECT name
  FROM FACULTY
  WHERE dpt = 'CS' AND
        salary >= ALL (SELECT salary FROM FACULTY WHERE dpt = 'CS')
Question: What is the result of the following SQL query?
  SELECT name
  FROM FACULTY
  WHERE dpt = 'CS' AND
        salary > ALL (SELECT salary FROM FACULTY WHERE dpt = 'CS')
SOME and ALL in SQL
Question: What are the results of the following two SQL queries?
  SELECT name
  FROM FACULTY
  WHERE dpt = 'CS' AND
        salary > SOME (SELECT salary FROM FACULTY WHERE dpt = 'CS')

  SELECT name
  FROM FACULTY
  WHERE dpt = 'CS' AND
        salary >= SOME (SELECT salary FROM FACULTY WHERE dpt = 'CS')
Answer: The first returns all CS faculty who are not the lowest paid ones. The second returns all CS faculty.
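Not every SQL engine accepts SOME/ALL, so the sketch below emulates the two queries in Python instead. It uses the existential reading of SOME on a hypothetical FACULTY instance.

```python
# Hypothetical FACULTY(name, dpt, salary) instance.
FACULTY = [("Ada", "CS", 180000), ("Bob", "CS", 150000),
           ("Cy", "CS", 165000), ("Di", "EE", 200000)]
cs_salaries = [s for (_, d, s) in FACULTY if d == "CS"]   # the nested SELECT

# salary > SOME (...): keep x if some CS salary is strictly smaller.
gt_some = sorted(n for (n, d, s) in FACULTY
                 if d == "CS" and any(s > t for t in cs_salaries))
# salary >= SOME (...): always true for a CS member (compare with own salary).
ge_some = sorted(n for (n, d, s) in FACULTY
                 if d == "CS" and any(s >= t for t in cs_salaries))

print(gt_some)  # ['Ada', 'Cy']: everyone except the lowest paid (Bob)
print(ge_some)  # ['Ada', 'Bob', 'Cy']: every CS faculty member
```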
EXISTS and NOT EXISTS in SQL
Syntax: SELECT … FROM … WHERE EXISTS (SELECT … FROM … WHERE …)
Semantics: The subquery (SELECT … FROM … WHERE …) is evaluated and the resulting set is tested for emptiness: if it is non-empty, then the condition in WHERE evaluates to true; otherwise, it evaluates to false.
Syntax: SELECT … FROM … WHERE NOT EXISTS (SELECT … FROM … WHERE …)
Semantics: The subquery (SELECT … FROM … WHERE …) is evaluated and the resulting set is tested for emptiness: if it is empty, then the condition in WHERE evaluates to true; otherwise, it evaluates to false.
EXISTS and NOT EXISTS in SQL
Example: FACULTY(name,dpt,salary)
Find the faculty in the CS department who are not the lowest paid ones.
  SELECT R.name
  FROM FACULTY AS R
  WHERE R.dpt = 'CS' AND
        EXISTS (SELECT *
                FROM FACULTY AS T
                WHERE T.dpt = 'CS' AND R.salary > T.salary)
Note: This is an example of a correlated subquery: the subquery has to be evaluated separately for each tuple in the FROM list of the outer query, and the tuple is kept or removed depending on the result of the EXISTS test.
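The correlated EXISTS query runs unchanged in SQLite through Python's sqlite3 module. The data below is hypothetical. The low-paid EE member Di shows why the subquery must restrict T, the inner tuple, to CS. If it restricted R instead, Bob would be compared against Di's EE salary and wrongly kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FACULTY (name TEXT, dpt TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO FACULTY VALUES (?, ?, ?)",
    [("Ada", "CS", 180000), ("Bob", "CS", 150000),
     ("Cy", "CS", 165000), ("Di", "EE", 100000)],
)

# Inner tuple T restricted to CS: Bob, the lowest paid in CS, is dropped.
rows = conn.execute("""
    SELECT R.name FROM FACULTY AS R
    WHERE R.dpt = 'CS' AND EXISTS (SELECT * FROM FACULTY AS T
                                   WHERE T.dpt = 'CS' AND R.salary > T.salary)
    ORDER BY R.name
""").fetchall()

# Restricting R instead of T lets T range over every department.
unrestricted_T = conn.execute("""
    SELECT R.name FROM FACULTY AS R
    WHERE R.dpt = 'CS' AND EXISTS (SELECT * FROM FACULTY AS T
                                   WHERE R.dpt = 'CS' AND R.salary > T.salary)
    ORDER BY R.name
""").fetchall()

print(rows)            # [('Ada',), ('Cy',)]
print(unrestricted_T)  # [('Ada',), ('Bob',), ('Cy',)]
```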
EXISTS and NOT EXISTS in SQL
Example: FACULTY(name,dpt,salary)
Find the highest paid faculty in CS.
  SELECT R.name
  FROM FACULTY AS R
  WHERE R.dpt = 'CS' AND
        NOT EXISTS (SELECT *
                    FROM FACULTY AS T
                    WHERE T.dpt = 'CS' AND R.salary < T.salary)
Note: SQL queries with SOME and ALL can be transformed into SQL queries with EXISTS and NOT EXISTS.
SOME, ALL, EXISTS, and NOT EXISTS are imported directly from relational calculus.
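Here is the same kind of check for the NOT EXISTS query, using sqlite3 and hypothetical data. The EE member Di earns more than anyone in CS. If the subquery restricted R rather than T, every CS member would find Di as a better-paid T and the query would return nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FACULTY (name TEXT, dpt TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO FACULTY VALUES (?, ?, ?)",
    [("Ada", "CS", 180000), ("Bob", "CS", 150000),
     ("Cy", "CS", 165000), ("Di", "EE", 200000)],
)

# Highest paid in CS: no CS colleague T earns more than R.
rows = conn.execute("""
    SELECT R.name FROM FACULTY AS R
    WHERE R.dpt = 'CS' AND NOT EXISTS (SELECT * FROM FACULTY AS T
                                       WHERE T.dpt = 'CS' AND R.salary < T.salary)
    ORDER BY R.name
""").fetchall()

# Restricting R instead of T: Di (EE) now outranks every CS member.
unrestricted_T = conn.execute("""
    SELECT R.name FROM FACULTY AS R
    WHERE R.dpt = 'CS' AND NOT EXISTS (SELECT * FROM FACULTY AS T
                                       WHERE R.dpt = 'CS' AND R.salary < T.salary)
    ORDER BY R.name
""").fetchall()

print(rows)            # [('Ada',)]
print(unrestricted_T)  # []
```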