DATABASE DESIGN II - 1DL400 Fall 2016 A second course in database systems http://www.it.uu.se/research/group/udbl/kurser/dbii_ht16 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology, Uppsala University, Uppsala, Sweden 02/12/16 1
Introduction to relational calculus
Relational calculus (RC)
Formal query languages based on predicate calculus (mathematical logic).
Query languages based on relational calculus:
  SQL
  Query By Example, QBE (Microsoft Access)
  Datalog
  AmosQL
Query languages based on RC are non-procedural, since a formula in RC specifies what is to be retrieved rather than how. The term declarative is also used.
The relational algebra (RA) is procedural because the order in which operators are executed is explicitly specified.
Relational calculus (RC)
The non-procedural semantics means that it is up to the query optimizer to decide what strategy (algorithms) to use when searching the database:
  The query language user specifies only constraints on the result.
  The query optimizer chooses the execution strategy.
RC and RA have the same expressive power:
  Everything that can be expressed in RA can also be expressed in RC and vice versa.
  A relationally complete query language has at least the same expressive power as RA or RC.
SQL is more expressive than RC (i.e. SQL ⊃ RC). For example, SQL can handle aggregate functions, grouping, sorting and duplicates, which goes beyond being relationally complete.
Relational completeness is the criterion that defines a real query language.
Two kinds of relational calculi
Tuple calculus
  Predicate calculus where variables are bound to rows in tables
  The basis for the relational data model and SQL
Domain calculus
  Predicate calculus where variables are bound to atomic values (literals)
  Query-by-Example: a graphical query language in Microsoft Access
  Datalog: domain calculus with implicit quantifiers
  AmosQL: domain calculus with user-defined functions and types
  SPARQL: semantic web query language
Tuple relational calculus (TRC)
SQL is a generalized and syntactically sugared TRC.
In TRC a variable t must be bound to a row in a relation (table).
The attribute A of a tuple t is denoted t.A. In the Elmasri-Navathe textbook the notation t[A] is used.
Queries in TRC have the format:
  {t1.A1, t2.A2, ... | COND(t1, t2, ...)}
where the ti are variables bound to entire tuples in tables. They are free variables in the condition COND. The Ai are attribute names.
Example 1
Relations:
  employee(ssn, fname, lname, salary, bdate, dno)
  department(dno, dname)
Query: Find the last names of the persons earning more than 50000.
SQL: SELECT t.lname FROM employee t WHERE t.salary > 50000
TRC: {t.lname | employee(t) ∧ t.salary > 50000}
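A TRC query reads almost directly as a set comprehension, which may help in seeing what "t is bound to a row" means. The sketch below assumes a small in-memory relation; the rows are hypothetical sample data, and only the schema is taken from the example.

```python
from collections import namedtuple

# Schema from Example 1; the rows below are hypothetical sample data.
Employee = namedtuple("Employee", "ssn fname lname salary bdate dno")

employee = {
    Employee(1, "Oskar", "Berg", 62000, "1970-03-14", 10),
    Employee(2, "Anna", "Lind", 48000, "1982-11-02", 20),
    Employee(3, "Karl", "Holm", 51000, "1975-06-30", 10),
}

# TRC: {t.lname | employee(t) ∧ t.salary > 50000}
# "for t in employee" plays the role of the range predicate employee(t).
result = {t.lname for t in employee if t.salary > 50000}

print(sorted(result))  # ['Berg', 'Holm']
```

The part before `for` corresponds to the result attributes, and the `if` clause to the rest of the condition.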
Example 2
Query: Find the birth dates and last names of the persons earning more than 50000 and whose first name is Oskar.
SQL: SELECT t.bdate, t.lname FROM employee t WHERE t.salary > 50000 AND t.fname = 'Oskar'
TRC: {t.bdate, t.lname | employee(t) ∧ t.salary > 50000 ∧ t.fname = 'Oskar'}
Example 3 (join query)
Query: Find the last names and departments of the persons earning more than 50000.
SQL: SELECT t.lname, d.dname FROM employee t, department d WHERE t.salary > 50000 AND d.dno = t.dno
TRC: {t.lname, d.dname | employee(t) ∧ department(d) ∧ d.dno = t.dno ∧ t.salary > 50000}
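To check that the SQL and TRC forms of the join query describe the same result, one can run both on a toy database. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in SQL engine, and the rows are hypothetical sample data.

```python
import sqlite3

# Hypothetical rows; only the table and column names come from the slides.
employees = [
    (1, "Oskar", "Berg", 62000, "1970-03-14", 10),
    (2, "Anna", "Lind", 48000, "1982-11-02", 20),
    (3, "Karl", "Holm", 51000, "1975-06-30", 20),
]
departments = [(10, "TOYS"), (20, "BOOKS")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (ssn, fname, lname, salary, bdate, dno)")
con.execute("CREATE TABLE department (dno, dname)")
con.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?, ?)", employees)
con.executemany("INSERT INTO department VALUES (?, ?)", departments)

# SQL form of Example 3.
sql_result = set(con.execute(
    "SELECT t.lname, d.dname FROM employee t, department d "
    "WHERE t.salary > 50000 AND d.dno = t.dno"
))

# TRC form: t ranges over employee, d over department.
# Positions: t[2] = lname, t[3] = salary, t[5] = dno, d[0] = dno, d[1] = dname.
trc_result = {
    (t[2], d[1])
    for t in employees
    for d in departments
    if d[0] == t[5] and t[3] > 50000
}

print(sql_result == trc_result)  # True
```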
Definition of TRC atoms
A TRC query has the format
  {t1.A1, t2.A2, ... | COND(t1, t2, ...)}
where COND is a TRC well-formed formula (WFF) consisting of atoms. An atom is one of the following:
1. An atom of the form r(ti), called a range predicate, which binds the tuple variable ti to the tuples of relation r, e.g. employee(t).
2. An atom of the form ti.A op C, where op is one of =, ≠, <, ≤, >, ≥ and C is a constant, e.g. t.salary > 5000.
3. An atom of the form ti.A op tj.B, e.g. t.dno = d.dno.
Definition of TRC formula
A TRC formula is either an atom or several formulae combined using the logical operators ∧ (and), ∨ (or) and ¬ (not). Thus a formula is one of the following:
4. An atom as above.
5. If F1 and F2 are formulae then F1 ∧ F2, F1 ∨ F2, and ¬F1 are also formulae. E.g. d.dname = 'TOYS' ∧ d.dno = t.dno
A formula can also be annotated with one of the quantifiers ∀ (for all) and ∃ (there exists). Thus:
6. If F(t) is a formula then ((∃t)F(t)) is also a formula. E.g. (∃d)(department(d) ∧ d.dname = 'TOYS' ∧ d.dno = t.dno) (formula used in complete query on next slide)
The formula ((∃t)F(t)) is true if F(t) is true for at least one tuple t.
Example - exists quantifier
Query: Find the names of those employees in the TOYS department earning more than 50000.
SQL: select t.fname, t.lname from employee t, department d where t.salary > 50000 and d.dname = 'TOYS' and d.dno = t.dno
TRC: {t.fname, t.lname | employee(t) ∧ t.salary > 50000 ∧ ((∃d)(department(d) ∧ d.dname = 'TOYS' ∧ d.dno = t.dno))}
Notice that if a tuple variable in SQL is not referenced in the result then it is implicitly existentially quantified.
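The existential quantifier corresponds to asking whether at least one tuple satisfies the inner formula. A minimal sketch using Python's built-in any(), with hypothetical sample rows:

```python
from collections import namedtuple

Employee = namedtuple("Employee", "ssn fname lname salary bdate dno")
Department = namedtuple("Department", "dno dname")

# Hypothetical sample data; only the schema comes from the slides.
employee = [
    Employee(1, "Oskar", "Berg", 62000, "1970-03-14", 10),
    Employee(2, "Anna", "Lind", 48000, "1982-11-02", 10),
    Employee(3, "Karl", "Holm", 51000, "1975-06-30", 20),
]
department = [Department(10, "TOYS"), Department(20, "BOOKS")]

# d does not appear in the result, so it is existentially quantified:
# any(...) is true if the inner condition holds for at least one d.
result = {
    (t.fname, t.lname)
    for t in employee
    if t.salary > 50000
    and any(d.dname == "TOYS" and d.dno == t.dno for d in department)
}

print(result)  # {('Oskar', 'Berg')}
```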
For all quantifier
7. If F(t) is a formula then ((∀t)F(t)) is also a formula.
The formula ((∀t)F(t)) is true if F(t) is true for all tuples t.
Forall quantification can be rewritten using not exists (¬∃), since:
  ((∀t)F(t)) ≡ ¬((∃t)¬F(t))
Thus ∀ is not needed for relational completeness.
How would the following query be formulated?
Query: List those departments where all employees earn more than 10000.
Hint: formulate the query with not exists.
Notice that the closed world assumption is used in query languages: => Everything not in the database is considered false.
Example - for all quantifier
So how is universal quantification expressed in TRC (and SQL)? For example, suppose we want to query:
  List those departments where all employees earn more than 10000.
The simplest way to handle universal quantification is to transform the query into the corresponding negative query:
  List those departments where no employee earns 10000 or less.
This query can be formulated in tuple relational calculus as:
  {d.dname | department(d) ∧ ¬((∃t)(employee(t) ∧ t.dno = d.dno ∧ t.salary ≤ 10000))}
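The equivalence between "for all" and "not exists ... not" can be tried out on a small example. The sketch below uses Python's all() and any() with hypothetical sample rows, and negates the condition "earns more than 10000" literally.

```python
from collections import namedtuple

Employee = namedtuple("Employee", "lname salary dno")
Department = namedtuple("Department", "dno dname")

# Hypothetical sample data.
employee = [
    Employee("Berg", 62000, 10),
    Employee("Lind", 48000, 10),
    Employee("Holm", 9000, 20),
]
department = [
    Department(10, "TOYS"),
    Department(20, "BOOKS"),
    Department(30, "GAMES"),  # has no employees
]

# Universal form: every employee of d earns more than 10000.
with_forall = {
    d.dname
    for d in department
    if all(t.salary > 10000 for t in employee if t.dno == d.dno)
}

# Negative form: there is no employee of d for whom the condition fails.
with_not_exists = {
    d.dname
    for d in department
    if not any(t.dno == d.dno and not (t.salary > 10000) for t in employee)
}

print(with_forall == with_not_exists)  # True
print(sorted(with_forall))  # ['GAMES', 'TOYS']
```

Note that a department without employees satisfies both forms, since a universally quantified condition over an empty set is true.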
Evaluation of TRC
For example Query 3:
  {t.lname, d.dname | employee(t) ∧ department(d) ∧ d.dno = t.dno ∧ t.salary > 50000}
1. Form the Cartesian product of tuples in all range predicates: employee(t) × department(d)
2. Select those tuples in the Cartesian product that fulfill the rest of the condition: d.dno = t.dno ∧ t.salary > 50000
3. Project the result attributes: t.lname, d.dname
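The three evaluation steps can be spelled out one at a time. This is a naive sketch of the conceptual semantics with hypothetical sample rows, not how a query optimizer would actually execute the query.

```python
from collections import namedtuple
from itertools import product

Employee = namedtuple("Employee", "ssn fname lname salary bdate dno")
Department = namedtuple("Department", "dno dname")

# Hypothetical sample data.
employee = [
    Employee(1, "Oskar", "Berg", 62000, "1970-03-14", 10),
    Employee(2, "Anna", "Lind", 48000, "1982-11-02", 20),
    Employee(3, "Karl", "Holm", 51000, "1975-06-30", 20),
]
department = [Department(10, "TOYS"), Department(20, "BOOKS")]

# Step 1: Cartesian product of the tuples in all range predicates.
pairs = list(product(employee, department))

# Step 2: select the combinations that fulfill the rest of the condition.
selected = [(t, d) for t, d in pairs if d.dno == t.dno and t.salary > 50000]

# Step 3: project the result attributes.
result = {(t.lname, d.dname) for t, d in selected}

print(len(pairs), len(selected))  # 6 2
```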
Domain relational calculus (DRC)
DRC is a relational calculus where the variables are bound to atomic values, rather than tuples as in TRC.
Queries in DRC have the format:
  {x1, x2, ... | COND(x1, x2, ...)}
The result is the set of tuples (x1, x2, ...) for which the condition COND(x1, x2, ...) holds.
Query 3 in DRC:
  {lname, dname | employee(ssn, fname, lname, salary, bdate, dno) ∧ department(dno, dname) ∧ salary > 50000}
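In a comprehension, the DRC style corresponds to unpacking every row into one variable per attribute. A sketch with hypothetical sample rows follows; note that Python cannot reuse the name dno to express the join the way the DRC formula does, so the sketch assumes a second variable d_dno and an explicit equality test.

```python
# Hypothetical sample data, stored as plain tuples in schema order.
employee = {
    (1, "Oskar", "Berg", 62000, "1970-03-14", 10),
    (2, "Anna", "Lind", 48000, "1982-11-02", 20),
    (3, "Karl", "Holm", 51000, "1975-06-30", 20),
}
department = {(10, "TOYS"), (20, "BOOKS")}

# Each variable is bound to an atomic value, not to a whole tuple.
result = {
    (lname, dname)
    for (ssn, fname, lname, salary, bdate, dno) in employee
    for (d_dno, dname) in department
    if d_dno == dno and salary > 50000
}

print(sorted(result))  # [('Berg', 'TOYS'), ('Holm', 'BOOKS')]
```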
TRC vs DRC
TRC queries remain the same if new attributes are added
  Data independent queries
DRC queries must be changed when new attributes are added
  Data dependent queries
TRC queries are much simpler than DRC queries when there are many attributes. For example, what would happen with query 3 if we had 20 attributes in the employee and department relations?
TRC provides data independence in queries as the database schema evolves.
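The difference in data independence can be illustrated by adding an attribute to the schema. In the sketch below, email is a hypothetical new attribute and the row is hypothetical sample data; the tuple-style query keeps working, while the domain-style query written for the old six-attribute schema fails.

```python
from collections import namedtuple

# Schema after a hypothetical new attribute "email" has been added.
Employee = namedtuple("Employee", "ssn fname lname salary bdate dno email")
employee = [
    Employee(1, "Oskar", "Berg", 62000, "1970-03-14", 10, "ob@example.org"),
]

# Tuple style: refers only to the attributes it needs, so it is unaffected.
trc_result = {t.lname for t in employee if t.salary > 50000}

# Domain style, still written for the old six-attribute schema:
# unpacking a seven-value row into six variables raises ValueError.
try:
    drc_result = {
        lname
        for (ssn, fname, lname, salary, bdate, dno) in employee
        if salary > 50000
    }
except ValueError as err:
    drc_result = None
    drc_error = str(err)

print(trc_result)  # {'Berg'}
print(drc_result)  # None
```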
AmosQL semantics
Query 3 in AmosQL:
  select lname(t), dname(d) from Employee t, Department d where salary(t) > 50000 and dept(t) = d
1. Variables are bound to objects from the types in the from clause, i.e. AmosQL is a domain calculus query language. Notice: user-defined types and function overloading provide data independence.
2. Attributes are accessed by functions. Notice: functions can have more than one argument too.
3. Foreign keys are usually represented as functions mapping from one type to another, e.g. dept(t) = d.
4. The from clause may reference types with extents, e.g. numbers.
AmosQL semantics
Every type t has an extent(t), being the set of objects of type t. Notice: types can be inherited => extents are unioned.
Evaluation:
1. Form the Cartesian product of objects in all extents in the from clause.
2. Choose the objects in the Cartesian product that fulfill the condition.
3. Project the selected objects.
Complication: extents may be infinite, containing infinitely many objects.
=> The set of objects selected by 1. followed by the constraints specified by 2. must be finite, otherwise the query is unexecutable (unsafe).
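The safety problem with infinite extents can be imitated with a generator. This sketch is only an analogy in Python, not how an AmosQL system evaluates queries: itertools.count() stands in for the infinite extent of a number type, and the salaries list is a hypothetical finite extent.

```python
from itertools import count, islice

# Hypothetical finite extent (e.g. the salaries stored in the database).
salaries = [62000, 48000, 51000]

# Unsafe: the condition n > 50000 has infinitely many answers over the
# integers. Building the full result set would never terminate, so we only
# take the first three answers from a lazy generator.
unsafe = (n for n in count() if n > 50000)
first_three = list(islice(unsafe, 3))

# Safe: n is tied to a value from a finite extent, so only finitely many
# combinations need to be examined.
safe = {n for s in salaries for n in [s] if n > 50000}

print(first_three)   # [50001, 50002, 50003]
print(sorted(safe))  # [51000, 62000]
```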