Other Query Languages Winter 2006-2007 Lecture 11
Query Languages
  Previously covered relational algebra
    A procedural query language
    Foundation of the Structured Query Language (SQL)
    Supports wide variety of queries, inserts, updates, deletes
  Two other formal query languages:
    Tuple relational calculus
    Domain relational calculus
  Two language implementations based on relational calculus
    Nowhere near the popularity of SQL, though
Tuple Relational Calculus
  Entirely declarative!
  Queries have the form:
    { t | P(t) }
    The set of all tuples t, such that P(t) is true
  As before, t[A] refers to t's value for attribute A
  Example query: Find the branch name, loan number, and amount of all loans over $1200.
    { t | t ∈ loan ∧ t[amount] > 1200 }
    The set of all tuples t, such that t appears in the loan relation, and t's amount is over $1200.
    The schema of t has the same attributes as the loan relation.
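
The example query has the same "set of t such that P(t)" shape as a Python set comprehension. A minimal sketch, assuming a hypothetical Loan type and invented sample rows (neither comes from the lecture):

```python
from typing import NamedTuple


class Loan(NamedTuple):
    """Hypothetical row type for the loan relation."""
    loan_number: str
    branch_name: str
    amount: int


# Invented sample data, for illustration only.
loan = {
    Loan("L-11", "Round Hill", 900),
    Loan("L-15", "Perryridge", 1500),
    Loan("L-16", "Perryridge", 1300),
}

# { t | t ∈ loan ∧ t[amount] > 1200 }
big_loans = {t for t in loan if t.amount > 1200}
```

Each result tuple is a whole Loan row, which matches the remark that t has the same schema as loan.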
Selecting Loan Numbers
  If we only want loan numbers:
    Schema of t should be (loan_number)
    But now t doesn't appear in the loan relation
  Introduce another tuple variable: ∃ s ∈ loan
  Updated query:
    { t | ∃ s ∈ loan ( t[loan_number] = s[loan_number] ∧ s[amount] > 1200 ) }
    The set of all tuples t, such that there exists a tuple s in the loan relation with the same loan number as t, and an amount of more than $1200.
    No relation is associated with t, and only the loan_number attribute is referenced.
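
One way to picture the existential tuple variable: s ranges over loan, and each qualifying s contributes a new one-attribute tuple t. A minimal sketch, again assuming a hypothetical Loan type and invented rows:

```python
from typing import NamedTuple


class Loan(NamedTuple):
    """Hypothetical row type for the loan relation."""
    loan_number: str
    branch_name: str
    amount: int


# Invented sample data, for illustration only.
loan = {
    Loan("L-11", "Round Hill", 900),
    Loan("L-15", "Perryridge", 1500),
    Loan("L-16", "Perryridge", 1300),
}

# { t | ∃ s ∈ loan ( t[loan_number] = s[loan_number] ∧ s[amount] > 1200 ) }
# t is built from s, so its schema is just (loan_number,).
loan_numbers = {(s.loan_number,) for s in loan if s.amount > 1200}
```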
Joining Relations
  Another query: Find names of all customers with a loan from the Perryridge branch.
  Relations:
    loan(loan_number, branch_name, amount)
    borrower(customer_name, loan_number)
  Need to join relations together to compute the result
    Join loan and borrower using loan_number
    Use tuple variables to specify join constraints
Joining Relations (2)
  Query:
    { t | ∃ s ∈ borrower ( t[customer_name] = s[customer_name] ∧
          ∃ u ∈ loan ( u[loan_number] = s[loan_number] ∧ u[branch_name] = "Perryridge" )) }
    The set of all tuples t, such that t refers to a valid borrower name, and the borrower has at least one loan at the Perryridge branch.
  Use ∃ var ∈ relation to perform join and subquery kinds of operations
    Tuple variables s, u allow the borrower and loan relations to be correlated with each other, and with t
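
The nested ∃ corresponds closely to Python's any(). A minimal sketch of the join, assuming hypothetical row types and invented sample rows:

```python
from typing import NamedTuple


class Loan(NamedTuple):
    loan_number: str
    branch_name: str
    amount: int


class Borrower(NamedTuple):
    customer_name: str
    loan_number: str


# Invented sample data, for illustration only.
loan = {
    Loan("L-11", "Round Hill", 900),
    Loan("L-15", "Perryridge", 1500),
    Loan("L-16", "Perryridge", 1300),
}
borrower = {
    Borrower("Adams", "L-16"),
    Borrower("Hayes", "L-15"),
    Borrower("Hayes", "L-11"),
    Borrower("Smith", "L-11"),
}

# Outer loop: ∃ s ∈ borrower; inner any(): ∃ u ∈ loan.
perryridge_customers = {
    (s.customer_name,)
    for s in borrower
    if any(
        u.loan_number == s.loan_number and u.branch_name == "Perryridge"
        for u in loan
    )
}
```

Because the result is a set, a customer with several Perryridge loans would still appear only once.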
Set Operations
  Set operations are easy in the tuple relational calculus
  Example: Find all customers that have an account or a loan.
    In relational algebra, used the set-union operation
    In tuple relational calculus:
      { t | ∃ s ∈ borrower ( t[customer_name] = s[customer_name] ) ∨
            ∃ u ∈ depositor ( t[customer_name] = u[customer_name] ) }
    This defines a relation, so if a customer is in both borrower and depositor, they only appear once in the result.
Set Operations (2)
  Find customers with both an account and a loan.
    Just change ∨ to ∧:
      { t | ∃ s ∈ borrower ( t[customer_name] = s[customer_name] ) ∧
            ∃ u ∈ depositor ( t[customer_name] = u[customer_name] ) }
  Find customers with an account but not a loan.
    Again, very simple to construct:
      { t | ∃ s ∈ depositor ( t[customer_name] = s[customer_name] ) ∧
            ¬ ∃ u ∈ borrower ( t[customer_name] = u[customer_name] ) }
    Result contains customer names that appear in the depositor relation, but not in the borrower relation
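
The three set-operation queries differ only in the connective between the two existential tests. A minimal sketch with invented sample rows; since Python cannot range over every possible tuple t, it assumes a finite candidate set made of the names that appear in either relation:

```python
# Invented sample data: (customer_name, loan_number) and
# (customer_name, account_number).
borrower = {("Hayes", "L-15"), ("Smith", "L-11")}
depositor = {("Hayes", "A-102"), ("Jones", "A-217")}

# Assumption: candidate values for t[customer_name] come from the
# relations themselves rather than from an unbounded universe.
candidates = {name for (name, _) in borrower | depositor}


def has_loan(name: str) -> bool:
    """∃ s ∈ borrower ( t[customer_name] = s[customer_name] )"""
    return any(s[0] == name for s in borrower)


def has_account(name: str) -> bool:
    """∃ u ∈ depositor ( t[customer_name] = u[customer_name] )"""
    return any(u[0] == name for u in depositor)


either = {(n,) for n in candidates if has_loan(n) or has_account(n)}
both = {(n,) for n in candidates if has_loan(n) and has_account(n)}
account_only = {(n,) for n in candidates if has_account(n) and not has_loan(n)}
```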
For All Queries
  Another query: Find all customers who have an account at all branches located in Brooklyn.
    In relational algebra, used divide operation
  In tuple relational calculus, can use ∀ for this:
    ∀ t ∈ r (Q(t))
    Q is true for all tuples t in relation r.
  Can also use implication:
    P ⇒ Q means Q ∨ ¬P
For All Queries (2)
  Query:
    { t | ∃ r ∈ customer ( r[customer_name] = t[customer_name] ) ∧
          ( ∀ u ∈ branch ( u[branch_city] = "Brooklyn" ⇒
              ∃ s ∈ depositor ( t[customer_name] = s[customer_name] ∧
                ∃ w ∈ account ( w[account_number] = s[account_number] ∧
                                w[branch_name] = u[branch_name] )))) }
    The set of all customer names, such that for all branches located in Brooklyn, there exists a tuple in depositor for that customer with a corresponding account located at that branch.
  What if there are no branches in Brooklyn?
    The ∀ condition is then trivially true for every t
    Must restrict result to only valid customer names (the ∃ r ∈ customer clause)
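
The for-all query maps onto Python's all() and any(), with the implication written as (not P) or Q. A minimal sketch, assuming simplified hypothetical schemas (customer(customer_name), branch(branch_name, branch_city), depositor(customer_name, account_number), account(account_number, branch_name)) and invented rows:

```python
def implies(p: bool, q: bool) -> bool:
    """P ⇒ Q, defined as Q ∨ ¬P."""
    return q or not p


def customers_at_all_brooklyn_branches(customer, branch, depositor, account):
    """Hypothetical helper evaluating the for-all query over finite relations."""
    return {
        (name,)
        for (name,) in customer                      # ∃ r ∈ customer
        if all(                                      # ∀ u ∈ branch
            implies(
                city == "Brooklyn",
                any(                                 # ∃ s ∈ depositor
                    d_name == name
                    and any(                         # ∃ w ∈ account
                        acc == d_acc and a_branch == b_name
                        for (acc, a_branch) in account
                    )
                    for (d_name, d_acc) in depositor
                ),
            )
            for (b_name, city) in branch
        )
    }


# Invented sample data, for illustration only.
customer = {("Hayes",), ("Jones",), ("Smith",)}
branch = {("Brighton", "Brooklyn"), ("Downtown", "Brooklyn"), ("Perryridge", "Horseneck")}
account = {("A-101", "Downtown"), ("A-201", "Brighton"),
           ("A-217", "Brighton"), ("A-102", "Perryridge")}
depositor = {("Hayes", "A-101"), ("Hayes", "A-201"),
             ("Jones", "A-217"), ("Smith", "A-102")}

result = customers_at_all_brooklyn_branches(customer, branch, depositor, account)

# With no Brooklyn branches the ∀ test is vacuously true for everyone.
no_brooklyn = customers_at_all_brooklyn_branches(
    customer, {("Perryridge", "Horseneck")}, depositor, account
)
```

In the second call every customer qualifies, which is why the query ranges over the customer relation rather than over arbitrary names.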
Formal Definitions
  Expressions are of the form:
    { t | P(t) }
    P is some formula
    Can contain tuple variables
  A tuple variable is said to be free if not quantified by ∃ or ∀
    Tuple variables constrained by ∃ or ∀ are bound
  Example:
    t ∈ loan ∧ ∃ s ∈ branch ( t[branch_name] = s[branch_name] )
    t is a free tuple variable
    s is a bound tuple variable
Formal Definitions (2)
  Formulas are built out of atoms
  Valid atoms:
    s ∈ r (∉ not allowed)
    s[x] Θ u[y]
      Comparison of two attributes
      Θ is a comparison operator: < ≤ = ≠ > ≥
    s[x] Θ c
      Comparison to some constant value
  Any atom is also a formula
Formal Definitions (3)
  Compositions of atoms:
    If P1 is a formula, then so are ¬P1 and (P1)
    If P1 and P2 are formulae, then so are:
      P1 ∨ P2
      P1 ∧ P2
      P1 ⇒ P2
    If P1(s) is a formula containing a free tuple variable s, and r is a relation, then so are:
      ∃ s ∈ r (P1(s))
      ∀ s ∈ r (P1(s))
Safety of Expressions
  Does tuple relational calculus have any issues?
  Yes: Can write an expression that generates an infinite relation!
    { t | ¬(t ∈ loan) }
    Result contains an infinite set of tuples
  Such expressions are said to be unsafe
  Must constrain ourselves to safe expressions
    Define rules to indicate what expressions are safe
Safety of Expressions (2)
  Every tuple relational formula P has a domain
    The set of all values referenced by P
    Includes values explicitly stated in P
    Also includes values in relations used in P
    Denoted as dom(P)
  Example: dom(t ∈ loan ∧ t[amount] > 1200)
    Set containing 1200, and all values appearing in loan
  An expression { t | P(t) } is safe if all values in the result are also in dom(P)
    { t | ¬(t ∈ loan) } is not safe because its result contains values outside of the loan relation
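
For a finite database instance, dom(P) can be computed directly. A minimal sketch with invented rows, using plain tuples (loan_number, branch_name, amount); it checks the safety condition for this one instance only and does not prove that an expression is safe in general:

```python
# Invented sample data, for illustration only.
loan = {
    ("L-11", "Round Hill", 900),
    ("L-15", "Perryridge", 1500),
}

# dom(t ∈ loan ∧ t[amount] > 1200):
# the constant 1200 plus every value appearing in loan.
dom_p = {1200} | {value for row in loan for value in row}

# { t | t ∈ loan ∧ t[amount] > 1200 }
result = {t for t in loan if t[2] > 1200}

# Every value in the result is drawn from dom(P).
is_safe_here = all(value in dom_p for t in result for value in t)
```

The unsafe expression { t | ¬(t ∈ loan) } has no counterpart here: there is no finite collection to iterate over.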
Tuple Relational Calculus
  Same expressive power as relational algebra, when limited to safe expressions
    Doesn't have extended relational operators
    Purely declarative nature makes it more appealing, in some ways
  Can unsafe expressions be written in relational algebra?
    No! Operations generate results only from tuples in existing relations.
Datalog
  Datalog is a query language based on relational calculus
  Based on the logic-programming language Prolog
    Syntax is very similar to Prolog
  Datalog is purely declarative
    Programs consist of a set of rules
    Order is irrelevant
    Specific constraints are enforced to ensure only safe expressions
  Largely constrained to research project usage
Example Datalog Statements
  A rule to produce account numbers and balances for accounts at the Perryridge branch with a balance over $700:
    v1(A, B) :- account(A, "Perryridge", B), B > 700
    account(account_number, branch_name, balance)
  Can be read as:
    for all A, B
    if (A, "Perryridge", B) ∈ account and B > 700
    then (A, B) ∈ v1
  Defines a view relation
    v1 is a view that references the account relation
Example Datalog Statements (2)
  Can issue queries against v1
  To find the balance of account A-217:
    ? v1("A-217", B)
    Produces ("A-217", 750)
  To find accounts with a balance over $800:
    ? v1(A, B), B > 800
    for all A, B
    if (A, B) ∈ v1 and B > 800
    then (A, B) is in the result
    Produces one result: ("A-201", 900)
  Variables and values are positional
    Attribute names are omitted
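
The rule for v1 and the two queries can be imitated with positional tuples in Python. A minimal sketch: the rows for A-217 and A-201 are chosen to agree with the results quoted above, and the other two rows are invented.

```python
# account(account_number, branch_name, balance)
account = {
    ("A-217", "Perryridge", 750),
    ("A-201", "Perryridge", 900),
    ("A-102", "Perryridge", 400),   # invented row
    ("A-101", "Downtown", 500),     # invented row
}

# v1(A, B) :- account(A, "Perryridge", B), B > 700
v1 = {(a, b) for (a, n, b) in account if n == "Perryridge" and b > 700}

# ? v1("A-217", B)
balance_a217 = {(a, b) for (a, b) in v1 if a == "A-217"}

# ? v1(A, B), B > 800
over_800 = {(a, b) for (a, b) in v1 if b > 800}
```

Unpacking by position, as in `for (a, n, b) in account`, mirrors the way Datalog omits attribute names.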
Example Datalog Statements (3)
  Often need multiple rules to define a view relation
  Example: Define a view relation that specifies the interest rate for every account.
    interest_rate(A, 5) :- account(A, N, B), B < 10000
    interest_rate(A, 6) :- account(A, N, B), B >= 10000
    Accounts with a balance of less than $10,000 have a 5% interest rate
    Accounts with a balance of $10,000 or more have an interest rate of 6%
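
A view defined by several rules is the union of the tuples each rule produces. A minimal sketch of the interest_rate view, with invented account rows:

```python
# account(account_number, branch_name, balance); invented sample data.
account = {
    ("A-101", "Downtown", 500),
    ("A-222", "Redwood", 10000),
    ("A-305", "Round Hill", 12000),
}

# interest_rate(A, 5) :- account(A, N, B), B < 10000
rule_1 = {(a, 5) for (a, n, b) in account if b < 10000}

# interest_rate(A, 6) :- account(A, N, B), B >= 10000
rule_2 = {(a, 6) for (a, n, b) in account if b >= 10000}

# The view contains every tuple derivable from either rule.
interest_rate = rule_1 | rule_2
```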
Relational Operations in Datalog
  Cartesian product of r1 and r2:
    query(X1, …, Xn, Y1, …, Ym) :- r1(X1, …, Xn), r2(Y1, …, Ym)
  Set union of r1 and r2:
    query(X1, …, Xn) :- r1(X1, …, Xn)
    query(X1, …, Xn) :- r2(X1, …, Xn)
  Set difference of r1 and r2:
    query(X1, …, Xn) :- r1(X1, …, Xn), not r2(X1, …, Xn)
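
The same three operations over Python sets of tuples, as a minimal sketch with two invented relations r1 and r2 of the same arity:

```python
# Invented sample relations, for illustration only.
r1 = {(1, "a"), (2, "b")}
r2 = {(2, "b"), (3, "c")}

# Cartesian product: concatenate every tuple of r1 with every tuple of r2.
product = {x + y for x in r1 for y in r2}

# Set union: one rule per relation, results merged.
union = {x for x in r1} | {x for x in r2}

# Set difference: tuples in r1 with no matching tuple in r2.
difference = {x for x in r1 if x not in r2}
```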
Recursive Datalog
  Datalog also has recursive query capabilities
    Powerful mechanism for navigating hierarchical data
  Example: Define a view relation for all employees directly or indirectly managed by a particular manager.
    empl(X, Y) :- manager(X, Y)
    empl(X, Y) :- manager(X, Z), empl(Z, Y)
  To find all employees managed by Jones:
    ? empl(X, "Jones")
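
One simple way to evaluate recursive rules like these is to apply them repeatedly until no new tuples appear (a naive fixpoint iteration). The lecture does not say how Datalog systems evaluate recursion, so this is only a sketch of that one strategy, with invented manager data where (X, Y) means that X is directly managed by Y:

```python
def compute_empl(manager):
    """Naive fixpoint evaluation of the two empl rules (hypothetical helper)."""
    # empl(X, Y) :- manager(X, Y)
    empl = set(manager)
    while True:
        # empl(X, Y) :- manager(X, Z), empl(Z, Y)
        derived = {
            (x, y)
            for (x, z) in manager
            for (z2, y) in empl
            if z == z2
        }
        if derived <= empl:      # nothing new: fixpoint reached
            return empl
        empl |= derived


# Invented sample data, for illustration only.
manager = {
    ("Alice", "Bob"),
    ("Bob", "Jones"),
    ("Carol", "Jones"),
    ("Dave", "Erin"),
}

empl = compute_empl(manager)

# ? empl(X, "Jones")
managed_by_jones = {x for (x, y) in empl if y == "Jones"}
```

The loop terminates because empl can only grow and is bounded by the finite set of name pairs drawn from manager.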
Datalog Summary
  Datalog provides very succinct statement of queries
  Declarative nature facilitates optimized execution
  Features like recursive queries are very powerful
    Similar concept being incorporated into SQL
  Unfortunately relegated primarily to research use