CMPS 277 Principles of Database Systems. https://courses.soe.ucsc.edu/courses/cmps277/fall11/01. Lecture #3

Similar documents
CMPS 277 Principles of Database Systems. Lecture #4

Schema Mappings and Data Exchange

Logic and Databases. Phokion G. Kolaitis. UC Santa Cruz & IBM Research - Almaden

Lecture 1: Conjunctive Queries

Foundations of Databases

Relational Algebra 1

Overview. CS389L: Automated Logical Reasoning. Lecture 6: First Order Logic Syntax and Semantics. Constants in First-Order Logic.

Lecture 5: Predicate Calculus. ffl Predicate Logic ffl The Language ffl Semantics: Structures

Database Theory VU , SS Codd s Theorem. Reinhard Pichler

2.2.2.Relational Database concept

Relational Calculus. Lecture 4B Kathleen Durant Northeastern University

Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 6 Outline. Unary Relational Operations: SELECT and

CS 186, Fall 2002, Lecture 8 R&G, Chapter 4. Ronald Graham Elements of Ramsey Theory

Database Theory VU , SS Introduction: Relational Query Languages. Reinhard Pichler

Database Theory VU , SS Introduction: Relational Query Languages. Reinhard Pichler

Textbook: Chapter 6! CS425 Fall 2013 Boris Glavic! Chapter 3: Formal Relational Query. Relational Algebra! Select Operation Example! Select Operation!

CS 377 Database Systems

Relational Calculus. Chapter Comp 521 Files and Databases Fall

Relational Algebra. Procedural language Six basic operators

Relational Algebra and Calculus. Relational Query Languages. Formal Relational Query Languages. Yanlei Diao UMass Amherst Feb 1, 2007

Operations of Relational Algebra

Introduction to Data Management CSE 344. Lecture 14: Datalog (guest lecturer Dan Suciu)

CSC 261/461 Database Systems Lecture 13. Fall 2017

Institute of Southern Punjab, Multan

Chapter 6 Formal Relational Query Languages

Chapter 6: Formal Relational Query Languages

Relational Calculus. Chapter Comp 521 Files and Databases Fall

3. Relational Data Model 3.5 The Tuple Relational Calculus

Relational Algebra for sets Introduction to relational algebra for bags

Chapter 8: The Relational Algebra and The Relational Calculus

A Retrospective on Datalog 1.0

Relational Databases

1.3 Primitive Recursive Predicates and Bounded Minimalization

Agenda. Database Systems. Session 5 Main Theme. Relational Algebra, Relational Calculus, and SQL. Dr. Jean-Claude Franchitti

Chapter 6 The Relational Algebra and Relational Calculus

Announcements. Relational Model & Algebra. Example. Relational data model. Example. Schema versus instance. Lecture notes

Relational Algebra. Study Chapter Comp 521 Files and Databases Fall

Today s topics. Null Values. Nulls and Views in SQL. Standard Boolean 2-valued logic 9/5/17. 2-valued logic does not work for nulls

Relational Query Languages. Preliminaries. Formal Relational Query Languages. Example Schema, with table contents. Relational Algebra

Relational db: the origins. Relational Calculus (aka FO)

Towards a Logical Reconstruction of Relational Database Theory

Relational Algebra and Calculus

CMPS 277 Principles of Database Systems. Lecture #11

Relational Algebra. [R&G] Chapter 4, Part A CS4320 1

15-819M: Data, Code, Decisions

COMP 244 DATABASE CONCEPTS AND APPLICATIONS

Formal Specification: Z Notation. CITS5501 Software Testing and Quality Assurance

7. Relational Calculus (Part I) 7.1 Introduction

Relational Algebra. ICOM 5016 Database Systems. Roadmap. R.A. Operators. Selection. Example: Selection. Relational Algebra. Fundamental Property

Informationslogistik Unit 4: The Relational Algebra

Relational Query Languages. Relational Algebra. Preliminaries. Formal Relational Query Languages. Relational Algebra: 5 Basic Operations

Chapter 2: Intro to Relational Model

Uncertainty in Databases. Lecture 2: Essential Database Foundations

DATABASE DESIGN II - 1DL400

Relational Algebra. Note: Slides are posted on the class website, protected by a password written on the board

Chapter 6 5/2/2008. Chapter Outline. Database State for COMPANY. The Relational Algebra and Calculus

Composing Schema Mapping

CMP-3440 Database Systems

Relational Algebra and SQL. Basic Operations Algebra of Bags

Introductory logic and sets for Computer scientists

Databases Relational algebra Lectures for mathematics students

Query formalisms for relational model relational calculus

Relational Algebra 1

Relational Algebra. Relational Query Languages

Database Management Systems. Chapter 4. Relational Algebra. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Chapter 2: Intro to Relational Model

L22: The Relational Model (continued) CS3200 Database design (sp18 s2) 4/5/2018

Relational Algebra 1. Week 4

CS233:HACD Introduction to Relational Databases Notes for Section 4: Relational Algebra, Principles and Part I 1. Cover slide

Notes. Notes. Introduction. Notes. Propositional Functions. Slides by Christopher M. Bourke Instructor: Berthe Y. Choueiry.

Recap Datalog Datalog Syntax Datalog Semantics. Logic: Datalog. CPSC 322 Logic 6. Textbook Logic: Datalog CPSC 322 Logic 6, Slide 1

Relational Model and Relational Algebra

Foundations of Schema Mapping Management

Introduction to Finite Model Theory. Jan Van den Bussche Universiteit Hasselt

Relational Query Languages

Going beyond propositional logic

Propositional Calculus: Boolean Algebra and Simplification. CS 270: Mathematical Foundations of Computer Science Jeremy Johnson

Circuit analysis summary

Discrete Mathematics Lecture 4. Harper Langston New York University

Relational Query Languages: Relational Algebra. Juliana Freire

Chapter 5 Relational Algebra. Nguyen Thi Ai Thao

Relational algebra. Iztok Savnik, FAMNIT. IDB, Algebra

Relational Algebra. Algebra of Bags

Introduction to Databases. Maurizio Lenzerini

Lecture Query evaluation. Combining operators. Logical query optimization. By Marina Barsky Winter 2016, University of Toronto

Relational Model & Algebra. Announcements (Thu. Aug. 27) Relational data model. CPS 116 Introduction to Database Systems

RELATION AND RELATIONAL OPERATIONS

Relational Algebra BASIC OPERATIONS DATABASE SYSTEMS AND CONCEPTS, CSCI 3030U, UOIT, COURSE INSTRUCTOR: JAREK SZLICHTA

The Inverse of a Schema Mapping

8. Relational Calculus (Part II)

CS317 File and Database Systems

Chapter 1.3 Quantifiers, Predicates, and Validity. Reading: 1.3 Next Class: 1.4. Motivation

Part I: Structured Data

Automated Reasoning. Natural Deduction in First-Order Logic

CS2300: File Structures and Introduction to Database Systems

Uncertain Data Models

CompSci 516: Database Systems

Relational Model, Relational Algebra, and SQL

Chapter 6 The Relational Algebra and Calculus

CAS CS 460/660 Introduction to Database Systems. Relational Algebra 1.1

Transcription:

CMPS 277 Principles of Database Systems https://courses.soe.ucsc.edu/courses/cmps277/fall11/01 Lecture #3 1

Summary of Lectures #1 and #2 Codd s Relational Model based on the concept of a relation (table) Relational algebra as a database query language Five basic operators (,,, π, σ) Relational algebra expressions obtained by combining these operators: can express intersection, join, quotient, Independence of the basic operators Notion of a relationally complete database query language Influence of relational algebra on SQL SELECT FROM WHERE is a combination of π, σ, SQL is relationally complete (have to use aliases for self-joins) Set semantics vs. bag (multiset) semantics 2

Roadmap for Lectures #3 and #4 Relational Calculus as a database query language Motivation Syntax Informal semantics, illustrations of its expressive power More formal semantics, making the domain explicit, the issue of domain independence. Codd s Theorem: Equivalence of relational algebra and relational calculus Algorithmic aspects of domain independence SQL Revisited: Influence of relational calculus on SQL Features of SQL beyond relational algebra and relational calculus. 3

Relational Calculus In addition to relational algebra, Codd introduced relational calculus. Relational calculus is a declarative database query language based on first-order logic. Codd s main technical result is that relational algebra and relational calculus have essentially the same expressive power. More precisely, relational algebra has the same expressive power as domain independent relational calculus. Relational calculus comes into two different flavors: Tuple relational calculus Domain relational calculus. (this is the version studied by Codd) Here, we will focus on domain relational calculus (which is the version considered in the AHV textbook). Refer to Codd s 1972 paper for the precise definition of tuple relational calculus. 4

Propositional Logic (aka Boolean Logic) Reminder Propositional variables: x, y, z, They take values 0 (True) and 1 (False). Propositional connectives: Æ, Ç,, Propositional formulas: expressions built from propositional variables and propositional connectives Syntax: ϕ := x, y, z, (ψ Æ χ) (ψ Ç χ) ψ (ψ χ) Semantics: Truth-table semantics Application: Propositional formulas express Boolean functions (x Ç y) Æ ( x Ç y) XOR-Gate (x Æ y) Ç (x Æ z) Ç (y Æ z) Majority Gate In fact, every Boolean function can be expressed by a propositional formula. 5

First-Order Logic - Motivation First-Order Logic is a formalism for expressing properties of mathematical structures (graphs, trees, partial orders, ). Example: Consider a graph G=(V,E) (nodes are in V, edges are in E) There is a self-loop. Every two nodes are connected via a path of length 2. Every node has exactly three distinct neighbors. There is a path of length 3 from node x to node y. Node x has at least four distinct neighbors These and many other similar properties are expressible as formulas of first-order logic on graphs. One of Codd s key insights was that first-order logic can also be used to express relational database queries. 6

First-Order Logic Question: What is First-Order Logic? Answer: Informally, First-Order Logic = Propositional Logic + ( and ), where and range over possible values occurring in relations. 7

Relational Calculus (First-Order Logic for Databases) First-order variables: x, y, z,, x 1,,x k, They range over values that may occur in tables. Relation symbols: R, S, T, of specified arities (names of relations) Atomic (Basic) Formulas: R(x 1,,x k ), where R is a k-ary relation symbol (alternatively, (x 1,,x k ) R; the variables need not be distinct) (x op y), where op is one of =,, <, >,, (x op c), where c is a constant and op is one of =,, <, >,,. Relational Calculus Formulas: Every atomic formula is a relational calculus formula. If ϕ and ψ are relational calculus formulas, then so are: (ϕ Æ ψ), (ϕ Ç ψ), ψ, (ϕ ψ) (propositional connectives) ( x ϕ) (existential quantification) ( x ϕ) (universal quantification). 8

Relational Calculus Examples: Assume E is a binary relation symbol ( x)e(x,x) ( x)( y)( z)(e(x,z) Æ E(z,y)) ( z 1 )( z 2 )(E(x,z 1 ) Æ E(z 1,z 2 ) Æ E(z 2,y)) ( y)( z)(e(x,y) Æ E(x,z) Æ (y z)) Free and bound variables: In the first two formulas above, no variable is free. In the third formula above, the free variables are x and y. In the fourth formula above, the only free variable is x. Intuitively, a variable is free in a formula if the variable must be assigned a value in order to tell if the formula is true or false. 9

Free and Bound Variables A sentence is a first-order formula ψ with no free variables. ( x)e(x,x) ( x)( y)( z)(e(x,z) Æ E(z,y)) ( x)( y)(x < y ( z)(x < z Æ z < y)) On every relational database I, a sentence is either true or false. Either I ψ or I ψ If a first-order formula has at least one free variable, then it makes no sense to tell whether it is true or false on a relational database I. Instead, we need to also assign values to its free variables I, 3, 5 z (x < z Æ z < y) I, 3, 4 z (x < z Æ z < y), where I is the linear order < on the natural numbers 1, 2, 3, 10

Queries Definition: Let S be a relational database schema. A k-ary query on S is a function q defined on the relational database instances over S such that if I is a relational database instance over S, then q(i) is a k-ary relation (i.e., a set of k-tuples). Note: All queries that we have expressed in relational algebra and/or in SQL thus far are queries in the above formal sense. Find the salaries of department chairs (binary query) Find the students who are enrolled in every course taught by Victor Vianu (unary query) The natural join R S of R(A,B,C) and S(B,C,D) (4-ary query) 11

Queries: Syntax vs. Semantics Queries are, by definition, semantic objects (functions) Database query languages provide a syntax for defining (expressing) queries. In particular, Every relational algebra expression defines a query Every SQL expression defines a query One of Codd s key insight was that formulas of first-order logic (aka formulas of relational calculus) can also be used to define queries. 12

Relational Calculus as a Database Query Language Definition: A relational calculus expression is an expression of the form { (x 1,,x k ): ϕ(x 1, x k ) }, where ϕ(x 1,,x k ) is a relational calculus formula with x 1,,x k as its free variables. When applied to a relational database I, this relational calculus expression returns the k-ary relation that consists of all k-tuples (a 1,,a k ) that make the formula true on I. Thus, every relational calculus expression as above defines a k-ary query. Example: The relational calculus expression { (x,y): ( z)(e(x,z) Æ E(z,y) } returns the set P of all pairs of nodes (a,b) that are connected via a path of length 2. 13

Relational Calculus as a Database Query Language Example: FACULTY(name, dpt, salary), CHAIR(dpt, name) Give a relational calculus expression for C-SALARY(dpt,salary) (find the salaries of department chairs). { (x,y): u(faculty(u,x,y) Æ CHAIR(x,u)) } Here is another relational calculus expression for the same task: { (x,y): u v(faculty(u,x,y) Æ CHAIR(x,v) Æ (u=v)) } 14

Relational Calculus as a Database Query Language Example: FACULTY(name, dpt, salary) Find the names of the highest paid faculty in CS { x: ϕ(x) }, where ϕ(x) is the formula: y,z (FACULTY(x,y,z) Æ y = CS Æ ( u,v,w(faculty(u,v,w) Æ v = CS z w))) Exercise: Express this query in relational algebra and in SQL. Abbreviation: x 1,,x k stands for x 1,, x k x 1,,x k stands for x 1,, x k 15

Natural Join in Relational Calculus Example: Let R(A,B,C) and S(B,C,D) be two ternary relation schemas. Recall that, in relational algebra, the natural join R S is given by π R.A,R.B,R.C,S.D (σ R.B = S.B Æ R.C = S.C (R S)). Give a relational calculus expression for R S { (x 1,x 2,x 3,x 4 ): R(x 1,x 2,x 3 ) Æ S(x 2,x 3,x 4 ) } Note: The natural join is expressible by a quantifier-free formula of relational calculus. 16

Quotient in Relational Calculus Recall that the quotient (or division) R S of two relations R and S is the relation of arity r s consisting of all tuples (a 1,,a r-s ) such that for every tuple (b 1,,b s ) in S, we have that (a 1,,a r-s, b 1,,b s ) is in R. Assume that R has arity 5 and S has arity 2. Express R S in relational calculus (3-ary query) { (x 1,x 2,x 3 ): ( x 4 )( x 5 ) (S(x 4,x 5 ) R(x 1,x 2,x 3,x 4,x 5 )) } Much simpler than the relational algebra expression for R S 17

The need for more formal semantics for Relational Calculus The semantics of the relational calculus expressions considered thus far have been unambiguous (and consistent with our intuition). However, consider the following relational calculus expressions: { (x 1,,x k ): R(x 1,,x k ) } { (x,y): z(chair(x,z) Æ y z) }, where CHAIR(dpt,name) { x: y,z ENROLLS(x,y,z) }, with ENROLLS(s-name,course,term) Question: What is the semantics of each of these expressions? 18

The need for more formal semantics for Relational Calculus Fact: To evaluate { (x 1,,x k ): R(x 1,,x k ) } we need to know what the possible values for the variables x 1,, x k are. If the variables x 1,,x k range over a domain D, then { (x 1,,x k ): R(x 1,,x k ) } = D k R. Note: Intuitively, the relational calculus expression { (x 1,,x k ): R(x 1,,x k ) } is not domain independent. In contrast, the relational calculus expression { (x 1,,x k ): S(x 1,..,x k ) Æ R(x 1,,x k ) } is domain independent. 19

The need for more formal semantics for Relational Calculus Note: The three relational calculus expressions { (x 1,,x k ): R(x 1,,x k )} { (x,y): z(chair(x,z) Æ y z) }, where CHAIR(dpt,name) { x: y,z ENROLLS(x,y,z) }, with ENROLLS(s-name,course,term) produce different answers when we consider different domains over which the variables are interpreted. Fact: None of these three expressions is domain independent. 20

The need for more formal semantics of Relational Calculus Conclusion: To give rigorous and unambiguous semantics to relational calculus expressions, we need to make explicit the domain over which the variables (and the quantifiers) take values. We also need to formalize the notion of domain independence. Note: In mathematical logic, the domain of the possible values of the variables is always given explicitly (it is called the universe of the structure on which the formula is evaluated). Question: Is there a natural choice for the domain over which the variables take value? 21

Active Domain Definition: The active domain adom(ϕ) of a relational calculus formula ϕ is the set of all constants that occur in ϕ. If ϕ is R(x,y), then adom(ϕ) = If ϕ is y(r(x,y) Æ y > 3), then adom(ϕ) = {3}. If ϕ is y(p(x,2,y) R(x,y)), then adom(ϕ) = {2}. The active domain adom(i) of a relational database instance I is the set of all values that occur in the relations of I. 22

Active Domain and Relative Interpretations Definition: Let ϕ(x 1,,x k ) be a relational calculus formula and let I be a relational database instance. If is D a domain such adom(ϕ) adom(i) D, then ϕ D (I) is the result of evaluating ϕ(x 1,,x k ) over D and I, that is, all variables and quantifiers are assumed to range over D, and the relation symbols in ϕ are interpreted by the relations in I. ϕ adom (I) is ϕ D (I), where D = adom(ϕ) adom(i). Note: adom(ϕ) adom(i) is the smallest domain on which it makes sense to evaluate ϕ. 23

Active Domain and Relative Interpretation Example: Let ϕ be R(x,y) and I = {(1,2)}. ϕ adom (I) = {(2,1), (1,1), (2,2)} If D = {1,2,3}, then ϕ D (I)= {(2,1),(1,1),(2,2),(3,3),(1,3),(3,1),(2,3),(3,2)} Note: This example shows that, in general, ϕ adom (I) ϕ D (I) 24

Active Domain and Relative Interpretation Example: Let ϕ be yr(x,y) and I = {(1,2), (1,1)}. adom(ϕ) adom(i) = {1,2} ϕ adom (I) = {1} If D = {1,2,3}, then ϕ D (I)= (1 is not in ϕ D (I), because (1,3) is not in I) Note: This example also shows that, in general, ϕ adom (I) ϕ D (I) 25

Domain Independence Definition: A relational calculus formula ϕ is domain independent if for every relational instance I and every domain D such that adom(ϕ) adom(i) D, we have that ϕ D (I) = ϕ adom (I). Examples: R(x 1,,x k ) is not domain independent (see slide 24). yr(x,y) is domain independent. (Why?) yr(x,y) is not domain independent (see slide 25). P(x) Æ y(r(x,y) y > 5) is domain independent. (Why?) 26

Domain Independence Examples: The following relational calculus expressions are not domain independent {x: ( y)( z) R(x,y,z)} {(x,y): z(chair(x,z) Æ y z)}, where CHAIR(dpt,name) {x: y,z ENROLLS(x,y,z)}, where ENROLLS(s-name,course,term) 27

Equivalence of Relational Algebra and Relational Calculus Theorem: The following are equivalent for a k-ary query q: 1. There is a relational algebra expression E such that q(i) = E(I), for every database instance I (in other words, q is expressible in relational algebra). 2. There is a domain independent relational calculus formula ϕ such that q(i) = ϕ adom (I), for every database instance I (in other words, q is expressible in domain independent relational calculus). 3. There is a relational calculus formula ψ such that q(i) = ψ adom (I), for every database instance I (in other words, q is expressible in relational calculus under the active domain interpretation). 28