Database Theory VU , SS Introduction: Relational Query Languages. Reinhard Pichler

Similar documents
Database Theory VU , SS Introduction: Relational Query Languages. Reinhard Pichler

Database Theory VU , SS Codd s Theorem. Reinhard Pichler

Outline. Database Theory. Prerequisites and Admission. Classes VU , SS 2018

Databases Lectures 1 and 2

2.2.2.Relational Database concept

Announcements. Relational Model & Algebra. Example. Relational data model. Example. Schema versus instance. Lecture notes

Relational Databases

Complexity Theory VU , SS General Information. Reinhard Pichler

v Conceptual Design: ER model v Logical Design: ER to relational model v Querying and manipulating data

CS 377 Database Systems

Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 6 Outline. Unary Relational Operations: SELECT and

Lecture 1: Conjunctive Queries

Informationslogistik Unit 4: The Relational Algebra

Relational Model and Relational Algebra

Relational Algebra 1

Relational Database: The Relational Data Model; Operations on Database Relations

Web Science & Technologies University of Koblenz Landau, Germany. Relational Data Model

CMPS 277 Principles of Database Systems. Lecture #3

Overview. CS389L: Automated Logical Reasoning. Lecture 6: First Order Logic Syntax and Semantics. Constants in First-Order Logic.

Ian Kenny. November 28, 2017

The Relational Model

THE RELATIONAL MODEL. University of Waterloo

Uncertainty in Databases. Lecture 2: Essential Database Foundations

Relational Algebra. Study Chapter Comp 521 Files and Databases Fall

Logic and Databases. Phokion G. Kolaitis. UC Santa Cruz & IBM Research - Almaden

Chapter 2 & 3: Representations & Reasoning Systems (2.2)

Relational Model & Algebra. Announcements (Thu. Aug. 27) Relational data model. CPS 116 Introduction to Database Systems

Relational Algebra 1

Chapter 6 The Relational Algebra and Calculus

Relational Algebra for sets Introduction to relational algebra for bags

Relational Model: History

Database Theory VU , SS Introduction to Datalog. Reinhard Pichler. Institute of Logic and Computation DBAI Group TU Wien

1. Introduction; Selection, Projection. 2. Cartesian Product, Join. 5. Formal Definitions, A Bit of Theory

Relational Algebra. [R&G] Chapter 4, Part A CS4320 1

Database Management Systems. Chapter 4. Relational Algebra. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Relational Model, Relational Algebra, and SQL

A Retrospective on Datalog 1.0

Schema Mappings and Data Exchange

Relational Model & Algebra. Announcements (Tue. Sep. 3) Relational data model. CompSci 316 Introduction to Database Systems

Foundations of Databases

DATABASE DESIGN II - 1DL400

RELATIONAL ALGEBRA. CS 564- Fall ACKs: Dan Suciu, Jignesh Patel, AnHai Doan

Introduction to Data Management CSE 344. Lectures 8: Relational Algebra

CSC 501 Semantics of Programming Languages

Databases-1 Lecture-01. Introduction, Relational Algebra

Relational Model 2: Relational Algebra

An Evolution of Mathematical Tools

[Ch 6] Set Theory. 1. Basic Concepts and Definitions. 400 lecture note #4. 1) Basics

Handout 9: Imperative Programs and State

Relational Algebra. Relational Query Languages

Databases 2011 Lectures 01 03

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #2: The Relational Model and Relational Algebra

Introduction to Data Management. Lecture #11 (Relational Algebra)

Relational Metadata Integration. Cathy Wyss

Relational Algebra and Calculus

Announcements. CSCI 334: Principles of Programming Languages. Exam Study Session: Monday, May pm TBL 202. Lecture 22: Domain Specific Languages

CMP-3440 Database Systems

FOUNDATIONS OF DATABASES AND QUERY LANGUAGES

Biomedicine and bioinformatics databases (module Fundamentals of database systems)

Three Query Language Formalisms

1 Relational Data Model

Today s topics. Null Values. Nulls and Views in SQL. Standard Boolean 2-valued logic 9/5/17. 2-valued logic does not work for nulls

Chapter 6 Part I The Relational Algebra and Calculus

Structural characterizations of schema mapping languages

CSCC43H: Introduction to Databases. Lecture 3

CMPS 277 Principles of Database Systems. Lecture #4

CS 186, Fall 2002, Lecture 8 R&G, Chapter 4. Ronald Graham Elements of Ramsey Theory

Introduction to Data Management CSE 344. Lectures 8: Relational Algebra

Chapter 6 5/2/2008. Chapter Outline. Database State for COMPANY. The Relational Algebra and Calculus

Towards a Logical Reconstruction of Relational Database Theory

COSC344 Database Theory and Applications. σ a= c (P) Lecture 3 The Relational Data. Model. π A, COSC344 Lecture 3 1

Finite Model Generation for Isabelle/HOL Using a SAT Solver

Chapter 3: Relational Model

Relational Algebra. Lecture 4A Kathleen Durant Northeastern University

Denotational Semantics. Domain Theory

CMPS 277 Principles of Database Systems. Lecture #11

Introduction to Finite Model Theory. Jan Van den Bussche Universiteit Hasselt

3. Relational Data Model 3.5 The Tuple Relational Calculus

Describe The Differences In Meaning Between The Terms Relation And Relation Schema

Relational Model and Relational Algebra. Rose-Hulman Institute of Technology Curt Clifton

Relational Query Languages. Relational Algebra. Preliminaries. Formal Relational Query Languages. Relational Algebra: 5 Basic Operations

Plan of the lecture. G53RDB: Theory of Relational Databases Lecture 1. Textbook. Practicalities: assessment. Aims and objectives of the course

COMP 244 DATABASE CONCEPTS AND APPLICATIONS

Database Applications (15-415)

Information Systems (Informationssysteme)

Textbook: Chapter 6! CS425 Fall 2013 Boris Glavic! Chapter 3: Formal Relational Query. Relational Algebra! Select Operation Example! Select Operation!

Relational Query Languages. Preliminaries. Formal Relational Query Languages. Example Schema, with table contents. Relational Algebra

Relational Database Systems 1

Logic and Databases. Lecture 4 - Part 2. Phokion G. Kolaitis. UC Santa Cruz & IBM Research - Almaden

Chapter 6 The Relational Algebra and Relational Calculus

Relational Algebra and Calculus. Relational Query Languages. Formal Relational Query Languages. Yanlei Diao UMass Amherst Feb 1, 2007

LTCS Report. Concept Descriptions with Set Constraints and Cardinality Constraints. Franz Baader. LTCS-Report 17-02

Relational model continued. Understanding how to use the relational model. Summary of board example: with Copies as weak entity

Three Query Language Formalisms

Relational Query Languages: Relational Algebra. Juliana Freire

Lecture Query evaluation. Combining operators. Logical query optimization. By Marina Barsky Winter 2016, University of Toronto

5.3 Parser and Translator. 5.3 Parser and Translator. 5.3 Parser and Translator. 5.3 Parser and Translator. 5.3 Parser and Translator

Foundations of SPARQL Query Optimization

Relational Database Systems 2 5. Query Processing

CS34800 Information Systems. The Relational Model Prof. Walid Aref 29 August, 2016

Transcription:

Database Theory Database Theory VU 181.140, SS 2018 1. Introduction: Relational Query Languages Reinhard Pichler Institut für Informationssysteme Arbeitsbereich DBAI Technische Universität Wien 6 March, 2018 Pichler 6 March, 2018 Page 1

Database Theory Outline 1. Overview 1.1 Databases and Query Languages 1.2 Query Languages: Relational Algebra 1.3 Query Languages: Relational (Domain) Calculus 1.4 Query Languages: SQL 1.5 Query Languages: other Languages 1.6 Some Fundamental Aspects of Query Languages Pichler 6 March, 2018 Page 2

Database Theory 1. Overview 1.1. Databases and Query Languages A short history of databases 1970 s: relational model of databases (E. F. Codd) relational query languages (SQL) 1980 s: relational query optimization constraints, dependency theory datalog (extend the query language with recursion) 1990 s: new models: temporal databases, OO, OR databases data mining, data warehousing 2000 s: data integration, data exchange data on the web, managing huge data volumes new data formats: XML, RDF data streams Pichler 6 March, 2018 Page 3

Database Theory 1. Overview 1.1. Databases and Query Languages Database theory Cut-crossing many areas in Computer Science and Mathematics Complexity efficiency of query evaluation, optimization Logics, Finite model theory expressiveness Logic programming, constraint satisfaction (AI) Datalog Graph theory (hyper)tree-decompositions Automata XML query model, data stream processing Benefit from other fields on the one hand, contribute new results on the other hand Pichler 6 March, 2018 Page 4

Database Theory 1. Overview 1.1. Databases and Query Languages Relational data model A database is a collection of relations (or tables) Each database has a schema, i.e., the vocabulary (or signature) Each relation (table) has a name and a schema; the schema of each relation r is defined by a list of attributes (columns), denoted schema(r) Each attribute A has a domain (or universe), denoted dom(a) We define dom(r) = dom(a) A schema(r) Each relation contains a set of tuples (or rows) Formally, a tuple in r is a mapping t : schema(r) dom(r) such that t(a) dom(a) for all A schema(r) Pichler 6 March, 2018 Page 5

Database Theory 1. Overview 1.1. Databases and Query Languages Example Schema Author (AID integer, name string, age integer) Paper (PID string, title string, year integer) Write (AID integer, PID string) Instance { 142, Knuth, 73, 123, Ullman, 67,...} { 181140pods, Querycontainment, 1998,...} { 123, 181140pods, 142, 193214algo,...} Pichler 6 March, 2018 Page 6

Database Theory 1. Overview 1.1. Databases and Query Languages Relational query languages Query languages are formal languages with syntax and semantics: Syntax: algebraic or logical formalism or specific query language (like SQL). Uses the vocabulary of the DB schema Semantics: M[Q] a mapping that transforms a database (instance) D into a database (instance) D = M[Q](D), i.e. the database M[Q](D) is the answer of Q over the DB D. Usually, M[Q] produces a single table, i.e., M[Q]: D dom(d) k in general: k 0. We say Q is a k-ary query. Boolean queries: k = 0, i.e.: possible values of M[Q](D) are {} (= false) or { } (= true). Expressive power of a query language: which mappings M[Q] can be defined by queries Q of a given query language? Pichler 6 March, 2018 Page 7

Database Theory 1. Overview 1.2. Query Languages: Relational Algebra Relational Algebra (RA) σ Selection π Projection Cross product Join ρ Rename Difference Union Intersection Primitive operations, all others can be obtained from these. For precise definition of RA see any DB textbook or Wikipedia. Pichler 6 March, 2018 Page 8

Database Theory 1. Overview 1.2. Query Languages: Relational Algebra Example Recall the schema: Author (AID integer, name string, age integer) Paper (PID string, title string, year integer) Write (AID integer, PID string) Example query: PIDs of the papers NOT written by Knuth π PID (Paper) π PID (Write σ name= Knuth (Author)) Example query: AIDs of authors who wrote exactly one paper S 2 = Write AID=AID PID PID ρ AID AID,PID PID(Write) S = π AID Write π AID S 2 Pichler 6 March, 2018 Page 9

Database Theory 1. Overview 1.3. Query Languages: Relational (Domain) Calculus Recall First-order Logic (FO) Formulas built using: Quantifiers:,, Boolean connectives:,, Parentheses: (, ) Atoms: R(t 1,..., t n ), t 1 = t 2 Example database (i.e. a first-order structure): Schema: E(FROM string, TO string) Instance: { v, u, u, w, w, v } Example sentences of FO: x ye(x, y) x ye(x, y) x y z(e(z, x) E(x, y)) x y z( (y = z) E(x, y) E(x, z)) Pichler 6 March, 2018 Page 10

Database Theory 1. Overview 1.3. Query Languages: Relational (Domain) Calculus Relational (Domain) Calculus If ϕ is an FO formula with free variables {x 1,..., x k }, then { x 1,..., x k ϕ} is a k-ary query of the domain calculus. On database A with domain A, it returns the set of all tuples a 1,..., a k (A) k such that the sentence ϕ[a 1,..., a k ] obtained from ϕ by replacing each x i by a i evaluates to true in the structure A. Notational simplifications. We often simply write ϕ rather than { x 1,..., x k ϕ} (i.e., the free variables of a formula are considered as the output). In particular, we usually write ϕ rather than { ϕ} for Boolean queries (k = 0). Pichler 6 March, 2018 Page 11

Database Theory 1. Overview 1.3. Query Languages: Relational (Domain) Calculus Example Recall the schema: Author (AID integer, name string, age integer) Paper (PID string, title string, year integer) Write (AID integer, PID string) Example query: PIDs of the papers NOT written by Knuth {PID T Y (Paper(PID, T, Y ) ( A AID(Write(AID, PID) Author(AID, Knuth, A))))} Example query: AIDs of authors who wrote exactly one paper {AID PID(Write(AID, PID) PID2(Write(AID, PID2) PID PID2))} Pichler 6 March, 2018 Page 12

Database Theory 1. Overview 1.4. Query Languages: SQL SQL (Structured Query Language) A standardized language: most database management systems (DBMSs) implement SQL SQL is not only a query language: supports constructs to manage the database (create/delete tables/rows) Query constructs of SQL (SELECT/FROM/WHERE/JOIN) are based on relational algebra Example query: AIDs of the co-authors of Knuth SELECT W1.AID FROM Write W1, Write W2, Author WHERE W1.PID=W2.PID AND W1.AID <> W2.AID AND W2.AID=Author.AID AND Author.Name= Knuth Pichler 6 March, 2018 Page 13

Database Theory 1. Overview 1.4. Query Languages: SQL Relational Algebra vs. Relational Calculus vs. SQL Theorem (following Codd 1972) Relational algebra, relational calculus, and SQL queries essentially have equal expressive power. Restrictions apply: no group by and aggregation in SQL queries, safety requirements for relational calculus. Languages with this expressive power are relational complete. all 3 languages have their advantages: 1 use the flexible syntax of relational calculus to specify the query 2 use the simplicity of relational algebra for query simplification/optimization 3 use SQL to implement the query over a DB More expressive query languages: many interesting queries cannot be formulated in these languages Example: no recursive queries (SQL now offers a recursive construct) Pichler 6 March, 2018 Page 14

Database Theory 1. Overview 1.5. Query Languages: other Languages Towards other query languages Paul Erdös (1913-1996), one of the most prolific writers of mathematical papers, wrote around 1500 mathematical articles in his lifetime, mostly co-authored. He had 509 direct collaborators Pichler 6 March, 2018 Page 15

Database Theory 1. Overview 1.5. Query Languages: other Languages Erdös number The Erdös number, is a way of describing the collaborative distance, in regard to mathematical papers, between an author and Erdös. An author s Erdös number is defined inductively as follows: Paul Erdös has an Erdös number of zero. The Erdös number of author M is one plus the minimum among the Erdös numbers of all the authors with whom M co-authored a mathematical paper. Rothschild B.L. co-authored a paper with Erdös Rothschild B.L. s Erdös number is 1. Kolaitis P.G. co-authored a paper with Rothschild B.L. Kolaitis P.G. s Erdös number is 2. Pichler R. co-authored a paper with Kolaitis P.G. Pichler R. s Erdös number is 3. Rowling J.K. s Erdös number is Pichler 6 March, 2018 Page 16

Database Theory 1. Overview 1.5. Query Languages: other Languages Queries about the Erdös number Recall the schema: Author (AID integer, name string, age integer) Paper (PID string, title string, year integer) Write (AID integer, PID string) Assume that Erdös s AID is 17 Query AIDs of the authors whose Erdös number 1 P 1 = π PID (σ AID=17 Write) A 1 = π AID (P 1 Write) Query AIDs of the authors whose Erdös number 2 P 2 = π PID (A 1 Write) A 2 = π AID (P 2 Write) Pichler 6 March, 2018 Page 17

Database Theory 1. Overview 1.5. Query Languages: other Languages Queries about the Erdös number (continued) What about Q1= AIDs of the authors whose Erdös number <? What about Q2= AIDs of the authors whose Erdös number =? Can we express Q1 and Q2 in relational calculus (or equivalently in RA)? We cannot! Formal methods to prove this negative result will be presented in the course. Are there query languages that allow us to express Q1 and Q2? Yes, we can do this in DATALOG (the topic of the next lecture). Pichler 6 March, 2018 Page 18

Database Theory 1. Overview 1.6. Some Fundamental Aspects of Query Languages Some fundamental aspects of query languages Questions dealt with in this lecture Expressive power of a query language Comparison of query languages Complexity of query evaluation Undecidability of important properties of queries (e.g., redundancy, safety) Important special cases (conjunctive queries) Inexpressibility results Pichler 6 March, 2018 Page 19

Database Theory 1. Overview 1.6. Some Fundamental Aspects of Query Languages Learning objectives Short recapitulation of the notion of a relational database, the notion of a query language and its semantics, relational algebra, first-order logic, relationcal calculus, SQL. Some fundamental aspects of query languages Pichler 6 March, 2018 Page 20