Databases 2011 Lectures 01 03

Similar documents
Databases Lectures 1 and 2

Databases. Timothy G. Griffin. Computer Laboratory University of Cambridge, UK. Databases, Lent 2010

Databases. Timothy G. Griffin. Computer Laboratory University of Cambridge, UK. Databases, Lent 2010

Databases tgg22. Computer Laboratory University of Cambridge, UK. Databases, Lent 2013

Databases. Ken Moody. Computer Laboratory University of Cambridge, UK. Lecture notes by Timothy G. Griffin. Lent 2012

Databases Timothy G. Griffin. Computer Laboratory University of Cambridge, UK. Databases, Lent 2015

Unit 2. Unit 3. Unit 4

Databases 2011 Lectures 08 12

DATABASE MANAGEMENT SYSTEMS

ADVANCED DATABASES ; Spring 2015 Prof. Sang-goo Lee (11:00pm: Mon & Wed: Room ) Advanced DB Copyright by S.-g.

Database Management Systems Paper Solution

Database Management Systems

Database Technology Introduction. Heiko Paulheim

CS6302- DATABASE MANAGEMENT SYSTEMS- QUESTION BANK- II YEAR CSE- III SEM UNIT I

; Spring 2008 Prof. Sang-goo Lee (14:30pm: Mon & Wed: Room ) ADVANCED DATABASES

E-R diagrams and database schemas. Functional dependencies. Definition (tuple, attribute, value). A tuple has the form

Databases Lecture 7. Timothy G. Griffin. Computer Laboratory University of Cambridge, UK. Databases, Lent 2009

M S Ramaiah Institute of Technology Department of Computer Science And Engineering

Relational Databases

A7-R3: INTRODUCTION TO DATABASE MANAGEMENT SYSTEMS

Course: Database Management Systems. Lê Thị Bảo Thu

COURSE OVERVIEW THE RELATIONAL MODEL. CS121: Relational Databases Fall 2017 Lecture 1

COURSE OVERVIEW THE RELATIONAL MODEL. CS121: Introduction to Relational Database Systems Fall 2016 Lecture 1

Database Theory VU , SS Introduction: Relational Query Languages. Reinhard Pichler

Rajiv GandhiCollegeof Engineering& Technology, Kirumampakkam.Page 1 of 10

Administration Naive DBMS CMPT 454 Topics. John Edgar 2

CS2255 DATABASE MANAGEMENT SYSTEMS QUESTION BANK UNIT I

Techno India Batanagar Computer Science and Engineering. Model Questions. Subject Name: Database Management System Subject Code: CS 601

B.C.A DATA BASE MANAGEMENT SYSTEM MODULE SPECIFICATION SHEET. Course Outline

DATABASE MANAGEMENT SYSTEM SHORT QUESTIONS. QUESTION 1: What is database?

Database Theory VU , SS Introduction: Relational Query Languages. Reinhard Pichler

Database Management

RELATIONAL ALGEBRA. CS 564- Fall ACKs: Dan Suciu, Jignesh Patel, AnHai Doan

Relational Model: History

CS634 Architecture of Database Systems Spring Elizabeth (Betty) O Neil University of Massachusetts at Boston

MIDTERM EXAMINATION Spring 2010 CS403- Database Management Systems (Session - 4) Ref No: Time: 60 min Marks: 38

Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 6 Outline. Unary Relational Operations: SELECT and

15CS53: DATABASE MANAGEMENT SYSTEM

CSE 544 Principles of Database Management Systems

Running Example Tables name location

DATABASE MANAGEMENT SYSTEMS. UNIT I Introduction to Database Systems

Data about data is database Select correct option: True False Partially True None of the Above

Applied Databases. Sebastian Maneth. Lecture 5 ER Model, normal forms. University of Edinburgh - January 25 th, 2016

Relational Database: The Relational Data Model; Operations on Database Relations

CS/B.Tech/CSE/New/SEM-6/CS-601/2013 DATABASE MANAGEMENENT SYSTEM. Time Allotted : 3 Hours Full Marks : 70

COSC344 Database Theory and Applications. σ a= c (P) Lecture 3 The Relational Data. Model. π A, COSC344 Lecture 3 1

Chapter 1: Introduction

Midterm Review. Winter Lecture 13

(All chapters begin with an Introduction end with a Summary, Exercises, and Reference and Bibliography) Preliminaries An Overview of Database

Relational Model and Relational Algebra

DOWNLOAD PDF INSIDE RELATIONAL DATABASES

The data structures of the relational model Attributes and domains Relation schemas and database schemas

Chapter 6 The Relational Algebra and Calculus

Sankalchand Patel College of Engineering, Visnagar B.E. Semester III (CE/IT) Database Management System Question Bank / Assignment

Database Design Theory and Normalization. CS 377: Database Systems

Deccan Education Society s FERGUSSON COLLEGE, PUNE (AUTONOMOUS) SYLLABUS UNDER AUTONOMY. FIRST YEAR B.Sc. COMPUTER SCIENCE SEMESTER I

Database Applications (15-415)

Relational Model, Relational Algebra, and SQL

Data Analysis. CPS352: Database Systems. Simon Miner Gordon College Last Revised: 12/13/12

Lecture 16. The Relational Model

2.2.2.Relational Database concept

CS317 File and Database Systems

CS403- Database Management Systems Solved MCQS From Midterm Papers. CS403- Database Management Systems MIDTERM EXAMINATION - Spring 2010

Relational Database Systems Part 01. Karine Reis Ferreira

Outline. Database Management Systems (DBMS) Database Management and Organization. IT420: Database Management and Organization

Informatics 1: Data & Analysis

Evolution of Database Systems

Lecture Query evaluation. Combining operators. Logical query optimization. By Marina Barsky Winter 2016, University of Toronto

Northern India Engineering College, New Delhi Question Bank Database Management System. B. Tech. Mechanical & Automation Engineering V Semester

QUIZ 1 REVIEW SESSION DATABASE MANAGEMENT SYSTEMS

CS 525 Advanced Database Organization - Spring 2017 Mon + Wed 1:50-3:05 PM, Room: Stuart Building 111

Informationslogistik Unit 4: The Relational Algebra

Databases 1. Daniel POP

2011 DATABASE MANAGEMENT SYSTEM

Can you name one application that does not need any data? Can you name one application that does not need organized data?

Chapter 1: Introduction

Relational Model and Relational Algebra A Short Review Class Notes - CS582-01

Applied Databases. Sebastian Maneth. Lecture 5 ER Model, Normal Forms. University of Edinburgh - January 30 th, 2017

Fundamentals of Database Systems

CSE 544 Principles of Database Management Systems

CSE 544 Principles of Database Management Systems

CSE 344 Final Review. August 16 th

CT13 DATABASE MANAGEMENT SYSTEMS DEC 2015

CS425 Fall 2016 Boris Glavic Chapter 1: Introduction

MaanavaN.Com DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING QUESTION BANK

Review for Exam 1 CS474 (Norton)

CS143: Relational Model

Course Outline Faculty of Computing and Information Technology

CS121 MIDTERM REVIEW. CS121: Relational Databases Fall 2017 Lecture 13

CS 377 Database Systems

Informatics 1: Data & Analysis

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-2

CS403- Database Management Systems Solved Objective Midterm Papers For Preparation of Midterm Exam

Novel Materialized View Selection in a Multidimensional Database

UNIT 3 DATABASE DESIGN

Final Review. Zaki Malik November 20, 2008

Relational Algebraic Graph Algorithms

Normalization. Murali Mani. What and Why Normalization? To remove potential redundancy in design

Specific Objectives Contents Teaching Hours 4 the basic concepts 1.1 Concepts of Relational Databases

ITCS 3160 DATA BASE DESIGN AND IMPLEMENTATION

Transcription:

Databases 2011 Lectures 01 03 Timothy G. Griffin Computer Laboratory University of Cambridge, UK Databases, Lent 2011 T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 1 / 70

Lecture 01 : What is a DBMS? DB vs. IR Relational Databases ACID properties Two fundamental trade-offs OLTP vs OLAP Course outline T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 2 / 70

Example Database Management Systems (DBMSs) A few database examples Banking : supporting customer accounts, deposits and withdrawals University : students, past and present, marks, academic status Business : products, sales, suppliers Real Estate : properties, leases, owners, renters Aviation : flights, seat reservations, passenger info, prices, payments Aviation : Aircraft, maintenance history, parts suppliers, parts orders T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 3 / 70

Some observations about these DBMSs... They contains highly structured data that has been engineered to model some restricted aspect of the real world They support the activity of an organization in an essential way They support concurrent access, both read and write They often outlive their designers Users need to know very little about the DBMS technology used Well designed database systems are nearly transparent, just part of our infrastructure T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 4 / 70

Databases vs Information Retrieval Always ask What problem am I solving? DBMS exact query results optimized for concurrent updates data models a narrow domain generates documents (reports) increase control over information IR system fuzzy query results optimized for concurrent reads domain often open-ended search existing documents reduce information overload And of course there are many systems that combine elements of DB and IR. T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 5 / 70

Still the dominant approach : Relational DBMSs The problem : in 1970 you could not write a database application without knowing a great deal about the the low-level physical implementation of the data. Codd s radical idea [C1970]: give users a model of data and a language for manipulating that data which is completely independent of the details of its physical representation/implementation. This decouples development of Database Management Systems (DBMSs) from the development of database applications (at least in a idealized world). T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 6 / 70

What services do applications expect from a DBMS? Transactions ACID properties Atomicity Either all actions are carried out, or none are logs needed to undo operations, if needed Consistency If each transaction is consistent, and the database is initially consistent, then it is left consistent Applications designers must exploit the DBMS s capabilities. Isolation Transactions are isolated, or protected, from the effects of other scheduled transactions Serializability, 2-phase commit protocol Durability If a transactions completes successfully, then its effects persist Logging and crash recovery These concepts should be familiar from Concurrent Systems and Applications. T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 7 / 70

What constitutes a good DBMS application design? Domain of Interest real-world change Domain of Interest represent represent Database database update(s) Database At the very least, this diagram should commute! Does your database design support all required changes? Can an update corrupt the database? T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 8 / 70

Relational Database Design Our tools Entity-Relationship (ER) modeling Relational modeling SQL implementation high-level, diagram-based design formal model normal forms based on Functional Dependencies (FDs) Where the rubber meets the road The ER and FD approaches are complementary ER facilitates design by allowing communication with domain experts who may know little about database technology. FD allows us formally explore general design trade-offs. Such as A Fundamental Trade-off of Database Design: the more we reduce data redundancy, the harder it is to enforce some types of data integrity. (An example of this is made precise when we look at 3NF vs. BCNF.) T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 9 / 70

ER Demo Diagram (Notation follows SKS book) 1 Name Number Employee ISA Does Mechanic Salesman Description Price Date Number RepairJob Date Buys Sells Value Parts Cost License Value Comission Work Repairs Car seller buyer Manufacturer Model Year Client ID Name Address Phone 1 By Pável Calado, http://www.texample.net/tikz/examples/entity-relationship-diagram T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 10 / 70

A Fundamental Trade-off of Database Implementation Query response vs. update throughput Redundancy is a Bad Thing. One of the main goals of ER and FD modeling is to reduce data redundancy. The seek normalized designs. A normalized database can support high update throughput and greatly facilitates the task of ensuring semantic consistency and data integrity. Update throughput is increased because in a normalized database a typical transaction need only lock a few data items perhaps just one field of one row in a very large table. Redundancy is a Good Thing. A de-normalized database most can greatly improve the response time of read-only queries. T. Selective Griffin (cl.cam.ac.uk) and controlled Databases de-normalization 2011 Lectures 01 03 is often required DB 2011 in 11 / 70

OLAP vs. OLTP OLTP Online Transaction Processing OLAP Online Analytical Processing Commonly associated with terms like Decision Support, Data Warehousing, etc. OLAP OLTP Supports analysis day-to-day operations Data is historical current Transactions mostly reads updates optimized for query processing updates Normal Forms not important important T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 12 / 70

Example : Data Warehouse (Decision support) fast updates business analysis queries Extract Operational Database Data Warehouse T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 13 / 70

Example : Embedded databases FIDO = Fetch Intensive Data Organization T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 14 / 70

Example : Hinxton Bio-informatics T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 15 / 70

NoSQL Movement Technologies Key-value store Directed Graph Databases Main memory stores Distributed hash tables Applications Facebook Google imdb... Always remember to ask : What problem am I solving? T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 16 / 70

Term Outline Lecture 01 What is a DBMS? Course overview. DB vs IR. ACID properties of DBMSs. Schema design. Fundamental trade-offs. Lecture 02 Mathematical relations and SQL tables. Relations, attributes, tuples, and relational schema. Implementing these in SQL. Lecture 03 Relational Query Languages. Relational algebra, relational calculi (tuple and domain). Examples of SQL constructs that mix and match these models. Lecture 04 Entity-Relationship (ER) Modeling Entities, Attributes, and Relationships. Their implementation using mathematical relations and integrity constrains. Their implementation using SQL, Foreign Keys, Referential Integrity. T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 17 / 70

Term Outline Lecture 05 More on ER Modeling N-ary relations. Lecture 06 Making the diagram commute. Update anomalies. Evils of data redundancy. More on integrity constraints. Lecture 07 Functional Dependencies (FDs). Implied functional dependencies, logical closure. Reasoning about functional dependencies. Lecture 08 Normal Forms. 3rd normal form. Boyce-Codd normal form. Decomposition examples. Multi-valued dependencies and Fourth normal form. T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 18 / 70

Term Outline Lecture 09 Schema Decomposition. Schema decomposition. Lossless join decomposition. Dependency preservation. Lecture 10 Schema Evolution. Scope and goals of database applications change over time. Integration of distinct databases. XML as a data exchange language. Schema integration. Lecture 11 Missing data and derived data in SQL Null values (and three-valued logic). Inner and Outer Joins. Locking vs. update throughput. Indices are derived data! Aggregaton queries. Multi-set (bag) semantics. Database Views. Materialized views. Using views to implement complex integrity constraints. Selective de-normalization. Lecture 12 OLAP The extreme case: read only databases, data warehousing, data-cubes, and OLAP vs OLTP. T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 19 / 70

Recommended Reading Textbooks SKS Silberschatz, A., Korth, H.F. and Sudarshan, S. (2002). Database system concepts. McGraw-Hill (4th edition). (Adjust accordingly for other editions) Chapters 1 (DBMSs) 2 (Entity-Relationship Model) 3 (Relational Model) 4.1 4.7 (basic SQL) 6.1 6.4 (integrity constraints) 7 (functional dependencies and normal forms) 22 (OLAP) UW Ullman, J. and Widom, J. (1997). A first course in database systems. Prentice Hall. CJD Date, C.J. (2004). An introduction to database systems. Addison-Wesley (8th ed.). T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 20 / 70

Reading for the fun of it... Research Papers (Google for them) C1970 E.F. Codd, (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM. F1977 Ronald Fagin (1977) Multivalued dependencies and a new normal form for relational databases. TODS 2 (3). L2003 L. Libkin. Expressive power of SQL. TCS, 296 (2003). C+1996 L. Colby et al. Algorithms for deferred view maintenance. SIGMOD 199. G+1997 J. Gray et al. Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals (1997) Data Mining and Knowledge Discovery. H2001 A. Halevy. Answering queries using views: A survey. VLDB Journal. December 2001. T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 21 / 70

Lecture 02 : Relations, SQL Tables, Simple Queries Mathematical relations and relational schema Using SQL to implement a relational schema Keys Database query languages The Relational Algebra The Relational Calculi (tuple and domain) a bit of SQL T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 22 / 70

Let s start with mathematical relations Suppose that S 1 and S 2 are sets. The Cartesian product, S 1 S 2, is the set S 1 S 2 = {(s 1, s 2 ) s 1 S 1, s 2 S 2 } A (binary) relation over S 1 S 2 is any set r with In a similar way, if we have n sets, then an n-ary relation r is a set r S 1 S 2. S 1, S 2,..., S n, r S 1 S 2 S n = {(s 1, s 2,..., s n ) s i S i } T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 23 / 70

Relational Schema Let X be a set of k attribute names. We will often ignore domains (types) and say that R(X) denotes a relational schema. When we write R(Z, Y) we mean R(Z Y) and Z Y = φ. u.[x] = v.[x] abbreviates u.a 1 = v.a 1 u.a k = v.a k. X represents some (unspecified) ordering of the attribute names, A 1, A 2,..., A k T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 24 / 70

Mathematical vs. database relations Suppose we have an n-tuple t S 1 S 2 S n. Extracting the i-th component of t, say as π i (t), feels a bit low-level. Solution: (1) Associate a name, A i (called an attribute name) with each domain S i. (2) Instead of tuples, use records sets of pairs each associating an attribute name A i with a value in domain S i. A database relation R over the schema A 1 : S 1 A 2 : S 2 A n : S n is a finite set R {{(A 1, s 1 ), (A 2, s 2 ),..., (A n, s n )} s i S i } T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 25 / 70

Example A relational schema Students(name: string, sid: string, age : integer) A relational instance of this schema Students = { {(name, Fatima), (sid, fm21), (age, 20)}, {(name, Eva), (sid, ev77), (age, 18)}, {(name, James), (sid, jj25), (age, 19)} } A tabular presentation name sid age Fatima fm21 20 Eva ev77 18 James jj25 19 T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 26 / 70

Key Concepts Relational Key Suppose R(X) is a relational schema with Z X. If for any records u and v in any instance of R we have u.[z] = v.[z] = u.[x] = v.[x], then Z is a superkey for R. If no proper subset of Z is a superkey, then Z is a key for R. We write R(Z, Y) to indicate that Z is a key for R(Z Y). Note that this is a semantic assertion, and that a relation can have multiple keys. T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 27 / 70

Creating Tables in SQL create table Students (sid varchar(10), name varchar(50), age int); -- insert record with attribute names insert into Students set name = Fatima, age = 20, sid = fm21 ; -- or insert records with values in same order -- as in create table insert into Students values ( jj25, James, 19), ( ev77, Eva, 18); T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 29 / 70

Listing a Table in SQL -- list by attribute order of create table mysql> select * from Students; +------+--------+------+ sid name age +------+--------+------+ ev77 Eva 18 fm21 Fatima 20 jj25 James 19 +------+--------+------+ 3 rows in set (0.00 sec) T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 31 / 70

Listing a Table in SQL -- list by specified attribute order mysql> select name, age, sid from Students; +--------+------+------+ name age sid +--------+------+------+ Eva 18 ev77 Fatima 20 fm21 James 19 jj25 +--------+------+------+ 3 rows in set (0.00 sec) T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 33 / 70

Keys in SQL A key is a set of attributes that will uniquely identify any record (row) in a table. -- with this create table create table Students (sid varchar(10), name varchar(50), age int, primary key (sid)); -- if we try to insert this (fourth) student... mysql> insert into Students set name = Flavia, age = 23, sid = fm21 ; ERROR 1062 (23000): Duplicate entry fm21 for key PRIMARY T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 35 / 70

What is a (relational) database query language? Input : a collection of relation instances Output : a single relation instance R 1, R 2,, R k = Q(R 1, R 2,, R k ) How can we express Q? In order to meet Codd s goals we want a query language that is high-level and independent of physical data representation. There are many possibilities... T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 36 / 70

The Relational Algebra (RA) Q ::= R base relation σ p (Q) selection π X (Q) projection Q Q product Q Q difference Q Q union Q Q intersection ρ M (Q) renaming p is a simple boolean predicate over attributes values. X = {A 1, A 2,..., A k } is a set of attributes. M = {A 1 B 1, A 2 B 2,..., A k B k } is a renaming map. T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 37 / 70

Relational Calculi The Tuple Relational Calculus (TRC) Q = {t P(t)} The Domain Relational Calculus (DRC) Q = {(A 1 = v 1, A 2 = v 2,..., A k = v k ) P(v 1, v 2,, v k )} T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 38 / 70

The SQL standard Origins at IBM in early 1970 s. SQL has grown and grown through many rounds of standardization : ANSI: SQL-86 ANSI and ISO : SQL-89, SQL-92, SQL:1999, SQL:2003, SQL:2006, SQL:2008 SQL is made up of many sub-languages : Query Language Data Definition Language System Administration Language... T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 39 / 70

Selection R A B C D 20 10 0 55 11 10 0 7 4 99 17 2 77 25 4 0 = Q(R) A B C D 20 10 0 55 77 25 4 0 RA Q = σ A>12 (R) TRC Q = {t t R t.a > 12} DRC Q = {{(A, a), (B, b), (C, c), (D, d)} {(A, a), (B, b), (C, c), (D, d)} R a > 12} SQL select * from R where R.A > 12 T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 41 / 70

Projection R A B C D 20 10 0 55 11 10 0 7 4 99 17 2 77 25 4 0 = Q(R) B C 10 0 99 17 25 4 RA Q = π B,C (R) TRC Q = {t u R t.[b, C] = u.[b, C]} DRC Q = {{(B, b), (C, c)} {(A, a), (B, b), (C, c), (D, d)} R} SQL select distinct B, C from R T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 43 / 70

Why the distinct in the SQL? The SQL query select B, C from R will produce a bag (multiset)! R A B C D 20 10 0 55 11 10 0 7 4 99 17 2 77 25 4 0 = Q(R) B C 10 0 10 0 99 17 25 4 SQL is actually based on multisets, not sets. We will look into this more in Lecture 11. T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 45 / 70

Lecture 03 : More on Relational Query languages Outline Constructing new tuples! Joins Limitations of Relational Algebra T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 46 / 70

Renaming R A B C D 20 10 0 55 11 10 0 7 4 99 17 2 77 25 4 0 = Q(R) A E C F 20 10 0 55 11 10 0 7 4 99 17 2 77 25 4 0 RA Q = ρ {B E, D F } (R) TRC Q = {t u R t.a = u.a t.e = u.e t.c = u.c t.f = u.d} DRC Q = {{(A, a), (E, b), (C, c), (F, d)} {(A, a), (B, b), (C, c), (D, d)} R} SQL select A, B as E, C, D as F from R T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 48 / 70

Union R S Q(R, S) A B 20 10 11 10 4 99 A B 20 10 77 1000 = A B 20 10 11 10 4 99 77 1000 RA Q = R S TRC Q = {t t R t S} DRC Q = {{(A, a), (B, b)} {(A, a), (B, b)} R {(A, a), (B, b)} S} SQL (select * from R) union (select * from S) T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 50 / 70

Intersection R S Q(R) A B 20 10 11 10 4 99 A B 20 10 77 1000 = A B 20 10 RA Q = R S TRC Q = {t t R t S} DRC Q = {{(A, a), (B, b)} {(A, a), (B, b)} R {(A, a), (B, b)} S} SQL (select * from R) intersect (select * from S) T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 52 / 70

Difference R S Q(R) A B 20 10 11 10 4 99 A B 20 10 77 1000 = A B 11 10 4 99 RA Q = R S TRC Q = {t t R t S} DRC Q = {{(A, a), (B, b)} {(A, a), (B, b)} R {(A, a), (B, b)} S} SQL (select * from R) except (select * from S) T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 54 / 70

Wait, are we missing something? Suppose we want to add information about college membership to our Student database. We could add an additional attribute for the college. StudentsWithCollege : +--------+------+------+--------+ name age sid college +--------+------+------+--------+ Eva 18 ev77 King s Fatima 20 fm21 Clare James 19 jj25 Clare +--------+------+------+--------+ T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 56 / 70

Put logically independent data in distinct tables? Students : +--------+------+------+-----+ name age sid cid +--------+------+------+-----+ Eva 18 ev77 k Fatima 20 fm21 cl James 19 jj25 cl +--------+------+------+-----+ Colleges : +-----+---------------+ cid college_name +-----+---------------+ k King s cl Clare sid Sidney Sussex q Queens...... But T. how Griffin do (cl.cam.ac.uk) we put them back Databases together 2011 Lectures again? 01 03 DB 2011 58 / 70

Product R A B 20 10 11 10 4 99 S C D 14 99 77 100 = Q(R, S) A B C D 20 10 14 99 20 10 77 100 11 10 14 99 11 10 77 100 4 99 14 99 4 99 77 100 Note the automatic flattening RA Q = R S TRC Q = {t u R, v S, t.[a, B] = u.[a, B] t.[c, D] = v.[c, D]} DRC Q = {{(A, a), (B, b), (C, c), (D, d)} {(A, a), (B, b)} R {(C, c), (D, d)} S} SQL select A, B, C, D from R, S T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 60 / 70

Product is special! R A B 20 10 4 99 = R ρ A C, B D (R) A B C D 20 10 20 10 20 10 4 99 4 99 20 10 4 99 4 99 is the only operation in the Relational Algebra that created new records (ignoring renaming), But usually creates too many records! Joins are the typical way of using products in a constrained manner. T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 61 / 70

Natural Join Natural Join Given R(X, Y) and S(Y, Z), we define the natural join, denoted R S, as a relation over attributes X, Y, Z defined as R S {t u R, v S, u.[y] = v.[y] t = u.[x] u.[y] v.[z]} In the Relational Algebra: R S = π X,Y,Z (σ Y=Y (R ρ Y Y (S))) T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 62 / 70

Join example Students name sid age cid Fatima fm21 20 cl Eva ev77 18 k James jj25 19 cl Colleges cid cname k King s cl Clare q Queens.. πname,cname(students Colleges) = name Fatima Eva James cname Clare King s Clare T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 64 / 70

The same in SQL select name, cname from Students, Colleges where Students.cid = Colleges.cid +--------+--------+ name cname +--------+--------+ Eva King s Fatima Clare James Clare +--------+--------+ T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 66 / 70

Division in the Relational Algebra? Clearly, R S π X (R). So R S = π X (R) C, where C represents counter examples to the division condition. That is, in the TRC, C = {x s S, x s R}. U = π X (R) S represents all possible x s for x X(R) and s S, so T = U R represents all those x s that are not in R, so C = π X (T ) represents those records x that are counter examples. Division in RA R S π X (R) π X ((π X (R) S) R) T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 67 / 70

Division Given R(X, Y) and S(Y), the division of R by S, denoted R S, is the relation over attributes X defined as (in the TRC) R S {x s S, x s R}. name Fatima Fatima Eva Eva Eva James award writing music music writing dance dance award music writing dance = name Eva T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 68 / 70

Query Safety A query like Q = {t t R t S} raises some interesting questions. Should we allow the following query? Q = {t t S} We want our relations to be finite! Safety A (TRC) query Q = {t P(t)} is safe if it is always finite for any database instance. Problem : query safety is not decidable! Solution : define a restricted syntax that guarantees safety. Safe queries can be represented in the Relational Algebra. T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 69 / 70

Limitations of simple relational query languages The expressive power of RA, TRC, and DRC are essentially the same. None can express the transitive closure of a relation. We could extend RA to a more powerful languages (like Datalog). SQL has been extended with many features beyond the Relational Algebra. stored procedures recursive queries ability to embed SQL in standard procedural languages T. Griffin (cl.cam.ac.uk) Databases 2011 Lectures 01 03 DB 2011 70 / 70