Informatics 1: Data & Analysis

Similar documents
Informatics 1: Data & Analysis

Informatics 1: Data & Analysis

Informatics 1: Data & Analysis

Informatics 1: Data & Analysis

Informatics 1: Data & Analysis

Informatics 1: Data & Analysis

Informatics 1: Data & Analysis

Chapter 6 Formal Relational Query Languages

Informatics 1: Data & Analysis

Relational Model and Relational Algebra

2.2.2.Relational Database concept

Part I: Structured Data

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #2: The Relational Model and Relational Algebra

Relational Model, Relational Algebra, and SQL

RELATIONAL ALGEBRA II. CS121: Relational Databases Fall 2017 Lecture 3

Part I: Structured Data

Informationslogistik Unit 4: The Relational Algebra

Relational Algebra 1

Relational Algebra. [R&G] Chapter 4, Part A CS4320 1

Introduction to Data Management. Lecture #11 (Relational Algebra)

Informatics 1: Data & Analysis

Relational Algebra. Procedural language Six basic operators

CS 377 Database Systems

Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 6 Outline. Unary Relational Operations: SELECT and

CMPT 354: Database System I. Lecture 5. Relational Algebra

Chapter 3. The Relational Model. Database Systems p. 61/569

Set theory is a branch of mathematics that studies sets. Sets are a collection of objects.

Relational Query Languages: Relational Algebra. Juliana Freire

Relational Algebra. ICOM 5016 Database Systems. Roadmap. R.A. Operators. Selection. Example: Selection. Relational Algebra. Fundamental Property

Ian Kenny. November 28, 2017

Relational Algebra Part I. CS 377: Database Systems

Relational Algebra. Relational Query Languages

Mahathma Gandhi University

Relational Algebra Homework 0 Due Tonight, 5pm! R & G, Chapter 4 Room Swap for Tuesday Discussion Section Homework 1 will be posted Tomorrow

Databases - Relational Algebra. (GF Royle, N Spadaccini ) Databases - Relational Algebra 1 / 24

This lecture. Projection. Relational Algebra. Suppose we have a relation

Relational Database: The Relational Data Model; Operations on Database Relations

CS317 File and Database Systems

CMP-3440 Database Systems

Announcements. Relational Model & Algebra. Example. Relational data model. Example. Schema versus instance. Lecture notes

Introduction to Data Management CSE 344. Lectures 8: Relational Algebra

Relational Model 2: Relational Algebra

Relational Algebra. Study Chapter Comp 521 Files and Databases Fall

Database Management Systems. Chapter 4. Relational Algebra. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Basant Group of Institution

Relational Algebra 1. Week 4

Relational Query Languages. Preliminaries. Formal Relational Query Languages. Example Schema, with table contents. Relational Algebra

Today s topics. Null Values. Nulls and Views in SQL. Standard Boolean 2-valued logic 9/5/17. 2-valued logic does not work for nulls

v Conceptual Design: ER model v Logical Design: ER to relational model v Querying and manipulating data

6.830 Lecture PS1 Due Next Time (Tuesday!) Lab 1 Out end of week start early!

Relational Algebra. Lecture 4A Kathleen Durant Northeastern University

Informatics 1: Data & Analysis

Relational Databases

CS2300: File Structures and Introduction to Database Systems

Relational terminology. Databases - Sets & Relations. Sets. Membership

Database Systems (INFR10070) Dr Paolo Guagliardo. University of Edinburgh. Fall 2016

Chapter 3: The Relational Database Model

Relational Algebra. Note: Slides are posted on the class website, protected by a password written on the board

CAS CS 460/660 Introduction to Database Systems. Relational Algebra 1.1

Introduction to Data Management CSE 344. Lectures 8: Relational Algebra

Relational Databases. Relational Databases. Extended Functional view of Information Manager. e.g. Schema & Example instance of student Relation

LECTURE 8: SETS. Software Engineering Mike Wooldridge

EECS 647: Introduction to Database Systems

Relational Algebra. Mr. Prasad Sawant. MACS College. Mr.Prasad Sawant MACS College Pune

Relational Query Languages. Relational Algebra. Preliminaries. Formal Relational Query Languages. Relational Algebra: 5 Basic Operations

Review for Exam 1 CS474 (Norton)

T-SQL Training: T-SQL for SQL Server for Developers

Relational algebra. Iztok Savnik, FAMNIT. IDB, Algebra

Chapter 2: Intro to Relational Model

CS317 File and Database Systems

Sets 1. The things in a set are called the elements of it. If x is an element of the set S, we say

Relational Model & Algebra. Announcements (Thu. Aug. 27) Relational data model. CPS 116 Introduction to Database Systems

[Ch 6] Set Theory. 1. Basic Concepts and Definitions. 400 lecture note #4. 1) Basics

CS121 MIDTERM REVIEW. CS121: Relational Databases Fall 2017 Lecture 13

Chapter 2: Intro to Relational Model

Chapter 3. Set Theory. 3.1 What is a Set?

CSC Discrete Math I, Spring Sets

Faloutsos - Pavlo CMU SCS /615

Overview. Carnegie Mellon Univ. School of Computer Science /615 - DB Applications. Concepts - reminder. History

Relational Algebra and SQL

Basic operators: selection, projection, cross product, union, difference,

CIS 330: Applied Database Systems. ER to Relational Relational Algebra

THE RELATIONAL DATABASE MODEL

Chapter 6: Formal Relational Query Languages

Lecture Notes for 3 rd August Lecture topic : Introduction to Relational Model. Rishi Barua Shubham Tripathi

CSCC43H: Introduction to Databases. Lecture 3

Database System Concepts

Textbook: Chapter 6! CS425 Fall 2013 Boris Glavic! Chapter 3: Formal Relational Query. Relational Algebra! Select Operation Example! Select Operation!

Querying Data with Transact SQL

CMPT 354: Database System I. Lecture 3. SQL Basics

Databases Relational algebra Lectures for mathematics students

Midterm Review. Winter Lecture 13

Relational model continued. Understanding how to use the relational model. Summary of board example: with Copies as weak entity

Tutorial 2: Relational Modelling

CompSci 516: Database Systems

Relational Algebra. Chapter 4, Part A. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Relational Algebra for sets Introduction to relational algebra for bags

Unit 4 Relational Algebra (Using SQL DML Syntax): Data Manipulation Language For Relations Zvi M. Kedem 1

RELATIONAL DATA MODEL: Relational Algebra

To prove something about all Boolean expressions, we will need the following induction principle: Axiom 7.1 (Induction over Boolean expressions):

Transcription:

Informatics 1: Data & Analysis Lecture 5: Relational Algebra Ian Stark School of Informatics The University of Edinburgh Tuesday 31 January 2017 Semester 2 Week 3 https://blog.inf.ed.ac.uk/da17

Tutorial Exercises! Tutorial 1, Week 3 Entity-Relationship Modelling Tutorial 2, Week 4 Relational Modelling Tutorial 3, Week 5 Relational Algebra and Tuple-Relational Calculus Almeida Mattias Hillston Ho Horvath Mehra Cremarenco Dumitru El-Naggar Ogunbiyi O Shaughnessy Sinclair Ian Stark Stark Inf1-DA / Lecture 5 Rupprecheter 2017-01-31

Support (1/2)! If you have questions about something in the lectures, difficulties with tutorial exercises, or want to find out more on the material, ask someone. Other students: in your tutorial group, in the lab, elsewhere. Your course tutor: in person at your tutorials, or by email. Online: Piazza discussion; Facebook group; or by email to the inf1-da-students@inf.ed.ac.uk list. InfBASE: Drop-in tutor helpdesk, open twice a week, see the online Informatics Year 1 Handbook for timetable. The lecturer, Ian Stark: in person after lectures, drop-in office hour IF 5.04 every Wednesday 1030 1130, or by email.

Support (2/2)! For technical support when machines aren t working or you have problems with software on DICE, fill out the computing support form. http://computing.help.inf.ed.ac.uk For administrative support in anything related to teaching, contact the Informatics Teaching Organisation (ITO) by filling out their online contact form, or go to the ITO office in Forrest Hill. http://www.inf.ed.ac.uk/teaching/contact If you are having difficulties affecting all of your courses, or issues arising outside the University, then contact your personal tutor.

Lecture Plan for Weeks 1 4 Data Representation This first course section starts by presenting two common data representation models. The entity-relationship (ER) model The relational model Note slightly different naming: -relationship vs. relational Data Manipulation This is followed by some methods for manipulating data in the relational model and using it to extract information. Relational algebra The tuple-relational calculus The query language SQL

Remember Relations as Tables? Relational databases take as fundamental the idea of a relation, comprising a schema and an instance. Fields (a.k.a. attributes, columns) Schema Tuples { (a.k.a. records, rows) s0456782 John 18 john@inf Absolutely everything in a relational database is built from relations and operations upon them. Every relational database is a linked collection of several tables like this: often much wider, and sometimes very, very much longer.

Homework 1. Read This Inside Google Spanner, the Largest Single Database on Earth Wired 2012-11-26 http://www.wired.com/2012/11/google-spanner-time/all/... a database designed to seamlessly operate across hundreds of data centers and millions of machines and trillions of rows of information. 2. Do This Work through Example 7 and Example 8 from Tutorial 1: do each question yourself and write out your answer; then look at the suggested solution.

Languages for Working with Relations Once we have a quantity of structured data in the linked tables of a relational model we may want to rearrange it, build new data structures, and extract information through the use of queries. To understand how this is done, we ll look at three interlinked languages: Relational Algebra High-level mathematical operations for combining and processing relational tables. Tuple-Relational Calculus SQL A declarative mathematical notation for expressing queries over structured data. The standard programming language for writing queries on relational databases.

Relational Algebra Relational algebra is a mathematical language for describing certain operations on the schemas and tables of a relational model. Each of these operations takes one or more tables, and returns another. Basic operations: Derived operations: selection σ, projection π, renaming ρ union, difference, cross-product intersection and different kinds of join Ted Codd gave a completeness proof showing that these operations were enough to express very general kinds of query: so, with an efficient implementation of these operations, you can answer all those queries. Conversely, Codd s result also shows that to implement any expressive query language requires finding ways to carry out all of these operations.

Selection and Projection s0456782 John 18 john@inf Student σ age>18 (Student) name age John 18 Mary 18 Helen 20 Peter 22 π (Student) name, age name age Helen 20 Peter 22 Combination Selection picks out the rows of a table satisfying a logical predicate

Selection and Projection s0456782 John 18 john@inf Student σ age>18 (Student) name age John 18 Mary 18 Helen 20 Peter 22 π (Student) name, age name age Helen 20 Peter 22 Combination Projection picks out the columns of a table by their field name.

Selection and Projection s0456782 John 18 john@inf Student σ age>18 (Student) name age John 18 Mary 18 Helen 20 Peter 22 π (Student) name, age name age Helen 20 Peter 22 Combination Combining selection and projection picks out a rectangular subtable. π name,age (σ age>18 (Student)) = σ age>18 (π name,age (Student))

Definitions Selection Relation σ P (R) is the table of rows in R which satisfy predicate P. Thus σ P (R) has the same schema as R, but possibly lower cardinality. Predicates like P, Q,... are made up of Assertions about field values: (age > 18), (degree = "CS"),... Logical combinations of these: (P Q), (P Q Q ),... Projection Relation π a1,...,a n (R) is the table of all tuples of the fields a 1,..., a n taken from the rows of R. Thus π a1,...,a n (R) usually has a lower-arity schema than R, and may also have lower cardinality.

Symbol Soup Logical Operators Truth TRUE T tt 1 Falsity FALSE F ff 0 Conjunction AND P Q & &&. Disjunction OR P Q + Implication IMPLIES P Q Equivalence IFF P Q Negation NOT P!

Reusing Symbols + It s no accident that symbols such as or from arithmetic or set theory get reused in other settings, like logic. They often obey very similar rules, and this can also be an indicator of deeper connections. (P Q) R = (P R) (Q R) Logical propositions P, Q and R (x + y) z = (x z) + (y z) Numbers x, y and z (A B) C = (A C) (B C) Sets A, B and C

Brackets + ( ) Parentheses Round brackets [ ] Brackets Square brackets { } Braces Curly brackets Chevrons Angle brackets

Colour Coding of Slides! Regular Substantive Slide Support (2/2)! Brackets + Selection Relation σp(r) is the table of rows in R which satisfy predicate P. For technical support when machines aren t working or you have problems with software on DICE, fill out the computing support form. ( ) Parentheses Round brackets Thus σp(r) has the same schema as R, but possibly lower cardinality. Predicates like P, Q,... are made up of Assertions about field values: (age > 18), (degree = "CS"),... Logical combinations of these: (P Q), (P Q Q ),... Projection http://computing.help.inf.ed.ac.uk For administrative support in anything related to teaching, contact the Informatics Teaching Organisation (ITO) by filling out their online contact form, or go to the ITO office in Forrest Hill. [ ] { } Brackets Braces Square brackets Curly brackets Relation πa1,...,an (R) is the table of all tuples of the fields a1,..., an taken from the rows of R. Thus πa1,...,an (R) usually has a lower-arity schema than R, and may also have lower cardinality. http://www.inf.ed.ac.uk/teaching/contact If you are having difficulties affecting all of your courses, or issues arising outside the University, then contact your personal tutor. Chevrons Angle brackets Ian Stark Inf1-DA / January 30, 2017 Ian Stark Inf1-DA / January 30, 2017 Ian Stark Inf1-DA / January 30, 2017

Renaming Student new table name ρ (mn sid, email address) Student renaming list S sid name age address Renaming changes the names of some or all fields in a table, giving a schema of the same arity and type. This can be used to avoid naming conflicts when combining tables.

Union s0456782 John 18 john@inf S 1 s0489967 Basil 19 basil@inf s9989232 Ophelia 24 oph@bio s0289125 Michael 21 mike@geo s0456782 John 18 john@inf s0489967 Basil 19 basil@inf s9989232 Ophelia 24 oph@bio s0289125 Michael 21 mike@geo S 1 S 2 S 2 Union combines the rows of two tables that have the same schema.

Difference s0456782 John 18 john@inf S 1 s0489967 Basil 19 basil@inf s9989232 Ophelia 24 oph@bio s0289125 Michael 21 mike@geo s0456782 John 18 john@inf S 1 S 2 S 2 Difference takes all the rows of one table which do not appear in another.

Intersection s0456782 John 18 john@inf S 1 s0489967 Basil 19 basil@inf s9989232 Ophelia 24 oph@bio s0289125 Michael 21 mike@geo S 1 S 2 S 2 Intersection takes all the rows of one table which do appear in another. S 1 S 2 = S 1 (S 1 S 2 )

Definitions Union Relation R 1 R 2 contains every tuple that appears in either R 1 or R 2. Difference Relation R 1 R 2 contains every tuple that appears R 1 but not in R 2. Intersection Relation R 1 R 2 contains every tuple that appears in R 1 and also in R 2. In all of these cases the schemas of R 1 and R 2 must be compatible all the same fields with all the same types. Intersection can be defined in terms of difference, but not the other way around. (Try it and see)

Cross Product s0456782 John 18 john@inf S code name year inf1 Informatics 1 1 math1 Mathematics 1 1 R s0456782 John 18 john@inf s0456782 John 18 john@inf S R code name inf1 Informatics 1 math1 Mathematics 1 inf1 Informatics 1 math1 Mathematics 1 inf1 Informatics 1 math1 Mathematics 1 inf1 Informatics 1 math1 Mathematics 1 year 1 1 1 1 1 1 1 1 Cross product combines every row of one table with every row of another.

Definition Cross product For any relations R and S, the cross product R S, also known as the Cartesian product, is a relation defined as follows. Schema Rows All the fields and types from R, plus all fields and types from S. If necessary the renaming operation ρ can ensure none of these clash. For every row (u 1,..., u n ) of R and every row (v 1,..., v m ) of S the product R S contains row (u 1,..., u n, v 1,..., v m ). The arity of R S is the sum of the arities of R and S. The cardinality of R S is the product of the cardinalities of R and S.

Relational Join The most commonly used relational operation is the join R P S which combines cross-product with selection. Rows in Join For every row (u 1,..., u n ) of R and every row (v 1,..., v m ) of S the join relation R P S contains row (u 1,..., u n, v 1,..., v m ) if and only if that tuple of values satisfies predicate P. Here R and S are any two relations, with P any predicate defined on the fields of R and S together R P S = σ P (R S)

Example of Join s0456782 John 18 john@inf Student mn s0412375 s0378435 code inf1 80 math1 70 Takes mark s0456782 John 18 john@inf s0456782 John 18 john@inf mn s0412375 s0378435 s0412375 s0378435 s0412375 s0378435 s0412375 s0378435 σ Student.mn = Takes.mn (Student Takes) code mark inf1 80 math1 70 inf1 80 math1 70 inf1 80 math1 70 inf1 80 math1 70

Example of Join s0456782 John 18 john@inf Student mn s0412375 s0378435 code inf1 80 math1 70 Takes mark s0456782 John 18 john@inf s0456782 John 18 john@inf mn s0412375 s0378435 s0412375 s0378435 s0412375 s0378435 s0412375 s0378435 Student Student.mn = Takes.mn Takes code mark inf1 80 math1 70 inf1 80 math1 70 inf1 80 math1 70 inf1 80 math1 70

Refined Joins In general, a join R P S can use an arbitrary predicate P. However, some kinds of predicate are particularly common, and often followed by projection to eliminate duplicate or redundant columns. Equijoin An equijoin starts with a join where the predicate states that particular fields from each relation must be equal. That is, P has the form (a 1 = b 1 ) (a k = b k ) for some fields a 1,... a k of R and b 1,..., b k of S. For example, the relation (Student Student.mn=Takes.mn Takes) above is an equijoin between these tables on the two mn fields. It s common to then project onto only certain columns to remove the fields that are now duplicated.

Refined Joins Natural Join The natural join R S of relations R and S is the equijoin requiring equalities between any fields in the two relations that share the same name, followed by a projection to remove duplicate columns. For example, the natural join of the Student and Takes relations: Student Takes = πmn,name,age, email,code,mark (σ Student.mn=Takes.mn(Student Takes)) This records every student in combination with every course they take. This example is typical: a natural join between two tables where one has a foreign key constraint referring to the other.

Specialist Joins + The SQL standard defines no less than five different types of join. Inner Join is the basic join R P S described earlier. Left Outer Join is the basic join, plus rows for every tuple in the left-hand table R that matches nothing in the right-hand table S. Missing fields are filled with NULL. Right Outer Join is the basic join plus rows for every tuple in the right-hand table S that matches nothing in the left-hand table R. Missing fields are filled with NULL. Full Outer Join has every row from all three previous joins. Cross Join is the cross-product R S, with every tuple from R paired with every tuple from S, and no matching done at all.