Relational Model. CSC 343 Winter 2018 MICHAEL LIUT

Similar documents
Introduction to Data Management. Lecture #4 (E-R Relational Translation)

Relational data model

The Relational Data Model. Data Model

Database Management Systems. Chapter 3 Part 1

The Relational Model. Relational Data Model Relational Query Language (DDL + DML) Integrity Constraints (IC)

The Relational Model. Chapter 3. Database Management Systems, R. Ramakrishnan and J. Gehrke 1

The Relational Model. Chapter 3. Comp 521 Files and Databases Fall

Data Modeling. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

The Relational Model. Chapter 3. Comp 521 Files and Databases Fall

The Relational Model. Chapter 3

Introduction to Data Management. Lecture #5 Relational Model (Cont.) & E-Rà Relational Mapping

The Relational Model. Why Study the Relational Model? Relational Database: Definitions

Chapter 2 Introduction to Relational Models

The Relational Model of Data (ii)

Why Study the Relational Model? The Relational Model. Relational Database: Definitions. The SQL Query Language. Relational Query Languages

CompSci 516 Database Systems. Lecture 2 SQL. Instructor: Sudeepa Roy

The Relational Model

Lecture 2 SQL. Announcements. Recap: Lecture 1. Today s topic. Semi-structured Data and XML. XML: an overview 8/30/17. Instructor: Sudeepa Roy

Lecture 2 SQL. Instructor: Sudeepa Roy. CompSci 516: Data Intensive Computing Systems

Review. The Relational Model. Glossary. Review. Data Models. Why Study the Relational Model? Why use a DBMS? OS provides RAM and disk

Introduction to Data Management. Lecture #4 (E-R à Relational Design)

Administrivia. The Relational Model. Review. Review. Review. Some useful terms

SQL: Data Definition Language

Database Applications (15-415)

SQL DATA DEFINITION LANGUAGE

SQL DATA DEFINITION LANGUAGE

The Relational Model 2. Week 3

Database Systems ( 資料庫系統 ) Practicum in Database Systems ( 資料庫系統實驗 ) 9/20 & 9/21, 2006 Lecture #1

The Relational Model. Week 2

EGCI 321: Database Systems. Dr. Tanasanee Phienthrakul

Chapter 2: Intro to Relational Model

The Relational Model

Comp 5311 Database Management Systems. 4b. Structured Query Language 3

Database Management Systems. Chapter 1

Relational Model. Topics. Relational Model. Why Study the Relational Model? Linda Wu (CMPT )

SQL DATA DEFINITION LANGUAGE

Database Systems ( 資料庫系統 )

Announcements If you are enrolled to the class, but have not received the from Piazza, please send me an . Recap: Lecture 1.

The Relational Model. Outline. Why Study the Relational Model? Faloutsos SCS object-relational model

The Relational Model. Database Management Systems

DBMS. Relational Model. Module Title?

Chapter 1: Introduction

Keys, SQL, and Views CMPSCI 645

CS 2451 Database Systems: Relational Data Model

Relational Databases BORROWED WITH MINOR ADAPTATION FROM PROF. CHRISTOS FALOUTSOS, CMU /615

BBM371- Data Management. Lecture 1: Course policies, Introduction to DBMS

The Relational Model

The Relational Model. Roadmap. Relational Database: Definitions. Why Study the Relational Model? Relational database: a set of relations

Database Systems. Lecture2:E-R model. Juan Huo( 霍娟 )

CS425 Midterm Exam Summer C 2012

CS425 Fall 2016 Boris Glavic Chapter 2: Intro to Relational Model

CSC 742 Database Management Systems

CSCB20 Week 2. Introduction to Database and Web Application Programming. Anna Bretscher Winter 2017

CMPT 354: Database System I. Lecture 2. Relational Model

Database Systems. Course Administration

Page 1. Quiz 18.1: Flow-Control" Goals for Today" Quiz 18.1: Flow-Control" CS162 Operating Systems and Systems Programming Lecture 18 Transactions"

Page 1. Goals for Today" What is a Database " Key Concept: Structured Data" CS162 Operating Systems and Systems Programming Lecture 13.

CIS 330: Applied Database Systems

The Relational Model. 3/1/2010 Database Management Systems, Roya Choupani 1

Databases. Jörg Endrullis. VU University Amsterdam

CS275 Intro to Databases

CSC 343 Winter SQL: Aggregation, Joins, and Triggers MICHAEL LIUT

Lecture 2: Introduction to SQL

Introduction to Database Management Systems

Data! CS 133: Databases. Goals for Today. So, what is a database? What is a database anyway? From the textbook:

Database Technology Introduction. Heiko Paulheim

CMPT 354: Database System I. Lecture 3. SQL Basics

Introduction to Data Management. Lecture #1 (The Course Trailer )

Relational model and basic SQL

CSCB20. Introduction to Database and Web Application Programming. Anna Bretscher Winter 2017

Database Management Systems MIT Introduction By S. Sabraz Nawaz

The Relational Model

Introduction to Database Systems

Relational Model. Nisa ul Hafidhoh

Introduction to Data Management. Lecture #3 (E-R Design, Cont d.)

Introduction to Database Systems. Chapter 1. Instructor: . Database Management Systems, R. Ramakrishnan and J. Gehrke 1

Introduction to SQL Part 1 By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

Introduction to Data Management. Lecture #5 (E-R Relational, Cont.)

Database Systems Relational Model. A.R. Hurson 323 CS Building

Modern Database Systems Lecture 1

9/8/2018. Prerequisites. Grading. People & Contact Information. Textbooks. Course Info. CS430/630 Database Management Systems Fall 2018

SQL: Data Manipulation Language

COGS 121 HCI Programming Studio. Week 03 - Tech Lecture

Introduction to Data Management. Lecture #17 (SQL, the Never Ending Story )

L3 The Relational Model. Eugene Wu Fall 2018

Translation of ER-diagram into Relational Schema. Dr. Sunnie S. Chung CIS430/530

LAB 3 Notes. Codd proposed the relational model in 70 Main advantage of Relational Model : Simple representation (relationstables(row,

Lecture 16. The Relational Model

SQL: Part II. Announcements (September 18) Incomplete information. CPS 116 Introduction to Database Systems. Homework #1 due today (11:59pm)

Introduction to Data Management. Lecture #1 (Course Trailer )

CS430/630 Database Management Systems Spring, Betty O Neil University of Massachusetts at Boston

Introduction to Relational Databases. Introduction to Relational Databases cont: Introduction to Relational Databases cont: Relational Data structure

SQL: Part III. Announcements. Constraints. CPS 216 Advanced Database Systems

Announcements (September 18) SQL: Part II. Solution 1. Incomplete information. Solution 3? Solution 2. Homework #1 due today (11:59pm)

Database Languages. A DBMS provides two types of languages: Language for accessing & manipulating the data. Language for defining a database schema

CISC 3140 (CIS 20.2) Design & Implementation of Software Application II

Introduction to Data Management. Lecture 16 (SQL: There s STILL More...) It s time for another installment of...

ITCS 3160 DATA BASE DESIGN AND IMPLEMENTATION

Lecture 3: SQL Part II

Review: Where have we been?

Transcription:

Relational Model CSC 343 Winter 2018 MICHAEL LIUT (MICHAEL.LIUT@UTORONTO.CA) DEPARTMENT OF MATHEMATICAL AND COMPUTATIONAL SCIENCES UNIVERSITY OF TORONTO MISSISSAUGA

Administration Please email me your pair selection before 9AM on January 22 nd. o There is a piazza page now available for you to try and fine groups. Piazza has been created, anyone who doesn t have access please email me. Textbook readings are posted on the course website. Be sure to download the tutorial files before Tutorial 1! 2

Describing Data: Data Models A data model is a collection of concepts for describing data. A schema is a description of a particular collection of data, using a given data model. The relational model of data is the most widely used today. Main concept: relation, basically a table with rows and columns. Use tables to represent data and relationships. Every relation has a schema, which describes the columns, or attributes. 3

Relational Model Proposed by Edgar. F. Codd in 1970 as a data model which strongly supports data independence. Made available in commercial DBMSs in 1981. It is not easy to implement data independence efficiently and reliably! It is based on (a variant of) the mathematical notation of relation. Relations are represented as tables. 4

Relational Database: Definitions Relational Database: a set of relations. Relation: made of two parts 1. Instance à a table with rows and columns. 2. Schema à specifies name of a relation, and, name and type of each column. # of Rows = cardinality # of Columns = degree/arity To make it easier, think of a relation as a set of rows and tuples i.e. all rows are distinct! rows, not columns!!! e.g. Students(sid: integer, name: string, gpa: real) 5

A Relation is a Table STUDENTS Relation Name Attribute Names SID Name GPA Tuples (Records) 5551234 John Smith 3.4 7771234 Jessica Jones 3.7 The set of the permitted values for an attribute is called the attribute of domain. e.g. domain(sid) = {5551234, 7771234}. 6

A Relation is a Table STUDENTS Relation Name Attribute Names SID Name GPA Tuples (Records) 5551234 John Smith 3.4 7771234 Jessica Jones 3.7 Cardinality = 2 Degree = 3 Note: all rows are distinct! 7

Question? Do all entries in a column, in a relation instance, have to be distinct? NO! Can someone tell me why? NO! Unless we rule the column to be unique columns do not need to be distinct. e.g. multiple students can have the same name or same birth date. 8

Relational Data Model Relation Schema à relation name and attribute list. Optionally: types of attributes. For example: Students (id, name) Students (id: string, name: string) Relation à set of tuples conforming to a schema. e.g. {(5551234, John Smith), (7771234, Jessica Jones) } Database à set of relations. Database Schema à set of all relation schemas in the database. 9

Clarification of Terminology 1. Relation is a Table. 2. Attribute is a column. 3. Tuple is a row. 4. Arity/Degree of a relation is the number of attributes (columns). 5. Cardinality of a relation is the number of tuples (rows). 10

Why Relations? Very simple model. Often matches how we think about data Abstract model that underlies SQL, the most important database language! 11

Relations are Unordered The order of tuples is irrelevant. Tuples may be stored in any arbitrary order. e.g. instructor relation with unordered tuples. ID name dept_name salary 12345 Einstein Physics 95000 45411 Turing Comp Sci 72500 35521 Mozart Music 45000 98002 Dexter Biology 67200 45556 Aristotle Physics 86000 72331 Khan History 55500 42399 Taylor Mathematics 78800 53440 Bond Chemistry 110000 12

Database Information about enterprise is broken up into parts: instructor student advisor The need for NULL values. e.g. represent a student with no instructor Bad design: univ (instructor_id, dept_name, salary, student_id, ) Normalization Theory deals with how to design good relational schemas. Repetition of information is bad! e.g. two students have the same instructor 13

Relational Query Languages A major strength of the relational model: supports simple, powerful querying of data. Queries can be written intuitively, and the DBMS is responsible for efficient evaluation. The key: precise semantics for relational queries. Allows the optimizer to extensively re-order operations, and still ensure that the answer does not change. 14

The SQL Query Language Developed by IBM (system R) in the 1970s. There was a need for a standard since it was used by many vendors. Standards: SQL 86 SQL 89 (minor revisions) SQL 92 (major revisions) SQL 99 (major extensions) SQL-2011 (major extensions, specifically in temporal databases) SQL:2016 (8 th revision of the standard, a lot of JSON functionality) 15

Temporal Database Efficiently and effectively store data relating to instances of time. Timestamps are stored in RDBMSs, but are inefficient. Two major methods (attributes) of tracking time: 1. Transaction Time 2. Valid Time 16

Temporal Database 1. Transaction Time The time period during which a row is committed to or recorded in the database. 2. Valid Time The time period during which a row is regarded as correctly reflecting reality by the user of the database. The combination of the attributes Transaction Time and Valid Time forms Bitemporal Time 17

Transaction Time Example ID name dept_name transaction_from transaction_to 12345 Einstein Physics January 2000 45411 Turing Comp Sci September 1986 August 2000 35521 Mozart Music October 1999 98002 Dexter Biology May 2018 45556 Aristotle Physics January 2018 18

Valid Time Example ID name dept_name valid-to 12345 Einstein Physics 45411 Turing Comp Sci August 2000 35521 Mozart Music 98002 Dexter Biology August 2018 45556 Aristotle Physics December 2018 19

Intermission Let s Take a Little Break! ENJOY YOUR COFFEE 20

Database Schemas in SQL SQL is primarily a query language. Used for retrieving information from a database. SQL also includes data-definition component for describing schemas. 21

Levels of Abstraction Many views, single conceptual (logical) schema and physical schema. Views describe how users see the data. Conceptual schema defines logical structure. Physical schema describes the files and indexes used. View 1 View 2 View 3 Conceptual Schema Physical Schema v Schemas are defined using DDL (data definition language); v Data is modified/queried using DML (data manipulation language). DB 22

Example: University Database Conceptual Data: Students(sid:string, name:string, login:string, age:integer, gpa:real) Courses(cid:string, cname:string, credits:integer) Enrolled (sid:string, cid:string, grade:string) Physical Schema: Relations stored as unordered files. Index on first column of Students. External Schema (i.e. View): Course_info(cid:string, enrollment:integer) 23

Integrity Constraints An integrity constraint is a property that must be satisfied by all meaningful database instances. A constraint can be seen as a predicate. A database is legal if it satisfies all integrity constraints. Types of constraints: 1. Intra-relational constraints e.g. domain constraints and tuple constraints. 2. Inter-relational constraints The most common is referential constraint. 24

Rationale For Integrity Constraints Useful for describing the application in greater detail. Contributes to data quality (avoids data entry errors). An element in the design process; normal forms (discussed later). Used by the system in choosing a strategy for query processing Note: It is not the case that all desirable properties of the data in a database can be described by means of integrity constraints. e.g. data in the relation Instructor must be correct 25

Tuple and Domain Constraints A tuple constraint expresses conditions on the values of each tuple, independently of other tuples. e.g. (GPA > 3.0) or Net = Amount Deduction A domain constraint is a tuple constraint that involves a single attribute. e.g.. (GPA 4.0) AND (GPA 0.0) 26

Unique Values For Tuples RegNum Surname FirstName BirthDate Program 22335 Jones Jenny 04/12/89 History 22567 Jones Robby 17/02/91 Engineering 43754 Jones Joe 28/03/91 Computing 38267 Smith Melissa 13/08/90 Computing 17382 Smith Melissa 08/09/90 English Registration number identifies students i.e. there is no pair of tuples with the same value for RegNum. Personal data could identify students as well i.e. there is no pair of tuples with the same values for all of Surname, FirstName, BirthDate. 27

Keys A key is a set of attributes that uniquely identifies tuples in a relation. More Precisely: A set of attributes K is a superkey for a relation R if R cannot contain two distinct tuples t 1 and t 2 s.t. t 1 [K]= t 2 [K]; K is a candidate key for R is K is a minimal superkey. There exists no other superkey K of R that is contained in K as a proper subset. i.e. K K 28

Example RegNum Surname FirstName BirthDate Program 22335 Jones Jenny 04/12/89 History 22567 Jones Robby 17/02/91 Engineering 43754 Jones Joe 28/03/91 Computing 38267 Smith Melissa 13/08/90 Computing 17382 Smith Melissa 08/09/90 English RegNum is a key! i.e. RegNum is a superkey and contains a sole attribute, thus, it is minimal. {Surname, FirstName, BirthDate} is another key! 29

Question? RegNum Surname FirstName BirthDate Program 22335 Jones Jenny 04/12/89 History 22567 Jones Robby 17/02/91 Engineering 43754 Jones Joe 28/03/91 Computing 38267 Smith Melissa 13/08/90 Computing 17382 Smith Melissa 08/09/90 English Can {Surname, Program} be a key? NO! It may not be apparent with this example, however, there could be combinations of Surnames and Programs that are the same. i.e. You could potentially have students with the same Surname in the same Program. 30

Another Question? Foreshadowing! RegNum Surname FirstName BirthDate Program 22335 Jones Jenny 04/12/89 History 22567 Jones Robby 17/02/91 Engineering 43754 Jones Joe 28/03/91 Computing 38267 Smith Melissa 13/08/90 Computing 17382 Smith Melissa 08/09/90 English Any ideas on how we would address this issue? i.e. How do we ensure that only a certain set of keys can be used to obtain the data sets that we want? Normalization and Database Design (we will discuss in future!) 31

Existence of Keys Relations are sets; each relation is composed of distinct tuples. The whole set of attributes for a relation defines a superkey. Each relation has a key, which is a set of all its attributes, or a subset thereof. 32

Existence of Keys Keys guarantee that each piece of data in the database can be accessed. Keys are a major feature of the Relational Model Allow us to say that it is value-based. 33

Keys and Null Values The presence of NULL values negatively impacts keys: unique identification is no longer guaranteed. no assistance in establishing correspondence between data in different relations. 34

Keys and Null Values RegNum Surname FirstName BirthDate Program NULL Jones Robby NULL Computing 43754 Jones Melissa 28/03/91 Engineering 38267 Smith Melissa NULL NULL NULL Smith Melissa 08/09/90 Engineering Questions to think about? 1. How do we access the 1 st tuple? 2. Are the 3 rd and 4 th tuples the same? 35

Keys and Null Values The presence of NULL values negatively impacts keys: unique identification is no longer guaranteed. Maps to Q1 Maps to Q2 no assistance in establishing correspondence between data in different relations. 36

Primary Keys In general, the presence of NULL values in key has to be limited. In most cases (and preferably) non-existent! Each relations must have a primary key, where NULL values are not allowed to occur in any attribute. References between relations are realized through primary keys. Notation: The attribute of the primary key are underlined! 37

Primary Keys Notation: The attribute of the primary key are underlined! RegNum Surname FirstName BirthDate Program 22567 Jones Robby NULL Computing 43754 Jones Melissa 28/03/91 Engineering 38267 Smith Melissa NULL NULL 17383 Smith Melissa 08/09/90 Engineering 38

DO We Always Have Primary Keys? Ordinarily, we have reasonable primary keys. e.g. Client/Account Number, Student Number, SIN, etc... There potentially could be multiple keys, however, one will be designated as the primary. 39

Recap A set of fields is a key for a relation if: 1. No two distinct tuples can have the same value in all key fields, and 2. This is not true for any subset of the key. If false, then a superkey. If there are more than one key for a relation, one is selected to be the primary key. STUDENTS SID Name GPA 5551234 John Smith 3.4 7771234 Jessica Jones 3.7 e.g. SID is a key for Students. The tuple {SID, GPA} is a superkey. 40

Primary and Candidate Keys For a given student and course, there is a single grade. vs. Students can take only one course, and receive a single grade for that course; further, no two students in a course receive the same grade. Be careful to define Integrity Constraints (ICs) correctly at design time. ICs are checked when data is updated. Enrolled(SID, CID, grade) Enrolled(SID, CID, grade) Enrolled(SID, CID, grade) Key {CID, grade} 41

Primary and Candidate Keys Be Careful! If an IC is used carelessly it can prevent the storage of database instances that arise in practice! 42

Foreign Keys Pieces of data in different relations are correlated by means of values of primary keys. Referential integrity constraints are imposed in order to guarantee that the values refer to existing tuples in the referenced relation. A foreign key requires that the values on a set X of attributes of a relation R 1 must appear as values for the primary key of another relation R 2. i.e. set of attributes in one relation that is used to refer to a tuple in another relation. These must correspond to primary key of the second relation! Like a logical pointer! 43

Referential Integrity e.g. SID is a foreign key referring to Students: Enrolled(SID: string, CID: string, grade: string) If all foreign key constraints are enforced, referential integrity is achieved. i.e. No dangling references exist! 44

Referential Integrity Only students listed in the Studentrelation should be allowed to enroll for courses. Enrolled SID CID Grade 22555 Biology101 A 23555 History407 B 23455 English201 A 22555 Genetics304 C Students SID LastName Login Age GPA 22555 White white@sci 21 3.4 23555 Jones jones@his 20 3.2 23455 Taylor taylor@eng 20 3.6 45

Enforcing Referential Integrity Consider the relations: Students and Enrolled SID in Enrolled is a foreign key that references Students. QUESTION?! What should be done if an Enrolled tuple with a non-existent SID is inserted? REJECT IT! 46

Enforcing Referential Integrity ANOTHER QUESTION?! If a Students tuple is deleted, what should occur? 1. Delete all Enrolled tuples that reference it. 2. Prevent deletion of a Students tuple that it is being referenced. 3. Set SID in Enrolled tuples that refer to it to a default SID. 4. Set SID in Enrolled tuples that refer to it to NULL. Similar if the primary key of Students tuple is updated. This is called Cascading 47

Integrity Constraints (a.k.a. ICs) ICs are based upon the semantics of the real-world enterprise. This is being described in the database relations. We CAN check a database instance to see if an IC is violated. HOWEVER We CANNOT infer that an IC is true by looking at an instance. An IC is a statement about all possible instances. Key and Foreign Key ICs are the most common; more general ICs are also supported. RECALL Semantics is the meaning or relationship of meaning of a sign or set of signs. 48

Where do ICs Come From? Let s work through an example! Information: 3 relations: Offences, Officers, and Cars. 1. Offences: {Code, Date, Officer, Dept, Registration} 2. Officers: {RegNum, Surname, FirstName} 3. Cars: {Registration, Dept, Owner} Now let s add some data! 49

Offences Code Date Officer Dept Registration 83746 25/05/2016 567 45 416 YYZ 29374 26/05/2016 984 45 747 LHR 39374 28/05/2016 984 45 416 YYZ 27483 28/05/2016 567 45 747 LHR 95848 28/05/2016 363 82 333 VCE Officers RegNum Surname FirstName 567 Hopkins John 984 Walker Johnny 363 Ross George Cars Registration Dept Owner 416 YYZ 45 Michael Liut 747 LHR 45 Jessica Jones 333 VCE 82 Giovanni Saputo 647 YUL 45 Michael Liut 50

Offences Code Date Officer Dept Registration 83746 25/05/2016 567 45 416 YYZ 29374 26/05/2016 984 45 747 LHR 39374 28/05/2016 984 45 416 YYZ 27483 28/05/2016 567 45 747 LHR 95848 28/05/2016 363 82 333 VCE Officers Cars RegNum Surname FirstName 567 Hopkins John 984 Walker Johnny 363 Ross George Registration Dept Owner 416 YYZ 45 Michael Liut 747 LHR 45 Jessica Jones 333 VCE 82 Giovanni Saputo 647 YUL 45 Michael Liut NOTE: Offences[Officer] Officers[RegNum] Offences[Registration, Dept] Cars[Registration,Dept] 51

NOTE: Offences[Officer] Officers[RegNum] Offences[Registration, Dept] Cars[Registration,Dept] THIS IS KEY!! IF YOU DON T UNDERSTAND TELL ME NOW! 52

Violation of Foreign Keys Offences Code Date Officer Dept Registration 83746 25/05/2016 912 37 416 YYZ 29374 26/05/2016 912 45 747 LHR Officers RegNum Surname FirstName 567 Hopkins John 984 Walker Johnny Cars Registration Dept Owner 416 YYZ 45 Michael Liut 747 LHR 45 Jessica Jones 333 VCE 82 Giovanni Saputo 53

Referential Constraints General comments that should be noted: Referential Constraints play an important role in making the Relational Model value-based. Care is needed in case of referential constraints that involve two or more attributes. 54

Examples Accidents Cars Code Dept1 Registration1 Dept2 Registration2 6002 45 416 YYZ 82 333 VCE 6437 45 747 LHR 82 333 VCE Registration Dept Owner 416 YYZ 45 Michael Liut 747 LHR 45 Jessica Jones 333 VCE 82 Giovanni Saputo 647 YUL 45 Michael Liut Here we have two Referential Constraints for Accidents: {Registration1, Dept1} to Cars and {Registration2, Dept2} to Cars. 55

Questions? Q & A THANKS FOR LISTENING I LL BE ANSWERING QUESTIONS NOW 56

Citations, Images and Resources Database Management Systems (3 rd Ed.), Ramakrishnan & Gehrke Some content is based off the slides of Dr. Fei Chiang - http://www.cas.mcmaster.ca/~fchiang/ https://sigmodrecord.org/publications/sigmodrecord/1209/pdfs/07.industry.kulkarni.pdf http://www.clipartpanda.com/clipart_images/key-clip-art-159-3522839 https://lh3.ggpht.com/edtqa1vcezlabp2d9ixsepgy2qkamjeener8tjrhxmda443klmldzfmojqtvy8jrreo=w300 https://s3.amazonaws.com/images.kateheddleston.com/galleries/d6970e47-10f6-40a3-8745- fca33c1fc217/1428388432254_null_pointer_2_large http://www.netanimations.net/animated-dancing-red-question-mark-picture-moving.gif 57