L3 The Relational Model. Eugene Wu Fall 2018

Similar documents
Database Applications (15-415)

The Relational Model. Chapter 3

The Relational Model. Chapter 3. Database Management Systems, R. Ramakrishnan and J. Gehrke 1

The Relational Model. Chapter 3. Comp 521 Files and Databases Fall

Introduction to Data Management. Lecture #4 (E-R Relational Translation)

Review. The Relational Model. Glossary. Review. Data Models. Why Study the Relational Model? Why use a DBMS? OS provides RAM and disk

The Relational Model 2. Week 3

Administrivia. The Relational Model. Review. Review. Review. Some useful terms

The Relational Model. Chapter 3. Comp 521 Files and Databases Fall

6.830 Lecture 2 9/11/2017 Relational Data Model (PS1 Out)

Why Study the Relational Model? The Relational Model. Relational Database: Definitions. The SQL Query Language. Relational Query Languages

The Relational Model. Why Study the Relational Model? Relational Database: Definitions

Relational data model

Relational Model. Topics. Relational Model. Why Study the Relational Model? Linda Wu (CMPT )

The Relational Model. Week 2

The Relational Data Model. Data Model

The Relational Model. Relational Data Model Relational Query Language (DDL + DML) Integrity Constraints (IC)

The Relational Model. Outline. Why Study the Relational Model? Faloutsos SCS object-relational model

Relational Databases BORROWED WITH MINOR ADAPTATION FROM PROF. CHRISTOS FALOUTSOS, CMU /615

The Relational Model of Data (ii)

The Relational Model

Data Modeling. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Introduction to Data Management. Lecture #5 Relational Model (Cont.) & E-Rà Relational Mapping

The Relational Model. Roadmap. Relational Database: Definitions. Why Study the Relational Model? Relational database: a set of relations

Database Systems ( 資料庫系統 )

Each animal in ONE cage, multiple animals can share a cage Each animal cared for by ONE keeper, a keeper cares for multiple animals

Lecture 2 SQL. Instructor: Sudeepa Roy. CompSci 516: Data Intensive Computing Systems

The Relational Model

Introduction to Data Management. Lecture #4 (E-R à Relational Design)

Database Management Systems. Chapter 3 Part 1

CIS 330: Applied Database Systems

CompSci 516 Database Systems. Lecture 2 SQL. Instructor: Sudeepa Roy

Lecture 2 SQL. Announcements. Recap: Lecture 1. Today s topic. Semi-structured Data and XML. XML: an overview 8/30/17. Instructor: Sudeepa Roy

Database Systems. Course Administration

Database Systems. Lecture2:E-R model. Juan Huo( 霍娟 )

CIS 330: Applied Database Systems. ER to Relational Relational Algebra

The Relational Model (ii)

8) A top-to-bottom relationship among the items in a database is established by a

DBMS. Relational Model. Module Title?

CMPT 354: Database System I. Lecture 3. SQL Basics

Translation of ER-diagram into Relational Schema. Dr. Sunnie S. Chung CIS430/530

The DBMS accepts requests for data from the application program and instructs the operating system to transfer the appropriate data.

6.830 Lecture PS1 Due Next Time (Tuesday!) Lab 1 Out end of week start early!

Keys, SQL, and Views CMPSCI 645

EGCI 321: Database Systems. Dr. Tanasanee Phienthrakul

Announcements If you are enrolled to the class, but have not received the from Piazza, please send me an . Recap: Lecture 1.

Introduction to Data Management. Lecture #3 (E-R Design, Cont d.)

The Entity-Relationship Model. Overview of Database Design

Data! CS 133: Databases. Goals for Today. So, what is a database? What is a database anyway? From the textbook:

Introduction to Data Management. Lecture #5 (E-R Relational, Cont.)

SQL: The Query Language Part 1. Relational Query Languages

Comp 5311 Database Management Systems. 4b. Structured Query Language 3

Lecture 3 SQL - 2. Today s topic. Recap: Lecture 2. Basic SQL Query. Conceptual Evaluation Strategy 9/3/17. Instructor: Sudeepa Roy

Announcements (September 18) SQL: Part II. Solution 1. Incomplete information. Solution 3? Solution 2. Homework #1 due today (11:59pm)

Review: Where have we been?

SQL: Part II. Announcements (September 18) Incomplete information. CPS 116 Introduction to Database Systems. Homework #1 due today (11:59pm)

Introduction to Data Management. Lecture #4 E-R Model, Still Going

CS 2451 Database Systems: Relational Data Model

CSEN 501 CSEN501 - Databases I

The Relational Model. Database Management Systems

Modern Database Systems Lecture 1

LAB 3 Notes. Codd proposed the relational model in 70 Main advantage of Relational Model : Simple representation (relationstables(row,

The Entity-Relationship Model

DATABASE MANAGEMENT SYSTEMS. UNIT I Introduction to Database Systems

Lecture 16. The Relational Model

Databases Model the Real World. The Entity- Relationship Model. Conceptual Design. Steps in Database Design. ER Model Basics. ER Model Basics (Contd.

Database Management Systems. Chapter 3 Part 2

Introduction to Database Systems

Introduction to Data Management. Lecture 16 (SQL: There s STILL More...) It s time for another installment of...

Database Applications (15-415)

Basant Group of Institution

High-Level Database Models (ii)

Mahathma Gandhi University

Introduction to Data Management. Lecture #6 E-Rà Relational Mapping (Cont.)

OVERVIEW OF DATABASE DEVELOPMENT

ADVANCED DATABASES ; Spring 2015 Prof. Sang-goo Lee (11:00pm: Mon & Wed: Room ) Advanced DB Copyright by S.-g.

1 (10) 2 (8) 3 (12) 4 (14) 5 (6) Total (50)

Contents. Database. Information Policy. C03. Entity Relationship Model WKU-IP-C03 Database / Entity Relationship Model

Goals for Today. CS 133: Databases. Relational Model. Multi-Relation Queries. Reason about the conceptual evaluation of an SQL query

High Level Database Models

Overview of db design Requirement analysis Data to be stored Applications to be built Operations (most frequent) subject to performance requirement

A7-R3: INTRODUCTION TO DATABASE MANAGEMENT SYSTEMS

Relational Algebra Homework 0 Due Tonight, 5pm! R & G, Chapter 4 Room Swap for Tuesday Discussion Section Homework 1 will be posted Tomorrow

The Relational Model

Relational Model. CSC 343 Winter 2018 MICHAEL LIUT

SQL. Chapter 5 FROM WHERE

CS143: Relational Model

Announcements. Relational Model & Algebra. Example. Relational data model. Example. Schema versus instance. Lecture notes

Relational Model History. COSC 416 NoSQL Databases. Relational Model (Review) Relation Example. Relational Model Definitions. Relational Integrity

Database Management Systems. Session 2. Instructor: Vinnie Costa

e e Conceptual design begins with the collection of requirements and results needed from the database (ER Diag.)

Techno India Batanagar Computer Science and Engineering. Model Questions. Subject Name: Database Management System Subject Code: CS 601

CSE 530A. ER Model to Relational Schema. Washington University Fall 2013

Introduction to Data Management. Lecture #3 (Conceptual DB Design) Instructor: Chen Li

SQL DDL. CS3 Database Systems Weeks 4-5 SQL DDL Database design. Key Constraints. Inclusion Constraints

Informatics 1: Data & Analysis

CS 377 Database Systems

Introduction to Data Management. Lecture #17 (SQL, the Never Ending Story )

What s a database system? Review of Basic Database Concepts. Entity-relationship (E/R) diagram. Two important questions. Physical data independence

Introduction to Database Design

Transcription:

L3 The Relational Model Eugene Wu Fall 2018

Background Most widely used data model in the world Legacy Models IMS hierarchical CODASYL network Recently popular: NoSQL attempt at a flexible model

Key Principles Data redundancy (or how to avoid it) Physical data independence programs don t worry about physical structure Logical data independence logical structure can change and legacy programs can continue to run High level languages

Historical Context (not on test) Hierarchical model (IMS) Network model (CODASYL) Relational model (SQL/QUEL) 70s 80-90s Animals(name, species, age, feedtime) lives_in cared_by Cages(no, size, loc) Keepers(name)

Hierarchical Model (IMS, circa 1968) Segment types (objects) Segment instances (records) Segment types form a tree Keepers Animals Cages

Data Stored Depth First Jane (Keeper) (HSK 1) Bob, iguana, (2) 1, 100ft 2, (3) Joe, student, (4) 1, 100ft 2, (5) Keepers Animals Cages What s repeated? Inconsistencies possible, lack of protections

Can t Avoid Redundancy Segment types (objects) Segment instances (records) Segment types form a tree Keepers Cages Keepers Animals Animals Animals Cages Repeats cage data if >1 animals in a cage Keepers Repeats keeper data if keeper cares for >1 animals Cages DAG!= Tree

Physical Storage Stored hierarchically Only root segment can be indexed Other segments only accessed sequentially Keepers Segment K Can be indexed Sequential, hash, tree Animals Segments Cages Segments A1 C1 A2 C2 Sequential access only

Hierarchical Querying: DL-1 Navigational Querying through a tree structure Core operations GX(seg, pred) general form, takes seg type and a predicate Get Unique (GU) start at parent (root) segment Get Next (GN) next record in HSK order in database Get Next in Parent (GNP) next in HSK order until end of subtree Fetch cages that Eugene entered GU(Keeper, name = Eugene) Until no more records cage = GNP(Cage) print cage.no Find Keepers of Cage 6 keeper = GU(Keeper) GNP(Cages, no=6) print keeper Until no more records keeper = GN(Keeper) GNP(Cages, no=6) print keeper

Problems Duplicates data Low level programming interface Almost no physical data independence Change root from tree to hash index causes programs with GN on root to fail Inserts into sequential root structures disallowed Lacks logical data independence Changing schema requires changing program Violates many desirable properties of a proper DBMS

More Problems Schema changes require program changes because pointers after GN calls now different In reality, schemas change all the time Keepers now responsible for a whole cage Hummingbirds require multiple feedings Merge with another zoo

Network Models (CODASYL, 1969) Abstraction Types of Records Connected by named sets (one to many relationships) Modeled as a graph Keepers Animals Cages Bob, iguana Joe, student Snoopy, dog Jane Charlie 1, 100ft 2 2, 150ft 2

Network Models: Queries Queries are programs that follow pointers (IMS style) Find Keeper (name = Eugene ) until no more Find next Animal in cares_for Find Cage in lives_in Get current record Very Smart people (Charles Bachman, 73 Turing Award) strongly defended this model but

Network Models: Problems Very complex due to navigational programming (not for mere mortals!) Still no physical nor logical data independence Implementations were limiting must load all data at once Trades off increased programmer pain for modeling non-hierarchical data

Relational Model (1970) Ted Codd, 1970 Reaction to CODASYL Key properties 1. simple representation 2. set oriented model 3. no physical data model needed

Roadmap History lesson DDLs: Data definition language Integrity Constraints DMLs: Data Manipulation Language Selection Queries ER à Relational Model

Administrivia AA1 out: implement OFFSET in DataBass 3.75% of class grade in extra credit Come by OH for Databass crash course

Basic Definitions Database a set of relations Instance set of instances for relations in database Relation 2 parts Instance a table with rows and columns Schema name of relation + name & type of each column e.g., Students(sid: int, name: string, login: string, age: int) Think of relation as a set (no duplicate rows) Relation colored glasses Everything (data, relationships, query results) is a relation

Terminology Formal Name Relation Tuple Attribute Domain Synonyms Table Row, Record Column, Field Type

Example Instance of Students Relation sid name login age gpa 1 eugene ewu@cs 20 2.5 2 luis luis@cs 20 3.5 3 ken ken@math 33 3.9 Cardinality 3 Degree 5 Do the values of different rows have to be distinct? Do the values of columns have to be distinct?

DDL: CREATE TABLE Create the Students Relation Note: attribute domains are defined & enforced by DBMS CREATE TABLE Students( sid int, name text, login text, age int, gpa float ) Create the Enrolled relation CREATE TABLE Enrolled( sid int, cid int, grade char(2) )

Integrity Constraints (ICs) def: a condition that is true for any instance of the database Often specified when defining schema DBMS enforces ICs at all times An instance of a relation is legal if it satisfies all declared ICs Programmer doesn t have to worry about data errors! e.g., data entry errors PostgreSQL documentation great resource www.postgresql.org/docs/8.1/static/ddl-constraints.html

Domain Constraints (attr types) CREATE TABLE Students( sid int, name text, login text, age int, gpa float )

NULL Constraints CREATE TABLE Students( sid int NOT NULL, name text, login text, age int, gpa float )

Candidate Keys Set of fields is a candidate key (or just Key) for a relation if: 1. No two distinct tuples have same values in all key fields 2. This is untrue for any subset of the key 3. Key fields cannot be NULL If (2) is false, called a superkey what s a trivial superkey? If >1 candidate keys in relation, DBA picks one as primary key sid is key for Students is name a key? what is (sid, gpa)?

Primary and Candidate Keys UNIQUE & PRIMARY KEY key words Be careful with ICs: Each student can enroll in a course only once What does this say? CREATE TABLE Enrolled( sid: int, cid: int, grade: char(2), PRIMARY KEY (sid, cid) ) CREATE TABLE Enrolled( sid: int, cid: int, grade: char(2), PRIMARY KEY (sid), UNIQUE (cid, grade) )

Primary and Candidate Keys UNIQUE + NOT NULL == field is a key PRIMARY KEY is shorthand for UNIQUE NOT NULL CREATE TABLE Students( sid int NOT NULL UNIQUE, name text, login text, age int, gpa float, PRIMARY KEY (sid) )

Foreign Keys def: set of fields in Relation R i used to refer to tuple in R j via R j s primary key (logical pointer) Enrolled CREATE TABLE Enrolled( sid: int, cid: int, grade: char(2), PRIMARY KEY (sid, cid), FOREIGN KEY (sid) REFERENCES Students ) Students sid cid grade 1 2 A 1 3 B 2 2 A+ sid name 1 eugene 2 luis

Referential Integrity A database instance has referential integrity if all foreign key constraints are enforced no dangling references Examples where referential integrity is not enforced HTML links Yellow page listing Restaurant menus

CHECK Constraints Boolean constraint expression added to schema Very powerful mechanism. More specific constraints in next slides CREATE TABLE Enrolled( sid int, cid int, grade char(2), CHECK ( grade = A or grade = B or grade = C or grade = D or grade = F ) )

How to Enforce Integrity Constraints Run checks anytime database changes On INSERT what if new Enrolled tuple refers to non-existent student? Reject insertion On DELETE (many options) what if Students tuple is deleted? delete dependent Enrolled tuples reject deletion set Enrolled.sid to default value or null (null means unknown or inapplicable in SQL)

Where do ICs come from? Based on application semantics and use cases Can check if database instance satisfies ICs IC is statement about the world (all possible instances) Can t infer ICs by staring at an instance Key and foreign key ICs are most common, more general table and database constraints possible as well.

More Powerful than ER Constraints Functional dependencies A dept can t order two distinct parts from the same supplier. Can t express this wrt ternary Contracts relationship. Normalization refines ER design by considering FDs. Inclusion dependencies Special case: ER model can express Foreign keys At least 1 person must report to each manager. General constraints Each donation is less than 10% of the combined donations to all humanities courses.

What Can ER Express? Key constraints, participation constraints, overlap/covering constraints Some foreign key constraints as part of relationship set Some constraints require general CHECK stmts ER cannot express e.g., function dependencies at all Constraints help determine best database design

DML: Introduction to Queries Key strength of relational model declarative querying of data Queries are high level, readable DBMS makes it fast, user don t need to worry Precise semantics for relational queries Lets DBMS choose different ways to run query while ensuring answer is the same

INSERT/DELETE Add a tuple INSERT INTO Students(sid, name, login, age, gpa) VALUES (4, wu, wu@cs, 20, 5) Delete tuples satisfying a predicate (condition) DELETE FROM Students S WHERE S.name = wu

Basic SELECT Get all attributes of <21 year old students Get only names SELECT * FROM Students S WHERE S.age < 21 SELECT S.name FROM Students S WHERE S.age < 21 sid name login age gpa 1 eugene ewu@cs 20 2.5 2 luis luis@cs 20 3.5 3 ken ken@math 33 3.9

Multi-table SELECT What does this return? Enrolled sid cid grade 1 2 A 1 3 B 2 2 A+ SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid = E.sid AND E.grade = A Students sid name 1 eugene 2 luis Result name eugene 2 cid

Single Table Semantics A conceptual evaluation method for previous query: 1. FROM clause: retrieve Students relation 2. WHERE clause: Check conditions, discard tuples that fail 3. SELECT clause: Delete unwanted fields Remember, this is conceptual. Actual evaluation will be much more efficient, but must produce the same answers.

Multi-Table Semantics Modify the FROM clause evaluation 1. FROM clause: compute cross-product of Students and Enrolled Enrolled sid cid grade 1 2 A 1 3 B 2 2 A+ Students sid name 1 eugene 2 luis Cross-product sid cid grade sid name 1 2 A 1 eugene 1 3 B 1 eugene 2 2 A+ 1 eugene 1 2 A 2 luis 1 3 B 2 luis 2 2 A+ 2 luis

Multi-Table Semantics Modify the FROM clause evaluation 1. FROM clause: compute cross-product of Students and Enrolled 2. WHERE clause: Check conditions, discard tuples that fail 3. SELECT clause: Delete unwanted fields SELECT a1, a2, a3 FROM A, B, C WHERE A.a1 = B.b1 for row1 in A for row2 in B for row3 in C if (row1.a1 == row2.b1) { yield [row1.a1, row1.a2, row1.a3] }

Administrivia Project 1 Part 1 Meetings this week and next Who hasn t signed up yet? Extra OH after class today to go over DataBass Set up environment before coming! (See DataBass readme)

Translating ER à Relational Models

Entity Set à Relation cid name Course loc CREATE TABLE Course( cid int, name text, loc text, PRIMARY KEY (cid) )

Relationship Set w/out constraint à Relation Relation must include Keys for each entity set as foreign keys these form superkey for relation All attributes of the relationship set since Users Takes Courses CREATE TABLE Takes( uid int, cid int, since date, PRIMARY KEY (uid, cid), FOREIGN KEY (uid) REFERENCES Users, FOREIGN KEY (cid) REFERENCES Courses )

Key Constraint à Relation Note only cid is a Key Why don t we have UNIQUE(uid, cid)? User and Courses are separate relations Users Instructs Courses CREATE TABLE Instructs( uid int NOT NULL, cid int, PRIMARY KEY (cid), FOREIGN KEY (uid) REFERENCES Users, FOREIGN KEY (cid) REFERENCES Courses )

Key Constraint à Relation Alternatively combine Courses and Users (this is the preferred way) Users Instructs Courses CREATE TABLE Course_Instructs( cid int uid int, name text, loc text, PRIMARY KEY (cid), FOREIGN KEY (uid) REFERENCES Users )

Key Constraint à Relation How to translate this ER diagram? Users Instructs CREATE TABLE Course_Instructs(???? ) Courses

Key Constraint à Relation How to translate this ER diagram? Users Instructs Courses CREATE TABLE Course_Instructs_Users( cid int UNIQUE, uid int UNIQUE, name text, loc text, username text, age text, PRIMARY KEY (cid, uid) )

Participation Constraint à Relation Only participation constraints with one entity set in binary relationship (others need CHECK constraint) Users Instructs Courses CREATE TABLE Course_Instructs( cid int uid int NOT NULL, name text, loc text, PRIMARY KEY (cid), FOREIGN KEY (uid) REFERENCES Users ON DELETE NO ACTION )

Weak Entity à Relation Weak entity set and identifying relationship set are translated into a single table. When the owner entity is deleted, all owned weak entities must also be deleted. time Users Posted WallPosts CREATE TABLE Wall_Posted( pid int post_text text, posted_at DATE, uid int NOT NULL, PRIMARY KEY (pid, uid), FOREIGN KEY (uid) REFERENCES Users ON DELETE CASCADE )

ISA Hierarchies Option 1: Keep base relation Instructors & Students recorded in Users Extra info in Instructors or Students relation JOIN between child and base relations for all attributes Option 2: Only keep child relations Instructors copies attributes from Users Instructors(uid, name, age,, rating) Only if covering constraint = yes Users (B) ISA Rating Instructors (A) Students (A) Grade

Aggregation Instructors Since Manages Companies Donates Courses Amount

Aggregation Instructors Since Manages Donates Company_id Amount Course_id

in-class demo on instabase https://www.instabase.com/user/w4111- nb/notebooks/ewu/w4111- public/fs/instabase%20drive/examples/lec5.ipynb note: the database doesn t support BEGIN transactions, so the cyclic references example doesn t run.

REVIEW OF ER AND RELATIONAL

ER Overview Ternary relationships Relationships constraints At most one At least one Exactly one Weak entities Aggregation Is-A

Relational Overview Relations ~= Sets Schema = structure Instance = the data

Relational Review Relations ~= Sets Schema = structure Instance = the data "every relation is guaranteed to have a key? Integrity Constraints Candidate and primary keys NOT NULL Referential integrity How are foreign keys managed by the DBMS?

Star Schema Dim 1 Fact Table Dim 4 Dim 2 Dim 5 Dim 3 Dim 6

Star Schema Date_id Weather Time Store_id Name Owner Emp_id Name Age Date_id Store_id Emp_id Prod_id Sales_id Login_id Prod_id Name price Sales_id Qty CC Login_id Time User

So What Happened? 1970 heated debates about CODASYL vs Relational Network arguments low level languages more efficient (performance) relational queries would never be fast (performance) Relational arguments data independence high level simpler languages Market spoke. Other models beyond relational!

Summary Better than IMS/CODASYL allows us to talk about constraints! allows us to talk at a logical level declarative queries better than navigational programs Everything is a relation (table) DBA specifies ICs based on app, DBMS enforces Primary and Foreign Keys most used Types == Domain constraints SQL

Next Time Relational Algebra A set-oriented theory for relational data Finish history lesson