CS2300: File Structures and Introduction to Database Systems
Lecture 9: Relational Model & Relational Algebra
Doug McGeehan

Brief Review
- Relational model concepts

| Informal Terms                  | Formal Terms       |
|---------------------------------|--------------------|
| Table                           | Relation           |
| Column                          | Attribute          |
| Row                             | Tuple              |
| All possible values in a column | Domain             |
| Table Definition                | Schema of Relation |
| Populated Table                 | Extension/State    |

Brief Review
- Relational model concepts
- Keys and Superkeys
  - e.g. Vehicle(VIN, Reg#, State, MPG, Odometer)
  - Keys: {VIN} and {Reg#, State}
  - Superkeys: {VIN, Odometer} or {Reg#, State, MPG}
    - Each includes a key, so one attribute can be removed and the remaining set still identifies a tuple uniquely
  - Candidate keys: {VIN}, {Reg#, State}
  - Primary key: {VIN}
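
The difference between a superkey and a key can be sketched in a few lines of Python. This is only an illustration: the function names `is_superkey` and `is_key` are made up for this example, and the candidate keys are taken as given from the Vehicle example rather than derived from data.

```python
# Candidate keys of Vehicle(VIN, Reg#, State, MPG, Odometer), as stated above.
CANDIDATE_KEYS = [frozenset({"VIN"}), frozenset({"Reg#", "State"})]


def is_superkey(attrs):
    """A superkey is any attribute set that contains at least one key."""
    attrs = frozenset(attrs)
    return any(key <= attrs for key in CANDIDATE_KEYS)


def is_key(attrs):
    """A key is a minimal superkey: dropping any one attribute breaks it."""
    attrs = frozenset(attrs)
    return is_superkey(attrs) and not any(
        is_superkey(attrs - {a}) for a in attrs
    )


print(is_superkey({"VIN", "Odometer"}))  # True
print(is_key({"VIN", "Odometer"}))       # False: Odometer can be removed
print(is_key({"Reg#", "State"}))         # True
```

Checking single-attribute removals is enough for minimality here, because any superset of a superkey is again a superkey.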

Brief Review
- Relational model concepts
- Keys and Superkeys
- Relational model constraints
  - Domain / NOT NULL constraints (on attributes)
  - Key constraints (on a single relation)
  - Entity integrity constraint (on a single relation)
  - Referential integrity constraint (on two relations)

Referential Integrity
- A referential integrity constraint can be displayed in a relational database schema as a directed arc from R1.FK to R2.

Figure: relations R1 and R2, shown as the attribute rows (SID, Name, Address) and (SID, Course, Grade).
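
Valid and Invalid State
- Valid state: a database state that satisfies all integrity constraints.
- Invalid state: a database state that does not obey some integrity constraint(s).

Invalid state (relations R1 and R2):

| SID | Name  | Address |
|-----|-------|---------|
| 101 | Alice | Rolla   |

| SID | Course | Grade |
|-----|--------|-------|
| 111 | CS238  | A     |
| 101 | CS304  | A     |

SID 111 in the (SID, Course, Grade) relation matches no SID in the (SID, Name, Address) relation, so referential integrity is violated.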
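
Valid and Invalid State
- Valid state: a database state that satisfies all integrity constraints.
- Invalid state: a database state that does not obey some integrity constraint(s).

Valid state (relations R1 and R2):

| SID | Name  | Address |
|-----|-------|---------|
| 101 | Alice | Rolla   |

| SID | Course | Grade |
|-----|--------|-------|
| 101 | CS238  | A     |
| 101 | CS304  | A     |

Every SID in the (SID, Course, Grade) relation matches SID 101 in the (SID, Name, Address) relation.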
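
One way to test a database state for referential integrity is to look for dangling foreign key values. The sketch below assumes relations are stored as lists of Python dicts; the function name `dangling_references` and the variable names `students` and `*_enrollments` are hypothetical, chosen only to label the two example relations.

```python
def dangling_references(referencing, referenced, fk, pk):
    """Return referencing tuples whose non-NULL foreign key matches no primary key."""
    pk_values = {row[pk] for row in referenced}
    return [
        row for row in referencing
        if row[fk] is not None and row[fk] not in pk_values
    ]


# Hypothetical names for the (SID, Name, Address) and (SID, Course, Grade) relations.
students = [{"SID": 101, "Name": "Alice", "Address": "Rolla"}]

invalid_enrollments = [
    {"SID": 111, "Course": "CS238", "Grade": "A"},
    {"SID": 101, "Course": "CS304", "Grade": "A"},
]
valid_enrollments = [
    {"SID": 101, "Course": "CS238", "Grade": "A"},
    {"SID": 101, "Course": "CS304", "Grade": "A"},
]

# The invalid state has one offending tuple (SID 111); the valid state has none.
print(dangling_references(invalid_enrollments, students, "SID", "SID"))
print(dangling_references(valid_enrollments, students, "SID", "SID"))
```

A state is valid with respect to this constraint exactly when the returned list is empty.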

Displaying a relational database schema and its constraints
- Relation schema: displayed as a row of attribute names
- Name of the relation: written above the attribute names
- Primary key attribute(s): underlined
- Foreign key (referential integrity) constraint: a directed arc (arrow) from the foreign key attributes to the referenced relation
  - For clarity, the arc can also point to the primary key of the referenced relation.

How to identify a foreign key?
- Start from a candidate key
- Check whether it is referenced by other relations, or even from within the same relation

Figure: relations R1 and R2, shown as the attribute rows (SID, Name, Address) and (SID, Course, Grade).

Referencing and referenced relations can be the same

Exercise
Consider the following relations for a database that keeps track of student enrollment in courses and the books adopted for each course:
- STUDENT(SSN, Name, Major, Bdate)
- COURSE(Course#, Cname, Dept)
- ENROLL(SSN, Course#, Quarter, Grade)
- BOOK_ADOPTION(Course#, Quarter, Book_ISBN)
- TEXT(Book_ISBN, Book_Title, Publisher, Author)

Draw a relational schema diagram specifying the foreign keys for this schema.

Question
- Car(State, Reg#, SerialNo, Make, Model, Year)
  - Car has two candidate keys: {State, Reg#} and {SerialNo}
- Accident(SerialNo, date)
- Is SerialNo a foreign key in the Car relation or in the Accident relation?
- Answer: SerialNo is a foreign key in the Accident relation, because it references SerialNo, which is a candidate key of the Car relation.

Operations of the Relational Model
- Operations can be categorized into
  - Retrieval: query a database
  - Updates: change the database
- Basic update operations for changing the database:
  - INSERT a new tuple in a relation
  - DELETE an existing tuple from a relation
  - MODIFY an attribute of an existing tuple

Update Operations on Relations
- Integrity constraints should not be violated by the update operations.
- Updates may propagate to cause other updates automatically. This may be necessary to maintain integrity constraints.
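
Example Relations

R
| tid | sid | course   |
|-----|-----|----------|
|     | 2   | os       |
| 10  | 450 | calculus |
|     | 2   | db       |
|     | 240 | db       |

S
| sid | sname | year |
|-----|-------|------|
| 240 | white | 3    |
| 2   | jones | 3    |
| 450 | adams | 1    |

T
| tid | tname | dept |
|-----|-------|------|
| 10  | cohen | math |
|     | levy  | cs   |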
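
Example Relations

R
| tid | sid | course   |
|-----|-----|----------|
|     | 2   | os       |
| 10  | 450 | calculus |
|     | 2   | db       |
|     | 240 | db       |

S
| sid | sname | year |
|-----|-------|------|
| 240 | white | 3    |
| 0   | jones | 3    |
| 450 | adams | 1    |

T
| tid | tname | dept |
|-----|-------|------|
| 10  | cohen | math |
|     | levy  | cs   |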
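
Example Relations

R
| tid | sid | course   |
|-----|-----|----------|
|     | 0   | os       |
| 10  | 450 | calculus |
|     | 0   | db       |
|     | 240 | db       |

S
| sid | sname | year |
|-----|-------|------|
| 240 | white | 3    |
| 0   | jones | 3    |
| 450 | adams | 1    |

T
| tid | tname | dept |
|-----|-------|------|
| 10  | cohen | math |
|     | levy  | cs   |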

Possible Violations for Insert Operation
- INSERT may violate any of the constraints:
  - Domain constraint / NOT NULL constraint: an attribute value in the new tuple is not in the attribute's domain
  - Key constraint: the key value of the new tuple already exists in another tuple of the relation
  - Entity integrity: the primary key value is NULL in the new tuple
  - Referential integrity: a foreign key in the new tuple references a non-existent primary key value
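
The four INSERT checks can be sketched as a single validation function. Everything named here is an assumption made for illustration: `check_insert`, the dict-based tuples, the (SID, Course) primary key of the enrollment relation, and the grade domain A to F. NOT NULL checks on non-key attributes are left out to keep the sketch short.

```python
def check_insert(new, relation, pk, domains, fks=()):
    """Return the names of the constraints that inserting `new` would violate.

    pk      -- tuple of primary-key attribute names
    domains -- attribute name -> predicate accepting legal non-NULL values
    fks     -- iterable of (attribute, set of existing referenced key values)
    """
    violations = []
    # Domain constraint: every non-NULL value must lie in its attribute's domain.
    if any(new[a] is not None and not ok(new[a]) for a, ok in domains.items()):
        violations.append("domain")
    key = tuple(new[a] for a in pk)
    if any(v is None for v in key):
        # Entity integrity: no part of the primary key may be NULL.
        violations.append("entity integrity")
    elif any(tuple(row[a] for a in pk) == key for row in relation):
        # Key constraint: the key value must not already be present.
        violations.append("key")
    for attr, referenced in fks:
        # Referential integrity: a non-NULL FK must match an existing key value.
        if new[attr] is not None and new[attr] not in referenced:
            violations.append("referential integrity")
    return violations


student_sids = {101}  # primary key values of the referenced relation
enroll = [{"SID": 101, "Course": "CS238", "Grade": "A"}]
domains = {
    "SID": lambda v: isinstance(v, int),
    "Course": lambda v: isinstance(v, str),
    "Grade": lambda v: v in {"A", "B", "C", "D", "F"},  # assumed grade domain
}


def try_insert(sid, course, grade):
    new = {"SID": sid, "Course": course, "Grade": grade}
    return check_insert(new, enroll, ("SID", "Course"), domains,
                        fks=[("SID", student_sids)])


print(try_insert(101, "CS304", "A"))  # []
print(try_insert(111, "CS304", "A"))  # ['referential integrity']
print(try_insert(101, "CS238", "B"))  # ['key']
print(try_insert(101, None, "A"))     # ['entity integrity']
print(try_insert(101, "CS304", "Z"))  # ['domain']
```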

Possible Violations for Delete Operation
- DELETE may violate only referential integrity:
  - when the primary key value of the tuple being deleted is referenced from other tuples in the database
- Can be remedied by several actions
  - Reject the deletion
  - Propagate the change to the referencing tuples
  - Set the foreign keys of the referencing tuples to NULL, when the referencing attributes are not part of the primary key
- One of the above options must be specified during database design for each foreign key constraint
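
The three remedies can be compared side by side in a small sketch. The function `delete_tuple` and the policy names "reject", "cascade", and "set_null" are invented for this example. The data reuses S and the (sid, course) columns of R from the example relations, and assumes for the "set_null" case that sid is not part of R's primary key.

```python
def delete_tuple(referenced, referencing, pk, fk, key_value, policy):
    """Delete the tuple whose primary key is `key_value`, applying one remedy.

    policy is "reject", "cascade", or "set_null" (names chosen for this sketch).
    Returns new (referenced, referencing) lists; the inputs are not mutated.
    """
    if policy not in ("reject", "cascade", "set_null"):
        raise ValueError(f"unknown policy: {policy}")
    is_referenced = any(row[fk] == key_value for row in referencing)
    if is_referenced and policy == "reject":
        raise ValueError(f"{pk}={key_value} is still referenced")
    remaining = [row for row in referenced if row[pk] != key_value]
    if policy == "cascade":
        # Propagate: remove the referencing tuples as well.
        referencing = [row for row in referencing if row[fk] != key_value]
    elif policy == "set_null":
        # Keep the referencing tuples but clear their foreign key.
        referencing = [
            {**row, fk: None} if row[fk] == key_value else row
            for row in referencing
        ]
    return remaining, referencing


S = [
    {"sid": 240, "sname": "white", "year": 3},
    {"sid": 2, "sname": "jones", "year": 3},
    {"sid": 450, "sname": "adams", "year": 1},
]
R = [  # only the sid and course columns of R are used here
    {"sid": 2, "course": "os"},
    {"sid": 450, "course": "calculus"},
    {"sid": 2, "course": "db"},
    {"sid": 240, "course": "db"},
]

s_after, r_after = delete_tuple(S, R, "sid", "sid", 2, "cascade")
print([t["sid"] for t in s_after])  # [240, 450]
print(r_after)                      # the two tuples with sid 2 are gone

try:
    delete_tuple(S, R, "sid", "sid", 2, "reject")
except ValueError as err:
    print("rejected:", err)
```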

Possible Violations for Modify Operation
- MODIFY may violate any of the constraints, depending on what is updated:
  - Updating the primary key (PK)
    - Similar to a DELETE followed by an INSERT
    - Need to specify similar remedies to DELETE
  - Updating a foreign key (FK)
    - May violate referential integrity
  - Updating an ordinary attribute (neither PK nor FK)
    - Can only violate domain constraints or the NOT NULL constraint

How to Deal With Violations?
- In case of an integrity violation caused by any operation, several actions can be taken:
  - Reject the operation that causes the violation
  - Correct the violation by triggering additional updates
  - Perform the operation but inform the user of the violation
  - Execute a user-specified error-correction routine

Relational Algebra
Chapter 8

What is an Algebra?
- Mathematical system consisting of:
  - Operands: variables or values from which new values can be constructed.
  - Operators: symbols denoting procedures that construct new values from given values.

What is Relational Algebra?
- The relational model is an abstract (mathematical) model of a table
- Relational algebra (RA): an algebra whose operands are relations or variables that represent relations.
- A (mathematical) query language for relations
  - Query language = a language for writing questions about the data

Why do we need to understand RA?
- Real queries are written in SQL, but are translated by the query processor into relational algebra
- Why?
  - SQL is declarative; RA provides operations for execution
  - Optimization is easier in RA, since we can take advantage of (provable) expression equivalences

What you should know from this course
1. How to write queries in relational algebra
2. How to calculate the result of a relational algebra expression over a set of relations
3. How to determine whether two relational algebra expressions are equivalent

Relational Algebra
- Relational algebra has a collection of operators on relations
  - Operators may be unary or binary
- The output of an operator is a relation
  - Another way to say this is that the algebra is closed
  - Therefore, operators can be composed with one another

Basic Operators
- Relational algebra has 5 basic operators:
  - Project (π)
  - Select (σ)
  - Union (∪)
  - Set difference (−)
  - Cartesian product (×)
- Other operators can be defined using the basic ones:
  - Intersection, Join, Division
- A useful syntactic operator: Rename
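
Example Relations
S = Students, T = Teachers, R = Studies

R
| tid | sid | course   |
|-----|-----|----------|
|     | 2   | os       |
| 10  | 450 | calculus |
|     | 2   | db       |
|     | 240 | db       |

S
| sid | sname | year |
|-----|-------|------|
| 240 | white | 3    |
| 2   | jones | 3    |
| 450 | adams | 1    |

T
| tid | tname | dept |
|-----|-------|------|
| 10  | cohen | math |
|     | levy  | cs   |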

The Project Operation
- The Project operation is unary (i.e., it is applied to a single relation)
- Denoted as: $\pi_{A_1,\ldots,A_n}(R)$
  - $A_1,\ldots,A_n$ are attributes
- Returns a new relation containing only $A_1,\ldots,A_n$ from the original relation
- Output relation does not have a name
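
Example: Find the teacher's ID (tid) for each course

$\pi_{tid,course}(R)$

R
| tid | sid | course   |
|-----|-----|----------|
|     | 2   | os       |
| 10  | 450 | calculus |
|     | 2   | db       |
|     | 240 | db       |

Result
| tid | course   |
|-----|----------|
|     | os       |
| 10  | calculus |
|     | db       |

Fewer tuples in the result. Why?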

The Project Operation
- Duplicate elimination
  - The Project operation removes any duplicate tuples, so the output is a valid relation
- When computing a projection on a relation with n tuples:
  - What is the minimum cardinality of the result?
  - What is the maximum cardinality?
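
Duplicate elimination is easy to see in a small sketch. The function below is one possible way to write projection over a list of dicts, not a standard library routine; it is applied to the S relation from the example relations.

```python
def project(relation, attrs):
    """Keep only `attrs` from each tuple and drop duplicate result tuples."""
    seen = set()
    result = []
    for row in relation:
        values = tuple(row[a] for a in attrs)
        if values not in seen:  # a relation is a set, so keep each tuple once
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


S = [
    {"sid": 240, "sname": "white", "year": 3},
    {"sid": 2, "sname": "jones", "year": 3},
    {"sid": 450, "sname": "adams", "year": 1},
]

print(project(S, ["year"]))               # [{'year': 3}, {'year': 1}]
print(len(project(S, ["sid", "sname"])))  # 3
```

Projecting on year collapses the two third-year students into one tuple, while projecting on a set of attributes that includes the key sid keeps all three tuples.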

The Project Operation
- $\pi_{list1}(\pi_{list2}(R)) = \pi_{list1}(R)$
- Note: list2 must contain all the attributes in list1, i.e. $list1 \subseteq list2$
  - Otherwise, the left-hand side is not a valid expression.
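
The equivalence can be checked on a concrete relation. In this sketch a relation is represented as a pair (header, set of tuples); that representation and the `project` helper are assumptions made for the example, and S is taken from the example relations.

```python
def project(relation, attrs):
    """Project a (header, rows) relation onto `attrs`."""
    header, rows = relation
    # tuple.index raises ValueError when an attribute is not in the header
    positions = [header.index(a) for a in attrs]
    return tuple(attrs), {tuple(row[i] for i in positions) for row in rows}


S = (
    ("sid", "sname", "year"),
    {(240, "white", 3), (2, "jones", 3), (450, "adams", 1)},
)

# list1 = [year] is contained in list2 = [sname, year]: both sides agree.
lhs = project(project(S, ["sname", "year"]), ["year"])
rhs = project(S, ["year"])
print(lhs == rhs)  # True

# list1 = [year] is not contained in list2 = [sname]: the left-hand side fails.
try:
    project(project(S, ["sname"]), ["year"])
except ValueError:
    print("inner projection dropped an attribute the outer one needs")
```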

The Select Operation
- Unary operator, written as $\sigma_C(R)$
- C is a Boolean condition evaluated over each single tuple in R
  - e.g. $\sigma_{name = \text{'Doug'}}(Instructors)$
- Returns the tuples that satisfy C
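
Example
Return the courses taught by teacher number

$\sigma_{tid = }(R)$

R
| tid | sid | course   |
|-----|-----|----------|
|     | 2   | os       |
| 10  | 450 | calculus |
|     | 2   | db       |
|     | 240 | db       |

Result
| tid | sid | course |
|-----|-----|--------|
|     | 2   | os     |
|     | 2   | db     |
|     | 240 | db     |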

What types of Conditions can be used?
- The condition is made up of comparisons that are connected using logical operators (and, or)
- Comparisons are between 2 attributes or between an attribute and a constant
  - attribute1 op attribute2 (e.g. balance < credit_limit)
  - attribute1 op constant (e.g. tid=)
- Important! Conditions are evaluated a single tuple at a time.
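
Because a condition is evaluated one tuple at a time, selection can be sketched as a filter that takes a per-tuple predicate. The `select` helper is written for this example only; S is the Students relation from the example relations.

```python
def select(relation, condition):
    """Return the tuples of `relation` for which `condition(tuple)` is true."""
    return [row for row in relation if condition(row)]


S = [
    {"sid": 240, "sname": "white", "year": 3},
    {"sid": 2, "sname": "jones", "year": 3},
    {"sid": 450, "sname": "adams", "year": 1},
]

# attribute op constant
third_year = select(S, lambda t: t["year"] == 3)
print([t["sname"] for t in third_year])  # ['white', 'jones']

# two comparisons connected with "or"
jones_or_first = select(S, lambda t: t["sname"] == "jones" or t["year"] == 1)
print([t["sname"] for t in jones_or_first])  # ['jones', 'adams']
```

The predicate only ever sees one tuple, so a condition cannot refer to other tuples of the relation.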

Types of Comparisons
- We can use any of the operators: $<, \le, >, \ge, =, \ne$
- When comparing an attribute with a string, the string is written in single quotes
  - $\sigma_{tname = \text{'cohen'}}(T)$

The Select Operation: Examples
- $\sigma_{tid = \text{ and } course = \text{'os'}}(R)$

R
| tid | sid | course   |
|-----|-----|----------|
|     | 2   | os       |
| 10  | 450 | calculus |
|     | 2   | db       |
|     | 240 | db       |

What are the results of the above queries?
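
The Select Operation: Examples
- $\sigma_{tid = \text{ and } course = \text{'os'}}(R)$
- $\sigma_{tid = 10 \text{ and } course = \text{'db'}}(R)$

R
| tid | sid | course   |
|-----|-----|----------|
|     | 2   | os       |
| 10  | 450 | calculus |
|     | 2   | db       |
|     | 240 | db       |

What are the results of the above queries?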
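
The Select Operation: Examples
- $\sigma_{tid = \text{ and } course = \text{'os'}}(R)$
- $\sigma_{tid = 10 \text{ and } course = \text{'db'}}(R)$
- $\sigma_{tid = \text{ or } sid = 2 \text{ or } sid = 450}(R)$

R
| tid | sid | course   |
|-----|-----|----------|
|     | 2   | os       |
| 10  | 450 | calculus |
|     | 2   | db       |
|     | 240 | db       |

What are the results of the above queries?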

The Select Operation
- $\sigma_{c_1}(\sigma_{c_2}(R)) = \sigma_{c_2}(\sigma_{c_1}(R)) = \sigma_{c_1 \text{ and } c_2}(R)$
- When computing a selection on a relation with n tuples:
  - Minimum cardinality of the result?
  - Maximum cardinality of the result?
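
The equivalence of cascaded selections can be tried out on S from the example relations. The helper `select` and the two conditions `c1` and `c2` are chosen only for this illustration; checking one instance does not prove the identity, it merely shows it at work.

```python
def select(relation, condition):
    """Return the tuples of `relation` for which `condition(tuple)` is true."""
    return [row for row in relation if condition(row)]


def c1(t):
    return t["year"] == 3


def c2(t):
    return t["sname"] == "jones"


S = [
    {"sid": 240, "sname": "white", "year": 3},
    {"sid": 2, "sname": "jones", "year": 3},
    {"sid": 450, "sname": "adams", "year": 1},
]

c1_after_c2 = select(select(S, c2), c1)
c2_after_c1 = select(select(S, c1), c2)
combined = select(S, lambda t: c1(t) and c2(t))

print(c1_after_c2 == c2_after_c1 == combined)  # True
print(combined)  # [{'sid': 2, 'sname': 'jones', 'year': 3}]
```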

Combining Selection and Projection
What does this compute?

$\pi_{course}(\sigma_{tid = }(R))$

R
| tid | sid | course   |
|-----|-----|----------|
|     | 2   | os       |
| 10  | 450 | calculus |
|     | 2   | db       |
|     | 240 | db       |

Can we change the order of these two operations?

Example
How would you find the names of the third year students?

S
| sid | sname | year |
|-----|-------|------|
| 240 | white | 3    |
| 2   | jones | 3    |
| 450 | adams | 1    |

?

Result
| sname |
|-------|
| white |
| jones |
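
Example
Find the id of students named jones who are in their first year or third year of studies

$\pi_{sid}(\sigma_{sname = \text{'jones'} \land (year = 1 \lor year = 3)}(S))$

S
| sid | sname | year |
|-----|-------|------|
| 240 | white | 3    |
| 2   | jones | 3    |
| 450 | adams | 1    |

Result
| sid |
|-----|
| 2   |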
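
Once Again
What is in the result of the query when we have this instance of S?

$\pi_{sid}(\sigma_{sname = \text{'jones'} \land (year = 1 \lor year = 3)}(S))$

S
| sid | sname | year |
|-----|-------|------|
| 240 | white | 3    |
| 2   | jones | 3    |
| 450 | jones | 1    |

Result
| sid |
|-----|
| 2   |
| 450 |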
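
The query above composes a selection with a projection, which works because every operator returns a relation. The sketch below evaluates it on both instances of S; the helper names `select`, `project`, and `jones_first_or_third` are invented for this example.

```python
def select(relation, condition):
    """Return the tuples of `relation` for which `condition(tuple)` is true."""
    return [row for row in relation if condition(row)]


def project(relation, attrs):
    """Keep only `attrs` from each tuple and drop duplicate result tuples."""
    seen = set()
    result = []
    for row in relation:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


def jones_first_or_third(students):
    """Project sid over the students named jones in year 1 or year 3."""
    selected = select(
        students,
        lambda t: t["sname"] == "jones" and (t["year"] == 1 or t["year"] == 3),
    )
    return project(selected, ["sid"])


S_first = [
    {"sid": 240, "sname": "white", "year": 3},
    {"sid": 2, "sname": "jones", "year": 3},
    {"sid": 450, "sname": "adams", "year": 1},
]
S_second = [
    {"sid": 240, "sname": "white", "year": 3},
    {"sid": 2, "sname": "jones", "year": 3},
    {"sid": 450, "sname": "jones", "year": 1},
]

print(jones_first_or_third(S_first))   # [{'sid': 2}]
print(jones_first_or_third(S_second))  # [{'sid': 2}, {'sid': 450}]
```

The selection runs first here because its condition needs sname and year, which the projection onto sid would discard.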

What's Next
- More relational algebra operators
- Project Phase 1 report due Monday 11:59pm