Database Management Systems Chapter 3 Part 1 The Relational Model Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Why Study the Relational Model? Most widely used model. Vendors: IBM, Informix, Microsoft, Oracle, Sybase, etc. Simple data representation and it can express complex queries. Recent competitor: object-oriented model ObjectStore, Versant, Ontos A synthesis emerging: object-relational model Informix Universal Server, UniSQL, O2, Oracle, DB2 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2
Relational Database: Definitions Relational database: a set of relations with distinct names. Relation: made up of 2 parts: Instance : a table, with rows and columns. #Rows = cardinality, #fields = degree. Schema : specifies name of relation, plus name and domain/type of each column. E.G. Students(sid: string, name: string, login: string, age: integer, gpa: real). Can think of a relation as a set of rows or tuples (i.e., all rows are distinct). Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3 Example Instance of Students Relation sid name login dob gpa 53666 Jones jones@cs 2000-09-01 3.4 53688 Smith smith@eecs 2001-08-01 3.2 53650 Smith smith@math 2000-10-10 3.8 Cardinality = 3, degree = 5, all rows distinct The order of fields does not matter if the fields are named. A relation is a set of rows, so the order of rows is not important. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4
Relational Query Languages A major strength of the relational model: supports simple, powerful querying of data. Queries can be written intuitively, and the DBMS is responsible for efficient evaluation. The key: precise semantics for relational queries. Allows the optimizer to extensively re-order operations, and still ensure that the answer does not change. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 5 The SQL Query Language Developed by IBM (system R) in the 1970s Need for a standard since it is used by many vendors Standards: SQL-86 SQL-89 (minor revision) SQL-92 (major revision) SQL-99 (major extensions, current standard) Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 6
Creating Relations in SQL Creates the Students relation. Observe that the type (domain) of each field is specified, and enforced by the DBMS whenever tuples are added or modified. CREATE TABLE Students (sid CHAR(20), name CHAR(30), login CHAR(20), DOB DATE, gpa REAL); string type with different length Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 7 Creating Relations in SQL As another example, the Course table holds information about courses that students take. CREATE TABLE Course (cid CHAR(20), cname CHAR (20), description CHAR( 100) DEFAULT Course Description ); Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 8
Creating Relations in SQL As another example, the Enrolled table holds information about courses that students take. CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(10)); Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 9 Adding Tuples We can insert a single tuple into the Students table as follows: INSERT INTO Students (sid, name, login, DOB, gpa) VALUES ( 53666, Jones, jones@cs, 2000-09-01, 3.4); sid name login DOB gpa 53666 Jones jones@cs 2000-09-01 3.4 53688 Smith smith@eecs 2001-08-01 3.2 53650 Smith smith@math 2000-10-10 3.8 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 10
Deleting Tuples We can delete all tuples satisfying some condition (e.g., name = Smith): DELETE S FROM Students S WHERE S.name = Smith ; sid name login dob gpa 53666 Jones jones@cs 2000-09-01 3.4 53688 Smith smith@eecs 2001-08-01 3.2 53650 Smith smith@math 2000-10-10 3.8 Powerful variants of these commands are available; more later! Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 11 Updating Tuples We can modify the column values in an existing row: UPDATE Students S SET S.gpa = S.gpa -1 WHERE S.sid = 53688; Old value sid name login Dob gpa 53666 Jones jones@cs 2000-09-01 3.4 53688 Smith smith@eecs 2001-08-01 3.2 53650 Smith smith@math 2000-10-10 3.8 2.2 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 12
Updating Tuples Where clause is applied first and determines which rows to be modified. UPDATE Students S SET S.gpa = S.gpa - 0.1 WHERE S.gpa >= 3.3; sid nam e login age gpa 53666 Jones jones@ cs 18 3.4 53688 Sm ith sm ith@ eecs 18 3.2 53650 Sm ith sm ith@ m ath 19 3.8 3.3 3.7 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 13 Integrity Constraints (ICs) An IC is a condition specified on a database schema and restricts the data that can be stored in an instance of the database, such as domain constraints. ICs are specified when schema is defined. ICs are checked when relations are modified. A legal instance of a relation is one that satisfies all specified ICs. DBMS should not allow illegal instances. If the DBMS checks ICs, stored data is more faithful to real-world meaning. Avoids data entry errors, too! Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 14
Primary Key Constraints A set of fields is a candidate key for a relation if : 1. No two distinct tuples can have same values in all key fields, and 2. No subset of the set of fields in a key is a unique identifier for a tuple. Part 2 false? A superkey (a set of fields containing a key). If there s >1 key for a relation, one of the keys is chosen (by DBA) to be the primary key. E.g., sid is a key for Students. (What about name?) The set {sid, gpa} is a superkey. Every relation is guaranteed to have a key. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 15 Primary and Candidate Keys in SQL UNIQUE: Candidate keys (you could have many) PRIMARY KEY: one of candidate keys is chosen as the primary key. CREATE TABLE Students (sid CHAR(20), name CHAR(30), login CHAR(20), age INTEGER, gpa REAL, UNIQUE (name, age), PRIMARY KEY (sid)); Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 16
Name a constraint We could name a constraint in a table. If the constraint is violated, the constraint name is returned and can be used to identify the error. CONSTRAINT constraint-name CREATE TABLE Students (sid CHAR(20), name CHAR(30), login CHAR(20), dob DATE, gpa REAL, UNIQUE (name, dob), CONSTRAINT StudentsKey PRIMARY KEY (sid)); Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 17 Primary and Candidate Keys in SQL For a given student and course, there is a single grade. vs. Students can take only one course, and receive a single grade for that course; further, no two students in a course receive the same grade. Used carelessly, an IC can prevent the storage of database instances that arise in practice! CREATE TABLE Enrolled (sid CHAR(20) cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid, cid) ); CREATE TABLE Enrolled (sid CHAR(20) cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid), UNIQUE (cid, grade) ); Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 18
Foreign Keys, Referential Integrity Foreign key : Set of fields in one relation that is used to `refer to a tuple in another relation. (Must correspond to primary key of the second relation.) Like a `logical pointer. E.g. sid is a foreign key referring to Students: Enrolled(sid: string, cid: string, grade: string) If all foreign key constraints are enforced, referential integrity is achieved, i.e., no dangling references. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 19 Foreign Keys in SQL Only students listed in the Students relation should be allowed to enroll for courses. CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid, cid), FOREIGN KEY (sid) REFERENCES Students(sid) ); Enrolled sid cid grade 53666 Carnatic101 C 53666 Reggae203 B 53650 Topology112 A 53666 History105 B Students sid name login age gpa 53666 Jones jones@cs 18 3.4 53688 Smith smith@eecs 18 3.2 53650 Smith smith@math 19 3.8 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 20
Enforcing Integrity Constraints ICs are specified when a relation is created and enforced when a relation is modified. Violating the primary key constraint, the transaction will be rejected. Inserting a tuple with an existing sid. INSERT INTO Students (sid, name, login, age, gpa) VALUES (53688, Mike, mike@ee, 17, 3.4); Inserting a tuple with primary key as null. INSERT INTO Students (sid, name, login, age, gpa) VALUES (null, Mike, mike@ee, 17, 3.4); Updating a tuple with an existing sid. UPDATE Students S SET S.sid = 50000 WHERE S.sid=53688; Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 21 Enforcing Integrity Constraints Deletion of Enrolled tuples do not violate referential integrity, but insertions could. Inserting a tuple with an un-exist sid in Students. INSERT INTO Enrolled (sid, cid, grade) VALUES (51111, Hindi101, B ); Insertion of Students tuples do not violate referential integrity, but deletions could. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 22
Ways to handle foreign key violations If an Enrolled row with un-existing sid is inserted, it is rejected. If a Students row is deleted/updated, Option 1: Delete/Update all Enrolled rows that refer to the deleted sid in Students (CASCADE). Both are affected Option 2: Reject the deletion/updating of the Students row if an Enrolled row refers to it (NO ACTION ). [The default action for SQL]. None is affected. Option 3: Set the sid of Enrolled to some existing (default) sid value in Students for every involved Enrolled row (SET NULL / SET DEFAULT ). Both are affected. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 23 Referential Integrity in SQL When a Students row is deleted, all Enrolled rows that refer to it are to be deleted as well. When a Students sid is modified, the update is to be rejected if an Enrolled row refers to the modified Students row. CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid), FOREIGN KEY (sid) REFERENCES Students (sid) ON DELETE CASCADE ON UPDATE NO ACTION ); Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 24
The SQL Query Language A relational database query is a question about the data, and the answer consists of a new relation containing the result. To find information about all 17 years old students : SELECT * FROM Students S WHERE year(s.dob)=2000; To find just names and logins: sid name login Dob gpa 53666 Jones jones@cs 2000-09-01 3.4 53650 Smith smith@math 2000-10-10 3.8 SELECT S.name, S.login FROM Students S WHERE year(s.dob)=2000; name Jones Smith login jones@cs smith@ee Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 25 Destroying and Altering Relations Destroys the relation Students. The schema information and the tuples are deleted. DROP TABLE Students; The schema of Students is altered by adding a new field; every tuple in the current instance is extended with a null value in the new field. ALTER TABLE Students ADD COLUMN firstyear: integer; Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 26
Views A view is just a relation, but we store a definition, rather than a set of tuples. CREATE VIEW YoungActiveStudents (name, grade) AS SELECT S.name, E.grade FROM Students S, Enrolled E WHERE S.sid = E.sid and S.age<21; Views can be dropped using the DROP VIEW command. How to handle DROP TABLE if there s a view on the table? View becomes invalid if its base table is dropped. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 27 Views and Security Views can be used to present necessary information (or a summary), while hiding details in underlying relation(s). Given YoungStudents, but not Students or Enrolled, we can find students s who have are enrolled, but not the cid s of the courses they are enrolled in. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 28