Database Management System (15ECSC208) UNIT I: Chapter 2: Relational Data Model and Relational Algebra

Relational Data Model and Relational Constraints Part 1

A simplified diagram to illustrate the main phases of database design process.

Background The relational model was first proposed by Dr. E. F. Codd of IBM Research in 1970 in the following paper: "A Relational Model for Large Shared Data Banks," Communications of the ACM, June 1970. A Relation is a mathematical concept based on the ideas of sets. Commercial implementations of the relational DBMSs: DB2 and Informix Dynamic Server (from IBM), Oracle and Rdb (from Oracle), Sybase DBMS (from Sybase), SQL Server and Access (from Microsoft) Popular open source systems, such as MySQL and PostgreSQL

Relational Model Concepts Relational schema - Consists of relation name, and a set of attributes or field names or column names. Each attribute has an associated domain. Eg: STUDENT ( studentname: string, rollnumber: string, phonenumber: integer, ) Domain set of atomic values + data type.

Relational Model Concepts Relation instance - A finite set of tuples. A relation r of relation schema R = (A 1, A 2,, A m ) is set of n-tuples of the form [t 1,t 2,...,t n ] where each t i dom(a i ). A tuple of relation is an ordered sequence of values(v 1,v 2,...,v m ) such that v i dom(a i ), 1 i m.

Characteristics of Relation Ordering of Tuples in a Relation a relation is defined as a set of tuples tuples in a relation don t have any particular order Ordering of Attributes Ordering of values within a tuple Alternative definition: order of attributes and their values is not that important as long as the correspondence between attributes and values is maintained Values of Tuples each value in a tuple is an atomic value NULL values in Tuples values of attributes that may be unknown or may not apply to a tuple

Relational Model Constraint Types Constraints restrictions on data which is specified on a relational database schema. 1. Inherent model-based constraints or implicit constraints: Constraints that are inherent in the data model. Eg: duplicate tuples are not allowed in a relation. 2. Schema-based constraints or explicit constraints: Constraints that can be directly expressed in schemas of the data model, typically by specifying them in the DDL. 3. Application-based or semantic constraints or business rules: Constraints that cannot be directly expressed in the schemas of the data model, and hence must be expressed and enforced by the application programs. Eg: this year s salary increase can be no more than last year s

Relational Model Constraints 1. Domain Constraints: Specify that value of each attribute A must be atomic and a member of dom(a). Data types associated with domains include - standard numeric data types for integers (such as short integer, integer, and long integer), - real numbers (float and double precision float), - characters, booleans, fixed-length and variable-length strings, - date, time, timestamp, or other special data types

Relational Constraints 2. Key constraints and constraints on NULL: Given a relational schema R = (A i ), i = 1 n, a subset of attributes SK is called a superkey if the values of attributes in SK is distinct for every relation instance. This is, for any two tuples t i, t j, (t i t j ) (t i [SK] t j [SK]). A key k of a relation is a subset of the superkey, k subset SK, such that removal of any attribute in k removes the superkey property for k. Also called as Minimal key.

Relational Constraints 2. Key constraints and constraints on NULL [cont ] Key satisfies two properties: Condition 1: Two distinct tuples in any state of relation cannot have identical values for (all) attributes in key Condition 2: It is a Minimal superkey - that is, A superkey from which we cannot remove any attributes and still have uniqueness constraint in condition 1 hold. Condition 1 also applies to a superkey. However, Condition 2 is not required by a superkey.

Relational Constraints 2. Key constraints and constraints on NULL [cont ] Candidate key Relation schema may have more than one key Primary key of the relation Designated among candidate keys Underline attribute Other candidate keys are designated as unique keys Also, none of the key attributes can have NULL value.

Relational Databases and Relational Database Schemas Relational database schema Set of relation schemas S = {R 1, R 2,..., R m } Set of integrity constraints IC Relational database state Set of relation states DB = {r 1, r 2,..., r m } Each r i is a state of R i and such that the r i relation states satisfy integrity constraints specified in IC

Schema diagram for University relational database schema

One possible relational database state corresponding to University schema

Relational Constraints 3. Integrity Constraints (IC): 3.1. Entity IC: No primary key value can be NULL 3.2. Referential IC: Specified between two relations Maintains consistency among tuples in two relations There can be more than one foreign key in a relation scheme Directed arc from each foreign key to the relation it references (Diagrammatically)

Relational Constraints Foreign key rules : A set of attributes FK in relation schema R 1 is a foreign key of R 1 that references relation R 2 if it satisfies the following rules: The attributes in FK have the same domain(s) as the primary key attributes PK of R 2 A value of FK in a tuple t 1 of the current state r 1 (R 1 ) either occurs as a value of PK for some tuple t 2 in the current state r 2 (R 2 ) or is NULL. All integrity constraints need to be specified on relational database schema.

Other Constraints Semantic ICs May have to be specified and enforced on a relational database Use triggers and assertions More common to check for these types of constraints within the application programs Functional dependency constraint Establishes a functional relationship among two sets of attributes X and Y X Y

Relational Database Design using ER to Relational Mapping

Example 6 Specifications The company is organized into departments. Each department has a unique name, a unique number, and a particular employee who manages the department. We keep track of the start date when that employee began managing the department. A department may have several locations. A department controls a number of projects, each of which has a unique name, a unique number, and a single location. We store each employee's name, social security number. address, salary, sex, and birth date. An employee is assigned to one department but may work on several projects, which are not necessarily controlled by the same department. We keep track of the number of hours per week that an employee works on each project. We also keep track of the direct supervisor of each employee. We want to keep track of the dependents of each employee for insurance purposes. We keep each dependent's first name, sex, birth date, and relationship to the employee.

ER schema diagram for COMPANY database

Database state for the COMPANY relational database schema

ER to Relational Mapping Algorithm Step1: Mapping of Regular Entity Types For each regular (strong) entity type E in the ER schema, create a relation R that includes all the simple attributes of E. Include only the simple component attributes of a composite attribute. Choose one of the key attributes of E as primary key for R. If the chosen key of E is composite, the set of simple attributes that form it will together form the primary key of R.

ER to Relational Mapping Algorithm Step2: Mapping of Weak Entity Types For each weak entity type W in the ER schema with owner entity type E, create a relation R and include all simple attributes (or simple components of composite attributes) of W as attributes of R. In addition, include as foreign key attributes of R the primary key attribute(s) of the relation(s) that correspond to the owner entity type(s); this takes care of the identifying relationship type of W. The primary key of R is the combination of the primary key(s) of the owner(s) and the partial key of the weak entity type W, if any.

ER to Relational Mapping Algorithm Step3: Mapping of Binary 1:1 Relationship Types For each binary 1:1 relationship type R in the ER schema, identify the relations S and T that correspond to the entity types participating in R. Choose one of the relations say S, and include as a foreign key in S the primary key of T. It is better to choose an entity type with total participation in R in the role of S. Include all the simple attributes (or simple components of composite attributes) of the 1:1 relationship type R as attributes of S.

ER to Relational Mapping Algorithm Step4: Mapping of Binary 1 : N Relationship Types For each regular binary l:n relationship type R, identify the relation S that represents the participating entity type at the N- side of the relationship type. Include as foreign key in S the primary key of the relation T that represents the other entity type participating in R; (this is done because each entity instance on the N-side is related to at most one entity instance on the 1-side of the relationship type.) Include any simple attributes (or simple components of composite attributes) of the 1:N relationship type as attributes of S.

ER to Relational Mapping Algorithm Step5: Mapping of Binary M:N Relationship Types For each binary M:N relationship type R, create a new relation S to represent R. Include as foreign key attributes in S the primary keys of the relations that represent the participating entity types; their combination will form the primary key of S. Also include any simple attributes of the M:N relationship type (or simple components of composite attributes) as attributes of S.

ER to Relational Mapping Algorithm Step6: Mapping of Multivalued Attributes For each multivalued attribute A, create a new relation R. This relation R will include an attribute corresponding to A, plus the primary key attribute K-as a foreign key in R-of the relation that represents the entity type or relationship type that has A as an attribute. The primary key of R is the combination of A and K. If the multivalued attribute is composite, we include its simple components.

ER to Relational Mapping Algorithm Step7: Mapping of N-ary Relationship Types. For each n-ary relationship type R, where n > 2, create a new relation S to represent R. Include as foreign key attributes in S the primary keys of the relations that represent the participating entity types. Also include any simple attributes of the n-ary relationship type (or simple components of composite attributes) as attributes of S. The primary key of S is usually a combination of all the foreign keys that reference the relations representing the participating entity types.

Correspondence between ER and Relational Models In a relational schema relationship types are not represented explicitly Represented by having two attributes A and B: one a primary key and the other a foreign key

Result of Mapping the COMPANY ER Schema into a Relational database schema

Example ER to Relational Schema Mapping How do we represent this as a relational schema? date firstname lastname name category price Product Purchased Person name address Store

Specialization and Generalization Specialization is the process of defining a set of subclasses of an entity type; this entity type is called the superclass of the specialization. For example, the set of subclasses {SECRETARY, ENGINEER, TECHNICIAN} is a specialization of the superclass EMPLOYEE. We can think of a reverse process of abstraction in which we suppress the differences among several entity types, identify their common features, and generalize them into a single superclass of which the original entity types are special subclasses. For example, CAR and TRUCK are subclasses of the generalized superclass VEHICLE. Refer: Chapter 8 (Sections 8.1 and 8.2) from Fundamentals of Database Systems (FODS), 6 th Edition.

Mapping EER Model Constructs to Relations Step8: Options for Mapping Specialization or Generalization. Convert each specialization with m subclasses {S1, S2,.,Sm} and generalized superclass C, where the attributes of C are {k,a1, an} and k is the (primary) key, into relational schemas using one of the four following options: Option 8A: Multiple relations-superclass and subclasses Option 8B: Multiple relations-subclass relations only Option 8C: Single relation with one type attribute Option 8D: Single relation with multiple type attributes

Mapping EER Model Constructs to Relations Option 8A: Multiple relations-superclass and subclasses Create a relation L for C with attributes Attrs(L) = {k,a1, an} and PK(L) = k. Create a relation Li for each subclass Si, 1 < i < m, with the attributes Attrs(Li) = {k} U {attributes of Si} and PK(Li)=k. Note: This option works for any specialization (total or partial, disjoint of over-lapping)

Option 8A: Multiple relations-superclass and subclasses

Mapping EER Model Constructs to Relations Option 8B: Multiple relations-subclass relations only Create a relation Li for each subclass Si, 1 < i < m, with the attributes Attr(Li) = {attributes of Si} U {k,a1,an} and PK(Li) = k. Note: This option only works for a specialization whose subclasses are total (every entity in the superclass must belong to (at least) one of the subclasses).

Option 8B: Multiple relations-subclass relations only

Update Operations and Dealing with Constraint Violations Operations of the relational model can be categorized into retrievals and updates Basic operations that change the states of relations in the database: Insert Delete Update (or Modify) Basic operations when applied, the ICs specified on the relational database schema should not be violated.

Update Operations and Dealing with Constraint Violations Focus: Types of constraints that may be violated by update operations (Insert, Delete & Update) and the types of actions that may be taken if an operation causes a violation.

Database state for the COMPANY relational database schema

Referential ICs displayed on the COMPANY relational database schema

The Insert Operation Provides a list of attribute values for a new tuple t that is to be inserted into a relation r. Can violate any of the four types of constraints If an insertion violates one or more constraints Default option is to reject the insertion

Insert Violation 4 Constraints example Domain constraints can be violated if an attribute value is given that does not appear in the corresponding domain or is not of the appropriate data type. Key constraints can be violated if a key value in the new tuple t already exists in another tuple in the relation r(r). Entity integrity can be violated if the primary key of the new tuple t is null. Referential integrity can be violated if the value of any foreign key in t refers to a tuple that does not exist in the referenced relation.

Insert operation Some examples [1] Operation: Insert < Cecilia, F, Kolonsky, NULL, 1960-04-05, 6357 Windy Lane, Katy, TX, F, 28000, NULL, 4> into EMPLOYEE. Result: This insertion violates the entity integrity constraint (NULL for the primary key SSN), so it is rejected. [2] Operation: Insert < Alicia, J, Zelaya, 999887777, 1960-04-05, 6357 Windy Lane, Katy, TX, F, 28000, 987654321, 4> into EMPLOYEE. Result: This insertion violates the key constraint because another tuple with the same Ssn value already exists in the EMPLOYEE relation, and so it is rejected. [3] Operation: Insert < Cecilia, F, Kolonsky, 677678989, 1960-04-05, 6357 Windswept, Katy, TX, F, 28000, 987654321, 7> into EMPLOYEE. Result: This insertion violates the referential integrity constraint specified on Dno in EMPLOYEE because no corresponding referenced tuple exists in DEPARTMENT with Dnumber = 7. [4] Operation: Insert < Cecilia, F, Kolonsky, 677678989, 1960-04-05, 6357 Windy Lane, Katy, TX, F, 28000, NULL, 4> into EMPLOYEE. Result: This insertion satisfies all constraints, so it is acceptable.

The Delete Operation Can violate only referential integrity constraint If tuple being deleted is referenced by foreign keys from other tuples Options if a delete operation causes a violation: Restrict - Reject the deletion Cascade - Propagate the deletion by deleting tuples that reference the tuple that is being deleted Set null or set default - Modify the referencing attribute values that cause the violation

Delete operation Some examples [1] Operation: Delete the WORKS_ON tuple with Essn = 999887777 and Pno = 10. Result: This deletion is acceptable and deletes exactly one tuple. [2] Operation: Delete the EMPLOYEE tuple with Ssn = 999887777. Result: This deletion is not acceptable, because there are tuples in WORKS_ON that refer to this tuple. Hence, if the tuple in EMPLOYEE is deleted, referential integrity violations will result. [3] Operation: Delete the EMPLOYEE tuple with Ssn = 333445555. Result: This deletion will result in even worse referential integrity violations, because the tuple involved is referenced by tuples from the EMPLOYEE, DEPARTMENT,WORKS_ON, and DEPENDENT relations.

The Update Operation Necessary to specify a condition on attributes of relation Select the tuple (or tuples) to be modified If attribute not part of a primary key nor of a foreign key Usually causes no problems Updating a primary/foreign key Similar issues as with Insert/Delete

Update operation Some examples [1] Operation: Update the salary of the EMPLOYEE tuple with Ssn = 999887777 to 28000. Result: Acceptable. [2] Operation: Update the Dno of the EMPLOYEE tuple with Ssn = 999887777 to 1. Result: Acceptable. [3] Operation: Update the Dno of the EMPLOYEE tuple with Ssn = 999887777 to 7. Result: Unacceptable, because it violates referential integrity. [4] Operation: Update the Ssn of the EMPLOYEE tuple with Ssn = 999887777 to 987654321. Result: Unacceptable, because it violates primary key constraint by repeating a value that already exists as a primary key in another tuple; it violates referential integrity constraints because there are other relations that refer to the existing value of Ssn.

Exercise: Suppose that each of the following Update operations is applied directly to the database state shown in Figure given as handout. Discuss all integrity constraints violated by each operation, if any, and the different ways of enforcing these constraints. a. Insert < Robert, F, Scott, 943775543, 1972-06-21, 2365 Newcastle Rd, Bellaire, TX, M, 58000, 888665555, 1> into EMPLOYEE. b. Insert < ProductA, 4, Bellaire, 2> into PROJECT. c. Insert < Production, 4, 943775543, 2007-10-01 > into DEPARTMENT. d. Insert < 677678989, NULL, 40.0 > into WORKS_ON. e. Insert < 453453453, John, M, 1990-12-12, spouse > into DEPENDENT. f. Delete the WORKS_ON tuples with Essn = 333445555. g. Delete the EMPLOYEE tuple with Ssn = 987654321. h. Delete the PROJECT tuple with Pname = ProductX. i. Modify the Mgr_ssn and Mgr_start_date of the DEPARTMENT tuple with Dnumber = 5 to 123456789 and 2007-10-01, respectively. j. Modify the Super_ssn attribute of the EMPLOYEE tuple with Ssn = 999887777 to 943775543. k. Modify the Hours attribute of the WORKS_ON tuple with Essn = 999887777 and Pno = 10 to 5.0.

Resources Chapter 3 of Fundamentals of Database Systems (FODS), 6 th Edition. Chapter 8 (Sections 8.1 and 8.2) from Fundamentals of Database Systems (FODS), 6 th Edition. Chapter 9 (Sec. 9.1) for ER-to-Relational Mapping of Fundamentals of Database Systems (FODS), 6 th Edition.