The Relational Data Model and Relational Database Constraints

The Relational Data Model and Relational Database Constraints First introduced by Ted Codd from IBM Research in 1970, seminal paper, which introduced the Relational Model of Data representation. It is based on Predicate Logic and set theory. Predicate logic is used extensively in mathematics and provides a framework in which an assertion can be verified as either true or false. For example, a person with T number T00178477 is named Lila Ghemri. This assertion can easily be demonstrated as true or false. Set theory deals with sets and is used as a basis for data manipulation in the relational model. For example, assume that A= {16, 24, 77}, B= {44, 77, 90, 11}. Given this information, we can compute that A B= {77} Based on these concepts, the relational model has three well defined components: 1. A logical data structure represented by relations 2. A set of integrity rules to enforce that the data are and remain consistent over time 3. A set of operations that defines how data are manipulated.

Relational Model Concepts: The relational model represents the database as a collection of relations. Informally, each relation resembles a table of values or a flat file of records. When a relation is thought of as a table of values, each row in the table represents a collection of related data values corresponding to a real world entity. The table name and column names are used to help interpret the meaning of the values in each row. For example the table below is called STUDENT because each row represents facts about a particular student entity. STUDENT Name Ssn Home_phone Address Office_ Age Gpa phone Benjamin Bayer 304-61- (817)373-1616 2918 Bluebonet NULL 19 3.21 Chungcha Kim Dick Davidson Rohan Panchal Barbara Benson 533-69- 128 Lane (817)375-4408 124 Kirby Road NULL 3425 Elgin Road (817) 378-9034 238 Lark Avenue (817)856-0101 8568 Lark Lane NULL 18 2.89 (817) 748-6592 2435 381-62- 1245 489-11- 2320 489-22- 1100 (817)748-6492 25 3.53 28 3.93 NULL 19 3.25 The column names Name, Student_number, Class and Major specify how to interpret the data values in each row based on the column in which they are. All values in a column are of the same data type. Terminology:

In a formal relational model, a row is called a tuple, a column header is called an attribute, and the table is called a relation. The data type describing the type of values that can appear in each column is represented by a domain of possible values. Domains, Attributes, Tuples and Relations: A domain D is a set of atomic values. An atomic value is a value that is indivisible as far as the formal definition is concerned. Domains: A common method of specifying a domain is to define the datatype from which the data values are drawn. It is useful to specify a name for the domain: -Usa_phone_number: the set of 10 digits numbers valid in the US. -Local_phone_number: the set of 7 digits phone numbers valid with a particular area code. ddd-dddd -Social-Security-numbers: the set of valid 9 digits SSN : ddd-dd-dddd -Names: The set of characters or a string -Grade_point_average: a float between 0 and 4 -Employee_age: an integer between 16-70 These names are called logical definitions of domains. A data type or format is also specified for each domain. For example, the data type for the domain USA-Phone_numbers can be declared as a character string of the form (ddd)ddd-dddd, where each d is a digit, and the first three digits form a valid area code. A domain is thus defined by a name, a data type and a format. A relation schema or relation scheme R, denoted R(A1, A2,... An), is made up of a relation name R and a list of attributes A1, A2,... An

Each attribute Ai is the name of a feature describing the concept or relation. D is the domain in which each attribute takes its values and is denoted by dom(ai). A relation schema is used to describe a relation: R is called the name of this relation. The degree (or arity) of the relation is the number of attributes n of its relation schema. A relation of degree seven, which stores information about university students would contain seven attributes describing each student, as follows: STUDENT(Name, Ssn, Home_phone, Address, Office_Phone, Age, Gpa) Adding the data type of each attribute, the definition is sometimes written as: STUDENT(Name:string, Ssn:string, Home_phone:string, Address:string, Office_Phone:string, Age:integer, Gpa:real) In this relation schema, STUDENT is the name of the relation, which has 7 attributes. We have assigned generic data types to these attributes. However, we can also use the previously defined domains for some of these attributes. For example: dom(name)= Names; dom(ssn)= Social_security_numbers; dom(homephone)= USA_phone_numbers, etc.. A relation (or relation state) r of the relation schema R(A1, A2,... An), also denoted by r(r), is a set of n-tuples r={t1, t2,.., tm}. Each n-tuple is an ordered list of n-values t=< v1, v2,.., vn>, where each value vi 1 i n is an element of dom(ai) or a special value NULL. The i th value in tuple t, which corresponds to the attribute Ai, is referred to as t[ai] or t. Ai( or t[i]). The terms relation intension for schema R or relation extension for a relation state r(r) are also used. STUDENT

Name Ssn Home_phone Address Office_phone Age Gpa Benjamin 304- (817)373-1616 2918 Bluebonet NULL 19 3.21 Bayer 61- Lane Chungcha Kim Dick Davidson Rohan Panchal Barbara Benson 2435 381-62- 1245 489-11- 2320 489-22- 1100 533-69- 128 (817)375-4408 124 Kirby Road NULL 18 2.89 NULL 3425 Elgin Road (817) 748-6592 25 3.53 (817) 378-9034 238 Lark Avenue (817)748-6492 28 3.93 (817)856-0101 8568 Lark Lane NULL 19 3.25 STUDENT is the relation name Name, Ssn, Home_phone, Address, Office_phone, Age and Gpa are attributes All data entries are tuples. It is possible for several attributes to have the same domain, the attribute name however will indicate their role or purpose in the relation. Characteristics of a Relation: 1- Ordering of tuples in a relation: A relation is defined as a set of tuples. Mathematically, the elements of a set have no order, hence tuples in a particular relation have no particular order. Tuple ordering is of no consequence in a relation. 2- Ordering of Values in a tuple and an Alternative definition of a relation: An n-tuple is an ordered list of n values, so the ordering of values in a tuple is important. 3- The order of values of a tuple can be changed as long as the correspondence between attributes and values is maintained. An alternative definition of a relation can be given as

R={ A1, A2,... An} a set of attributes (instead of a list) and a relation state r(r) is a finite set of mapping r={t1, t2,.., tm}, where each tuple ti is a mapping from R to D and D is the union of the attribute domains D= dom(a1) dom( A2)...dom( An) According to the definition of tuple as a mapping, a tuple can be considered as a set of (<attribute> <value>) pairs, where each pair gives the value of the mapping from attribute Ai to a value vi from dom(ai) Example t1=(benjamin Bayer, 304-61-2435, (817)373-1616, 2918 Bluebonnet Lane, 19, 3.21) t1= {(Name, Benjamin Bayer), (Ssn, 304-61-2435), (Home_phone, (817)373-1616), (Address, 2918 Bluebonet Lane), (Office_phone, NULL), (Age, 19), (Gpa, 3.21)} Find an identical tuple? t1= {(Home_phone, (817)373-1616), (Name, Benjamin Bayer), (Age, 19), (Ssn, 304-61-2435), (Address, 2918 Bluebonet Lane), (Office_phone, NULL), (Gpa, 3.21)} Interpretation of a Relation: The relation schema can be interpreted as a declaration or an assertion. Some relations may represent facts about entities, whereas other relations may represent facts about relationships. For example a relation STUDENT, represents facts about a student entity. However a relation MAJORS(Student_Ssn, Department_code) asserts that a student majors in an academic discipline. A tuple in this relation relates a student to the department in which the major is offered. Relational Model Notation: A relation schema R of degree n is denoted by R(A1, A2,... An), for example:

STUDENT(Name, Ssn, Home_phone, Address, Office_phone, Age, GPA) Uppercase letters Q, R, S denote relation names (names of tables) The letters t, u, v denote tuples (records in a table) In general, the name of a relation such as STUDENT also includes the current set of tuples in that relation: All attribute names in a particular relation must be distinct. An n-tuple t in a relation r(r) is denoted t=< v1, v2,.., vn> where each value vi 1 i n is the value corresponding to attribute Ai. This notation is called component values of tuples. Schemas, Instances and Database State: It is important to distinguish between the description of the database and the database itself. The description of a database is called the database schema, which is specified during database design and is not expected to change frequently. A diagram schema: is a way to display the database schema. Consider the following example of a database that stores student records and their grades. STUDENT Name Student Number Class Major Smith 17 1 CS Brown 8 2 CS COURSE CourseName CourseNumber CreditHours Department Intro to Computer CS1310 4 CS Science Data structures CS3320 4 CS

Discrete MATH2410 3 MATH Mathematics Database CS3380 3 CS SECTION SectionIdentifier CourseNumber Semester Year Instructor 85 Math2410 Fall 98 King 92 CS1310 Fall 98 Anderson 102 CS3320 Spring 99 Knuth 112 MATH2410 Fall 99 Chang 119 CS1310 Fall 99 Anderson 135 CS3380 Fall 99 Stone GRADE- REPORT StudentNumber SectionIdentifier Grade 17 112 B 17 119 C 8 85 A 8 92 A 8 102 B 8 136 A PREREQUISITE CourseNumber PrerequisiteNumber CS3380 CS3320 CS3380 MATH2410 CS3320 CS1310 Corresponding schema diagram: STUDENT Name Student Number Class Major COURSE

CourseName CourseNumber CreditHours Department PREREQUISITE CourseNumber PrerequisiteNumber SECTION SectionIdentifier CourseNumber Semester Year Instructor GRADE_REPORT StudentNumber SectionIdentifier Grade A schema diagram displays only some aspects of a schema, such as the names of record types and data items and some type of constraints. Other aspects, such as the data type of each data item or constraints such as a student majoring in computer science must take CS1310 before the end of their sophomore year are not represented. The actual data in the database may change quite frequently, each time we add a student or enter a grade. Database State: The data in the database at a particular moment is called database state or snapshot. It is also called the current set of occurrences or instances in the database. When we define a new database, we specify its database schema only to the DBMS. At this point the corresponding database state is the empty state with no data. We get the initial state of the database when the database is first populated or loaded with the initial data. A valid state of the database is a state that satisfies the structure and constraints specified in the schema.

The DBMS stores the description of the schema constructs or constraints- metadata- in the DBMS catalog. Relational Model Constraints In a relational database, there will be many relations and the tuples in these relations are usually related in various ways. Therefore, there must be many restrictions on the actual values in a database, so as to keep the database consistent. These constraints are derived from the application that the database represents. Constraints on databases can generally be divided into three main categories: 1. Constraints that are inherent in the data model, called inherentmodel-based constraints or implicit constraints. For example, that not 2 tuples should be identical are such constraints. 2. Constraints that are directly expressed in schema of the data model, by specifying them in the DLL (data definition language); these are called schema-based constraints or explicit constraints. 3. Constraints that cannot be directly expressed in the schemas of the data model and must be enforced by the application program, these are called application-based or semantic constraints or business rules. 4. All integrity constrains should be specified on the relational database schemas, as part of the database definition. 5. Most relational databases support key, entity integrity and referential integrity constraints. Domain Constraints: Domain constraints specify the data types of each attribute in the table. The data types associated with domains include standard numeric datatypes for integers, real numbers, characters, strings, Booleans. Also are available dates, time, money, a subrange of values from a data type, or an enumerated data type in which all valid values are explicitly listed. Example: CREATE TABLE VENDOR (

V_CODE INTEGER PRIMARY KEY, V_NAME VARCHAR(35) NOT NULL, V_CONTACT VARCHAR(15) NOT NULL, V_AREACODE CHAR(3) NOT NULL, V_PHONE CHAR(8) NOT NULL, V_STATE CHAR(2) NOT NULL, V_ORDER CHAR(1) NOT NULL ); Key Constraints In the relational model, a relation is defined as a set of tuples. All elements in a set are, by definition, distinct; hence all tuples in a relation must be distinct. This means that no 2 tuples can have the same combination of values for all their attributes. Usually, there exists a subset of attributes of a relation schema R with the property that no two tuples in any relation state r of R should have the same combination of values for these attributes. Suppose that we denote SK such a subset; then given any two distinct tuples ti and tj in a relation state r in R we have the constraint that: ti[sk] tj[sk] Any such set of attributes SK is called a superkey of the relation schema R. A superkey SK specifies a uniqueness constraint that no two distinct tuples in any state r of R can have the same value for SK. Every relation has at least one default superkey- the set of all its attributes. A superkey can have redundant attributes, so a more useful concept is that of a key, which has no redundancy.

A key K of a relation schema R is a superkey of R with the additional property that removing any attributes A from K leaves a set of attributes K that is not a superkey of R. A key has to satisfy these properties: 1. A key cannot be NULL 2. Two distinct tuples in any state of the relation cannot have identical values for (all) the attributes of the key. 3. A key is a minimal superkey- that is a superkey from which we cannot remove any attribute and still have the uniqueness constraint in 2 hold. 4. The key has the property to be time-invariant: the property of being a key must continue to hold when we insert new tuples in the relation. Name Ssn Homep Address Office Age Gpa hone Phone Benjamin 304-61-2435 (817)37 2918 Bluebonet NULL 19 3.21 Bayer 3-1616 Lane Chung-cha 381-62-1245 (817)37 124 Kirby Road NULL 18 2.89 Kim 5-4408 Dick 489-11-2320 NULL 3425 Elgin (817) 748-25 3.53 Davidson Road 6592 Rohan Panchal 489-22-1100 (817) 378-238 Lark Lane (817)748-6492 28 3.93 Barbara Benson 9034 533-69-128 (817)85 6-0101 238 Lark Lane NULL 19 3.25 STUDENT relation: {Ssn} is a key of STUDENT. Any set of attributes that includes Ssn {Ssn, Name, Age} is a superkey. {Address, Name, Age} is not a key- Why? A composite key is a key with multiple attributes. A composite key must have all its attributes together to have the uniqueness property. Any superkey formed from a single attribute is also a key. The value of a key attribute is used to identify uniquely each tuple in the relation.

A set of attributes constituting a key is a property of the relation, it is a constraint that should hold on every valid relation state of the schema. In general a relation schema may have more than one key, in this case each key is called a candidate key. We usually designate one of the candidate keys as the primary key. This is the key used to identify tuples in a relation. By convention, the attributes that form the primary key of a relation schema are underlined in the schema of the relation. STUDENT { Ssn, Name, Home_Address, Home_Phone, Office_Phone, Age, Gpa} -When a relation schema has several candidate keys, it is better to choose as the primary key the one that has a single attribute, or a small number of attributes. The other candidate keys are designated as unique keys and are not underlined. CREATE TABLE VENDOR ( V_CODE INTEGER PRIMARY KEY, V_NAME VARCHAR(35) NOT NULL, V_CONTACT VARCHAR(15) NOT NULL, V_AREACODE CHAR(3) NOT NULL, V_PHONE CHAR(8) NOT NULL, V_STATE CHAR(2) NOT NULL, V_ORDER CHAR(1) NOT NULL ); Constraints on NULL Values:

Another constraint on attributes specifies whether NULL values are or are not permitted. For example, if every student tuple must have a valid, non- NULL value for the Name attribute, then the Name of STUDENT is constrained to be NOT NULL. CREATE TABLE VENDOR ( V_CODE INTEGER PRIMARY KEY, V_NAME VARCHAR(35) NOT NULL, V_CONTACT VARCHAR(15) NOT NULL, V_AREACODE CHAR(3) NOT NULL, V_PHONE CHAR(8) NOT NULL, V_STATE CHAR(2) NOT NULL, V_ORDER CHAR(1) NOT NULL ); Give the SQL definition of this table: STUDENT(Name, Ssn, Home_phone, Address, Office_phone, Age, GPA) Name Ssn Homep Address Office Age Gpa hone Phone Benjamin 304-61-2435 (817)37 2918 Bluebonet NULL 19 3.21 Bayer 3-1616 Lane Chung-cha 381-62-1245 (817)37 124 Kirby Road NULL 18 2.89 Kim 5-4408 Dick 489-11-2320 NULL 3425 Elgin (817) 748-25 3.53 Davidson Road 6592 Rohan Panchal 489-22-1100 (817) 378-238 Lark Lane (817)748-6492 28 3.93 Barbara Benson 9034 533-69-128 (817)85 6-0101 238 Lark Lane NULL 19 3.25

Relational Databases and Relational Database Schemas: The definition and constraints we have seen so far apply to single relations and their attributes. A relational database usually contains many relations, with tuples in relations related across relations. A relational database schema S is a set of relation schemas S= { R1, R2, Rm} and a set of integrity constraints IC. When we refer to relational database, we implicitly include both its schema and its current state. A database state that does not obey all the integrity constraints is called an invalid state. A state that satisfies all the constraints in the defined set of integrity constraints IC is called a valid state. Often different relations refer to the same concept. Attributes that represent the same concept may or may not have identical names in different relations. EMPLOYEE (FName,Minit, LName, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno) DEPARTEMENT (Dname, Dnumber, Mgr_ssn, Mgr_start_date) For example, the number given to a department is called Dno in EMPLOYEE and Dnumber in DEPARTMENT. Key, Integrity, Referential Integrity and Foreign Keys: Key constraint states that every primary key value must be unique Entity integrity constraint states that no primary key value can be NULL. This is because the primary key is used to identify tuples. Key and Entity integrity constraints are applied to each relation.

Referential integrity constraints are specified between two relations and are used to maintain consistency among tuples in the two relations. Referential integrity constraints state that a tuple in one relation that refers to another relation, must refer to an existing tuple in that relation. To define referential integrity more formally, we first define the concept of foreign key. Foreign Keys: A set of attributes FK in relation schema R1 is a foreign key of R1 that references relation R2 if it satisfies the following rules: 1. The attributes in FK have the same domain(s) as the primary key attributes PK of R2; the attributes FK are said to refer to the relation R2. 2. A value of FK in a tuple t1 of R1 either occurs as a value of PK for some tuple t2 in R2 or is NULL. In the former case, we have t1[fk]= t2[pk], and we say that the tuple t1 references or refers to the tuple t2. In this definition, R1 is called the referencing relation and R2 the referenced relation. If these two conditions hold, a referential integrity constraint from R1 to R2 is said to hold. In a database with many relations, there are usually many such referential integrity constraints. To specify these constraints, first we must have a clear understanding of the meaning or role of each attribute or set of attributes plays in the various relation schemas of the database.

EMPLOYEE (FName, Minit, LName, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno) DEPARTMENT (Dname, Dnumber, Mgr_Ssn, Mgr_start_date) PROJECT (Pname, Pnumber, Plocation, Dnum) DEPT_LOCATIONS (Dnumber, Dlocation) WORKS_ON (Essn, Pno, Hours) DEPENDENT (Essn, Dependent_name, Sex, Bdate, Relationship) For example, in the EMPLOYEE relation, the attribute Dno refers to the department in which the employee works; hence we designate Dno to be a foreign key of EMPLOYEE referencing the DEPARTMENT relation. This means that a value of Dno in any tuple ei of the EMPLOYEE relation must 1) match a value of the primary key of DEPARTMENT- the Dnumber attribute in some tuple dj of the DEPARTMENT relation, or 2) The value of the attribute Dno can be NULL if the employee ei does not belong to any department or will be assigned a department later. Diagram display of referential integrity constraint by drawing a directed edge from the foreign key to the relation it references. EMPLOYEE Fname Minit Lname Ssn Bdate Address Sex Salary SuperSsn Dno DEPARTMENT Dname Dnumber Mgr_ssn Mgr_start_date

A foreign key can refer to its own relation. For example, the attribute Super_Ssn in Employee refers to the supervisor of an employee. That is another employee, represented by a tuple in the relation EMPLOYEE. Hence Super_ssn is a foreign key that reference the EMPLOYEE relation itself. EMPLOYEE Fname Minit Lname Ssn Bdate Address Sex Salary SuperSsn Dno DEPARTMENT Dname Dnumber Mgr_ssn Mgr_start_date CREATE TABLE VENDOR ( V_CODE INTEGER PRIMARY KEY, V_NAME VARCHAR(35) NOT NULL, V_CONTACT VARCHAR(15) NOT NULL, V_AREACODE CHAR(3) NOT NULL, V_PHONE CHAR(8) NOT NULL, V_STATE CHAR(2) NOT NULL, V_ORDER CHAR(1) NOT NULL ); CREATE TABLE P ( P_CODE VARCHAR(10) PRIMARY KEY, P_DESCRIPT VARCHAR(35) NOT NULL, P_INDATE DATE NOT NULL, P_QOH INT NOT NULL, P_MIN INT NOT NULL, P_PRICE DECIMAL (8,2) NOT NULL, P_DISCOUNT DECIMAL (5,2) NOT NULL, V_CODE INTEGER, FOREIGN KEY(V_CODE) REFERENCES VENDOR(V_CODE));

Surrogate Keys: Surrogate keys are keys that are created. They are numeric values that are appended to the relation to serve as the primary key. Surrogate keys have no meaning to the user and are normally hidden from them on forms or queries and reports. They are widely used in order to avoid the overhead in memory of using composite keys. Most DBMS systems have a facility for automatically generating surrogate key values. Other types of Constraints: Semantic integrity constraints are constraints that are application specific and can be enforced within the application programs that update the database. An example of such constraints is that the maximum number of hours in that an employee can work in all projects is 56. Database Modification and Update: There are 3 operations that can change the state of relations in the database: Insert: to add new data into the relation by adding one or more tuples in a relation. Delete: is used to delete tuples Update (or Modify): used to change the value of some attributes in existing tuples. Whenever these operations are applied, the integrity constraints defined on the relational database should not be violated. The Insert Operation: The Insert operation provides a list of attribute values for a tuple t that is to be inserted into a relation R. An Insert operation can violate any of the specified integrity constraints described earlier:

Example: Insert < Cecilia, F, Kolonsky, NULL, 1960-04-05, 6578 Windy Lane, Katy, TX, F, 28000, NULL, 4> into EMPLOYEE Result: Rejected why? Insert < Alicia, J, Zelaya, 999887777, 1960-04-05, 6578 Windy Lane, Katy, TX, F, 28000, NULL, 4> into EMPLOYEE Result: Rejected Why? Insert < Cecilia, F, Kolonsky, 677898790, 1960-04-05, 6578 Windy Lane, Katy, TX, F, 28000, 987654321, 7> into EMPLOYEE Result: Reject why? Insert < Cecilia, F, Kolonsky, 778987901, 1960-04-05, 6578 Windy Lane, Katy, TX, F, 28000, NULL, 4> into EMPLOYEE. Result:OK If an insertion violates one or more constraints, the default option is to reject the insertion. The Delete Operation: The Delete operation can violate only referential integrity. This occurs if the tuple being deleted is referenced by foreign keys from other tuples in the database. To specify deletion a condition on the attribute of the relation selects the tuple or tuples to be deleted. Operation: Delete the WORKS-ON tuple with Essn= 999887777 and Pno=10. Result:?? works DELETE from WORKS-ON where Essn= 999887777 AND Pno= 10; Result: doesn t work Operation: Delete the EMPLOYEE tuple with Ssn= 999887777 Result:???

Operation: Delete the Employee tuple with Ssn= 333445555 Result:?? Several options are available if a deletion operation causes a violation. The first option, called restrict, is to reject the deletion. The second option, called cascade, is to attempt to propagate the deletion by deleting tuples that reference the tuple being deleted. A third option, called set null or set default is to modify the referencing attribute values that cause the violation; each such value is set to NULL or changed to another default valid tuple. Combinations of these three options are also possible, in general the DBMS will allow the database designer to specify which options applies in case of violation of the constraints. The Update Operation The Update or (Modify) operation is used to change the values of one or more attributes in a tuple or tuples for some relation R. It is necessary to specify a condition on the attributes of the relation to select the tuple or tuples to be modified. Operation: Update the Salary of the EMPLOYEE tuple with Ssn= 999887777 to 28000 Result:? Operation: Update the Dno of the EMPLOYEE tuple with Ssn= 999887777 to 1 Result:? Operation: Update the Dno of the EMPLOYEE tuple with Ssn= 999887777 to 7 Result:? Operation: Update the Ssn of the EMPLOYEE tuple with Ssn= 999887777 to 987654321 Result:?

-Updating an attribute that is neither part of a primary key or a foreign key usually causes no problems; the DBMS need to check and confirm that the new value is of the correct data type and domain. -Modifying a primary key is similar to deleting one tuple and inserting another in its place because we use the primary key to identify the tuples. -If a foreign key attribute is modified, the DBMS must make sure that the new value refers to an existing tuple in the referenced relation (or set to NULL).