CS 377 Database Systems
Relational Data Model
Li Xiong
Department of Mathematics and Computer Science
Emory University

Outline
- Relational Model Concepts
- Relational Model Constraints
- Relational Database and operations

Relational Model
- First formal database model
- Introduced by Codd in "A Relational Model of Data for Large Shared Data Banks," Communications of the ACM, June 1970.
- First commercial implementations available in the early 1980s
- Based on the concept of a mathematical relation, with a theoretical basis in set theory and first-order predicate logic.
- Other models: hierarchical model, network model

INFORMAL DEFINITIONS
- RELATION: a table of values
  - A relation may be thought of as a set of rows.
  - A relation may alternatively be thought of as a set of columns.
  - Each row represents a fact that corresponds to a real-world entity or relationship.
  - The table name and column names help interpret the meaning of the values.

FORMAL DEFINITIONS: Relation Schema
- The table is called a relation, a row is a tuple, and a column header is an attribute.
- Relation schema $R(A_1, A_2, \ldots, A_n)$
  - Made up of a relation name $R$ and a list of attributes $A_1, A_2, \ldots, A_n$
  - The degree (or arity) of a relation is the number of attributes $n$ of its relation schema
  - E.g. STUDENT(Name, SSN, HomePhone, Address, OfficePhone, Age, GPA)
- Each attribute $A_i$ has a domain $dom(A_i)$ that defines the possible values of the attribute by a data type or a format
  - E.g. the domain of SSN is the set of nine-digit numbers written as ddd-dd-dddd, where each d is a decimal digit.

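The SSN domain described above can be tested mechanically. The following is a minimal sketch, assuming the domain is represented as strings of the form ddd-dd-dddd; the names `SSN_PATTERN` and `in_ssn_domain` are hypothetical and not part of the course material.

```python
import re

# dom(SSN): strings of the form ddd-dd-dddd, where each d is a decimal digit.
SSN_PATTERN = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}")

def in_ssn_domain(value):
    """Return True if value is a string that belongs to dom(SSN)."""
    return isinstance(value, str) and SSN_PATTERN.fullmatch(value) is not None

print(in_ssn_domain("305-61-2435"))  # True
print(in_ssn_domain("305612435"))    # False: the dashes are missing
```
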
FORMAL DEFINITIONS: Relation
- A relation (or relation state) $r$ of the relation schema $R(A_1, A_2, \ldots, A_n)$, also written $r(R)$, is a set of tuples $r = \{t_1, t_2, \ldots, t_m\}$
- A tuple $t$ is an ordered list of $n$ values $t = \langle v_1, v_2, \ldots, v_n \rangle$, where each value $v_i$, $1 \le i \le n$, is an element of $dom(A_i)$ or a special NULL value
- E.g. <"Benjamin Bayer", "305-61-2435", "373-1616", "2918 Bluebonnet Lane", NULL, 19, 3.21> is a tuple belonging to the STUDENT relation.

Mathematical Definitions
- A relation $r(R)$ is a mathematical relation of degree $n$ on the domains $dom(A_1), dom(A_2), \ldots, dom(A_n)$, which is a subset of the Cartesian product of the domains that define $R$:
  $r(R) \subseteq dom(A_1) \times dom(A_2) \times \cdots \times dom(A_n)$
- The Cartesian product is the direct product of the sets of values of all domains:
  $dom(A_1) \times dom(A_2) \times \cdots \times dom(A_n)$
- The total number of tuples in the Cartesian product is:
  $|dom(A_1)| \cdot |dom(A_2)| \cdots |dom(A_n)|$
- The current relation state reflects only the valid tuples that represent a particular state

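A small worked example may make the counting argument concrete. This sketch uses three tiny, made-up domains for a hypothetical schema R(A1, A2, A3); the values are illustrative only.

```python
from itertools import product
from math import prod

# Made-up domains for a hypothetical schema R(A1, A2, A3).
dom_a1 = {"Ann", "Bob"}
dom_a2 = {18, 19, 20}
dom_a3 = {True, False}
domains = [dom_a1, dom_a2, dom_a3]

# The Cartesian product contains every possible combination of values.
all_tuples = set(product(*domains))

# Its size is the product of the domain cardinalities: 2 * 3 * 2 = 12.
expected_size = prod(len(d) for d in domains)
print(len(all_tuples), expected_size)  # 12 12

# A relation state r(R) is any subset of the Cartesian product.
r = {("Ann", 19, True), ("Bob", 20, False)}
print(r <= all_tuples)  # True
```
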
CHARACTERISTICS OF RELATIONS
- Ordering of tuples in a relation $r(R)$
  - A relation is a set of tuples, which are not ordered
- Ordering of attributes
  - The attributes in $R(A_1, A_2, \ldots, A_n)$ and the values in $t = \langle v_1, v_2, \ldots, v_n \rangle$ are an ordered list in our definition
  - Alternative definition: a tuple is considered as a set of (<attribute>, <value>) pairs, where each pair gives the value of the mapping from an attribute $A_i$ to a value $v_i$ from $dom(A_i)$
- Values in a tuple
  - All values are considered atomic (flat relational model with the first normal form assumption)
  - What about multi-valued attributes and composite attributes?
  - A special NULL value is used to represent values that are unknown or inapplicable to certain tuples.

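These characteristics map naturally onto built-in Python types. The sketch below is an illustration only, using made-up two-attribute tuples: a relation state is modeled as a set, and the alternative tuple definition as a dictionary.

```python
# A relation state as a set of tuples: tuple order and duplicates do not matter.
r1 = {("Ann", 19), ("Bob", 20)}
r2 = {("Bob", 20), ("Ann", 19), ("Ann", 19)}
print(r1 == r2)  # True

# Alternative definition: a tuple as a mapping from attribute names to values,
# so the order in which attributes are listed is irrelevant.
t1 = {"Name": "Ann", "Age": 19}
t2 = {"Age": 19, "Name": "Ann"}
print(t1 == t2)  # True

# Under the ordered-list definition, attribute order does matter.
print(("Ann", 19) == (19, "Ann"))  # False
```
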
DEFINITION SUMMARY
Informal Terms        Formal Terms
Table                 Relation
Column                Attribute
Row                   Tuple
Values in a column    Domain
Table Definition      Relation Schema

Relational Model Notation
- Relation schema $R$ of degree $n$: $R(A_1, A_2, \ldots, A_n)$
- Relation names: Q, R, S
- Relations: q, r, s
- Tuples: t, u, v
- Tuple $t$ in a relation $r(R)$: $t = \langle v_1, v_2, \ldots, v_n \rangle$, where $v_i$ is the value corresponding to attribute $A_i$
- Component values of tuples:
  - $t[A_i]$ and $t.A_i$ refer to the value $v_i$ in $t$ for attribute $A_i$
  - $t[A_u, A_w, \ldots, A_z]$ and $t.(A_u, A_w, \ldots, A_z)$ refer to the subtuple of values $\langle v_u, v_w, \ldots, v_z \rangle$ from $t$ corresponding to the attributes specified in the list

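The component notation can be sketched in code using the STUDENT schema and the example tuple from the formal definitions. The helper names `component` and `subtuple` are hypothetical; this is one possible reading of the notation, not a standard API.

```python
# STUDENT(Name, SSN, HomePhone, Address, OfficePhone, Age, GPA)
ATTRS = ("Name", "SSN", "HomePhone", "Address", "OfficePhone", "Age", "GPA")
t = ("Benjamin Bayer", "305-61-2435", "373-1616",
     "2918 Bluebonnet Lane", None, 19, 3.21)  # None stands for NULL

def component(t, attr):
    """t[A_i]: the value of tuple t for attribute attr."""
    return t[ATTRS.index(attr)]

def subtuple(t, attrs):
    """t[A_u, A_w, ..., A_z]: the values of t for the listed attributes, in list order."""
    return tuple(component(t, a) for a in attrs)

print(component(t, "GPA"))                  # 3.21
print(subtuple(t, ["SSN", "Name", "Age"]))  # ('305-61-2435', 'Benjamin Bayer', 19)
```
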
Outline
- Relational Model Concepts
- Relational Model Constraints
- Relational Database and operations

Relational Model Constraints
- Constraints: restrictions on the actual values in a database state
- Inherent model-based constraints (implicit constraints)
  - Inherent in the data model
  - E.g. no duplicate tuples
- Schema-based constraints (explicit constraints)
  - Can be directly expressed in schemas of the data model
- Application-based constraints (semantic constraints or business rules)
  - Cannot be directly expressed in schemas; expressed and enforced by application programs
  - E.g. the maximum number of hours an employee may work across all of his or her projects is 56 hours per week

Schema-based constraints
- Domain constraints
- Key constraints
- Entity integrity constraints
- Referential integrity constraints

Domain Constraints
- The value of each attribute A must be an atomic value from the domain dom(A)
- Typical data types associated with domains:
  - Numeric data types for integers and real numbers
  - Characters
  - Booleans
  - Fixed-length strings
  - Variable-length strings
  - Date, time, timestamp
  - Money
  - Other special data types

Key Constraints
- No two tuples can have the same combination of values for all their attributes.
- Superkey SK
  - A set of attributes such that no two distinct tuples in any state r of R can have the same value for SK
- Key K
  - A superkey of R
  - Removing any attribute A from K leaves a set of attributes K' that is no longer a superkey of R

Key Constraints and Constraints on NULL Values (cont'd.)
- A key satisfies two properties:
  - Two distinct tuples in any state of the relation cannot have identical values for (all) attributes in the key
  - It is a minimal superkey: no attribute can be removed while the uniqueness condition above still holds

Key Constraints and Constraints on NULL Values (cont'd.)
- Candidate key
  - A relation schema may have more than one key
- Primary key of the relation
  - Designated among the candidate keys
  - Its attributes are underlined
- Other candidate keys are designated as unique keys

Key Constraints
- Superkey of R: a set of attributes SK of R such that no two tuples in any valid relation instance $r(R)$ have the same value for SK.
  - For any distinct tuples $t_1$ and $t_2$ in $r(R)$, $t_1[SK] \neq t_2[SK]$.
  - E.g. {License_number}, {License_number, Make}, {Engine_serial_number, Make}
- Key of R: a "minimal" superkey; that is, a superkey K such that removal of any attribute from K results in a set of attributes that is not a superkey.
  - Key1 = {License_number}, Key2 = {Engine_serial_number}
  - Is {Engine_serial_number, Make} a key?
- If a relation has several keys, each is called a candidate key, and one is chosen arbitrarily to be the primary key. The primary key attributes are underlined.

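The uniqueness and minimality conditions can be tested against a single relation state. Note the limitation: a superkey must hold in every valid state, so a check on one state can refute a candidate but cannot prove it. The sketch below uses made-up sample rows; the attribute names follow the slide, and the helper names `unique_in_state` and `minimal_in_state` are hypothetical.

```python
from itertools import combinations

def unique_in_state(rows, attrs):
    """True if no two rows of this state agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def minimal_in_state(rows, attrs):
    """True if attrs is unique in this state and no non-empty proper subset is."""
    if not unique_in_state(rows, attrs):
        return False
    return not any(
        unique_in_state(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

# Made-up sample state.
cars = [
    {"License_number": "TX ABC-739", "Engine_serial_number": "A69352", "Make": "Ford"},
    {"License_number": "FL TVP-347", "Engine_serial_number": "B43696", "Make": "Ford"},
    {"License_number": "NY MPO-22", "Engine_serial_number": "X83554", "Make": "Honda"},
]

print(unique_in_state(cars, ["Make"]))                          # False
print(unique_in_state(cars, ["Engine_serial_number", "Make"]))  # True
print(minimal_in_state(cars, ["Engine_serial_number", "Make"])) # False
print(minimal_in_state(cars, ["License_number"]))               # True
```
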
Entity Integrity
- Entity integrity: the primary key attributes PK of each relation schema R cannot have NULL values in any tuple of $r(R)$.
  - $t[PK] \neq \text{NULL}$ for any tuple $t$ in $r(R)$
  - Primary key values are used to identify the individual tuples.
- Note: other attributes of R may be similarly constrained to disallow NULL values, even though they are not members of the primary key.

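The entity integrity condition amounts to a simple scan over a relation state. This is a minimal sketch, assuming tuples are represented as dictionaries and NULL as `None`; the function name and the sample rows are hypothetical.

```python
def violates_entity_integrity(rows, primary_key):
    """Return the rows in which some primary-key attribute is NULL (None)."""
    return [row for row in rows if any(row[a] is None for a in primary_key)]

# Made-up state for a hypothetical relation STUDENT(SSN, Name) with PK = {SSN}.
students = [
    {"SSN": "305-61-2435", "Name": "Benjamin Bayer"},
    {"SSN": None, "Name": "Unknown Student"},
]

bad = violates_entity_integrity(students, ["SSN"])
print(len(bad))  # 1
```
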
Referential Integrity
- Referential integrity: a tuple in one relation that refers to another relation must refer to an existing tuple in that relation.
- Formally, a set of attributes FK in relation schema $R_1$ is a foreign key of $R_1$ that references relation $R_2$ if:
  - The attributes in FK have the same domains as the primary key attributes PK of $R_2$
  - The value of FK in a tuple $t_1$ of the current state of $R_1$ is either
    (1) a value of PK for some tuple $t_2$ in the current state of $R_2$: $t_1[FK] = t_2[PK]$, or
    (2) NULL
- $R_1$ is the referencing relation and $R_2$ is the referenced relation.
- A tuple $t_1$ in $R_1$ is said to reference a tuple $t_2$ in $R_2$ if $t_1[FK] = t_2[PK]$.
- A referential integrity constraint can be displayed in a relational database schema as a directed arc from $R_1.FK$ to $R_2$.

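The two allowed cases for a foreign key value translate directly into a check over two relation states. The sketch below assumes tuples are dictionaries and NULL is `None`, and it treats only an entirely NULL foreign key as case (2). The relations EMPLOYEE and DEPARTMENT and their attributes are hypothetical examples.

```python
def dangling_references(r1, fk, r2, pk):
    """Return the tuples of r1 whose FK value is neither NULL nor a PK value in r2."""
    pk_values = {tuple(t2[a] for a in pk) for t2 in r2}
    dangling = []
    for t1 in r1:
        fk_value = tuple(t1[a] for a in fk)
        if all(v is None for v in fk_value):
            continue  # case (2): a NULL foreign key is allowed
        if fk_value not in pk_values:
            dangling.append(t1)  # case (1) fails: no matching tuple in r2
    return dangling

# Hypothetical states: EMPLOYEE.Dno references DEPARTMENT.Dnumber.
department = [{"Dnumber": 4}, {"Dnumber": 5}]
employee = [
    {"Ssn": "123-45-6789", "Dno": 5},     # references an existing department
    {"Ssn": "333-44-5555", "Dno": None},  # NULL is allowed
    {"Ssn": "987-65-4321", "Dno": 9},     # department 9 does not exist
]

print(dangling_references(employee, ["Dno"], department, ["Dnumber"]))
# [{'Ssn': '987-65-4321', 'Dno': 9}]
```
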
Outline
- Relational Model Concepts
- Relational Model Constraints
- Relational Database and operations

Relational Databases and Relational Database Schemas
- Relational database schema S
  - Set of relation schemas $S = \{R_1, R_2, \ldots, R_m\}$
  - Set of integrity constraints IC
- Relational database state
  - Set of relation states $DB = \{r_1, r_2, \ldots, r_m\}$
  - Each $r_i$ is a state of $R_i$ such that the $r_i$ relation states satisfy the integrity constraints specified in IC
- Invalid state: does not obey all the integrity constraints
- Valid state: satisfies all the constraints in the defined set of integrity constraints IC

Operations in a Relational Database
- Basic operations that change the states of relations in the database:
  - Insert
  - Delete
  - Update (or Modify)

The Insert Operation
- Provides a list of attribute values for a new tuple t to be inserted into a relation R
- Can violate any of the four types of schema-based constraints (domain, key, entity integrity, referential integrity)
- The default option is to reject the insertion

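The reject-by-default behavior can be observed with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the `department` and `employee` tables, their columns, and the age range in the CHECK clause are all hypothetical. Two SQLite specifics matter here: foreign keys are enforced only after `PRAGMA foreign_keys = ON`, and a non-integer primary key accepts NULL unless `NOT NULL` is declared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE department (
        dnumber INTEGER NOT NULL PRIMARY KEY,
        dname   TEXT NOT NULL
    );
    CREATE TABLE employee (
        ssn  TEXT NOT NULL PRIMARY KEY,
        name TEXT NOT NULL,
        age  INTEGER CHECK (age BETWEEN 16 AND 120),  -- assumed domain for age
        dno  INTEGER REFERENCES department(dnumber)
    );
    INSERT INTO department VALUES (5, 'Research');
    INSERT INTO employee VALUES ('123-45-6789', 'Ann', 30, 5);
""")

# One insertion attempt per kind of schema-based constraint.
attempts = {
    "domain": ("111-11-1111", "Bob", 7, 5),                 # age outside the assumed domain
    "key": ("123-45-6789", "Cy", 40, 5),                    # duplicate primary key value
    "entity integrity": (None, "Di", 25, 5),                # NULL primary key
    "referential integrity": ("987-65-4321", "Ed", 50, 9),  # no department 9
}

rejected = []
for kind, row in attempts.items():
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO employee VALUES (?, ?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(kind)

print(rejected)
# ['domain', 'key', 'entity integrity', 'referential integrity']
print(conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0])  # 1
```
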
The Delete Operation
- Can violate only referential integrity
  - Occurs if the tuple being deleted is referenced by foreign keys from other tuples, e.g. deleting a tuple from department
- Options:
  - Restrict: reject the deletion
  - Cascade: propagate the deletion by deleting the tuples that reference the deleted tuple
  - Set null or set default: modify the referencing attribute values that cause the violation

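The restrict, cascade, and set null options correspond to `ON DELETE` actions in SQL. The sketch below compares them with Python's built-in `sqlite3` module; the `department` and `employee` tables and the helper `delete_department` are hypothetical, and set default is omitted for brevity.

```python
import sqlite3

def delete_department(on_delete):
    """Delete department 5 under the given ON DELETE action.

    Returns the remaining employee rows and the number of departments left.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce foreign keys
    conn.executescript(f"""
        CREATE TABLE department (dnumber INTEGER NOT NULL PRIMARY KEY);
        CREATE TABLE employee (
            ssn TEXT NOT NULL PRIMARY KEY,
            dno INTEGER REFERENCES department(dnumber) ON DELETE {on_delete}
        );
        INSERT INTO department VALUES (5);
        INSERT INTO employee VALUES ('123-45-6789', 5);
    """)
    try:
        with conn:
            conn.execute("DELETE FROM department WHERE dnumber = 5")
    except sqlite3.IntegrityError:
        pass  # restrict: the deletion is rejected
    rows = conn.execute("SELECT ssn, dno FROM employee").fetchall()
    departments = conn.execute("SELECT COUNT(*) FROM department").fetchone()[0]
    conn.close()
    return rows, departments

print(delete_department("RESTRICT"))  # ([('123-45-6789', 5)], 1)
print(delete_department("CASCADE"))   # ([], 0)
print(delete_department("SET NULL"))  # ([('123-45-6789', None)], 0)
```
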
The Update Operation
- It is necessary to specify a condition on the attributes of the relation to select the tuple (or tuples) to be modified
- If the attributes to be updated are part of neither a primary key nor a foreign key, the update usually causes no problems
- Updating a primary or foreign key raises issues similar to those of Insert and Delete

In-Class Exercise
Consider the following relations for a database that keeps track of student enrollment in courses and the books adopted for each course:
- STUDENT(SSN, Name, Major, Bdate)
- COURSE(Course#, Cname, Dept)
- ENROLL(SSN, Course#, Quarter, Grade)
- BOOK_ADOPTION(Course#, Quarter, Book_ISBN)
- TEXT(Book_ISBN, Book_Title, Publisher, Author)
Draw a relational schema diagram specifying the foreign keys for this schema.