CPS510 Database System Design

Primitive System Structure

[Figure: primitive system structure of a database system. Recoverable labels, roughly from top to bottom:
  Users: Naïve Users, Application Programmers, Sophisticated Users, Database Administrator (DBA)
  Interfaces: Application Interfaces, Application Programs, Query, Database Schema
  Language processing: Data Manipulation Language (DML) Pre-compiler, Query Processor, Data Definition Language (DDL) Compiler, Application Programs Object Code
  Database Management System: Database Manager, Other Services
  Operating System: Memory Manager, File Manager, Other OS Services
  Disk Storage (Physical): Data Files, Data Dictionary, Other Files]
Introduction

A database can be defined as a set of master files organized and administered in a flexible way, so that the files in the database can be easily adapted to new, unforeseen tasks.

Types of DBMS:
- Relational Database Management Systems
- Hierarchical Database Management Systems
- Network Database Management Systems
- Inverted Files DBMS

There are 3 forms of database description (ANSI/SPARC, 1975, and so on):

1. Conceptual Schema (Conceptual View)
   Machine-and-software independent description of the total database. It is also referred to as a logical database.

2. Internal Schema (Internal View)
   Description of the physical database. It is close to the machine level and describes such things as file organization and access paths.

3. External Schema (User View)
   User-oriented description of part of the database. It corresponds to the way in which a program needs to "view" the database. Since there are many purposes to which the data in a database may be put, there will be many different external schemas (or schemata), corresponding to different programs interpreting the database in particular ways.

A DBMS is software used to read, write, and maintain the data and to provide, among other things, security and integrity for it.
Facilities

- Standard methods used by DBMSs to implement relationships
- Data dictionary, docs, etc.
- Data independence (i.e. the possibility of changing the physical database without having to alter old programs that operate on the database).
  Example: consider the file: CODE NAME ADDRESS TELEPHONE SALARY
- Database languages (QUEL, SQL, etc.)
- Report generators/screen generators
- Recovery facilities
- Concurrency facilities
- File protection

Elements of the Conceptual Schema

Definition: A conceptual schema may be thought of as a model of the enterprise using it.

The description of the conceptual schema may include a description of:

1. The kinds of logical files and record types comprising the db.
2. The fields included in each record type.
3. The relationships between the different record types of the db.
4. Any limitations on the values that can be taken by individual fields, as well as constraints upon the relationships between records.
Entities/Attributes

An Entity is an item or a concept in the real world about which we want to record data. The data associated with an Entity are called Attributes (field names in a record).

An entity type is a classification of all the entities describing the same type of item or concept from the real world.

A description of an Entity type in the conceptual database includes:
- A unique name for each entity type.
- A field containing a unique identifier for each individual Entity, called a key.
- A description of all attributes or fields of the entity type.
- An indication of the number of occurrences of the entity type that the database will be required to hold, known as the cardinal number of the entity type.

Attributes of an entity may be:
- Mandatory or optional, depending on whether the field must always have a value or not.
- Single-valued or multi-valued. Multi-valued attributes are stored in repeating fields.
- Aggregate or simple, corresponding to whether or not the attribute is formed from a combination of other attributes.

Relationships

Relationship: used to describe a connection between Entities.
Relation: used to designate a logical table describing a set of similar Entities. (File = Relation = Table)
Data Models

A data model is a method used to describe the Entity types and relationships of the Conceptual Database.

- E-R, Entity/Relationship (or E-R-A). The E-R model is used for top-down analysis of new systems.
  - E-R Diagrams (Frank (1971), Chen (1976))
  - Bachman Diagrams (1969), or Data Structure Diagrams
- Relational Model (see the paradigm). The Relational Model is used for bottom-up analysis of existing Relations.
- User View Diagrams, also for bottom-up analysis.

Examples

[Figure: example diagrams in E-R and Bachman notation, with entities E, E1, E2, E3 and relationship r. Recoverable panels: n:m relationship, 1:m relationship, 1:1 relationship, Tree (1:m on a single entity type), Graph/Network (n:m on a single entity type), and a relationship of order 3 among E1, E2, and E3.]
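The Design of the Conceptual Schema

[Figure: design flow from the real world to the database description, in three stages.]

Stage 1, Choice of a Model: starting from the real world and the user requirements, produce an E-R design of the DB. Result: a conceptual schema independent of any concrete DBMS.

Stage 2, Normalization: normalize the design. Result: adjusted diagrams and a normalized relational model.

Stage 3, Optimization: logical optimization and adjustments to a concrete DBMS. Result: data dictionary and database description.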
Normal forms

Normalization

Normalization formulates constraints on the structure of the tables in the database, in order to obtain a logical database that is like the "external world". Applying different sets of constraints results in differently structured tables and different normal forms.

Terminology

Relation  = table = file
Tuple     = record
Attribute = a field
Domain    = field's value area

Example: A sales person file (before normalization)

SNR  City      Code  PNR  Qty   PNR  Qty   PNR  Qty   PNR  Qty   PNR  Qty
S1   Athens    10    ---  ---
S2   Toronto   30    P1   200   P3   100
S4   Kingston  20    P5   200   P8   100
S5   Toronto   30    P1   50    P3   500   P4   800   P5   500   P8   1000

1NF (First Normal Form - FNF)

- Each tuple must contain a unique identifier.
- Tuples have only atomic values in their attributes (i.e. repeating groups are excluded).
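As a minimal sketch of what the 1NF rule asks for, the following Python snippet flattens the repeating (PNR, Qty) groups of the unnormalized file above into one row per sales person and part. The list-of-tuples representation and the helper name `to_1nf` are assumptions made for illustration; they are not part of the course material.

```python
# Unnormalized sales person records: each one carries a repeating group
# of (PNR, Qty) pairs, copied from the "before normalization" table.
unnormalized = [
    ("S1", "Athens", 10, []),
    ("S2", "Toronto", 30, [("P1", 200), ("P3", 100)]),
    ("S4", "Kingston", 20, [("P5", 200), ("P8", 100)]),
    ("S5", "Toronto", 30,
     [("P1", 50), ("P3", 500), ("P4", 800), ("P5", 500), ("P8", 1000)]),
]


def to_1nf(records):
    """Hypothetical helper: emit one flat row per (sales person, part) pair."""
    rows = []
    for snr, city, code, parts in records:
        if not parts:
            # A sales person without parts keeps one row with empty PNR/Qty.
            rows.append((snr, city, code, None, None))
        for pnr, qty in parts:
            rows.append((snr, city, code, pnr, qty))
    return rows


rows_1nf = to_1nf(unnormalized)
for row in rows_1nf:
    print(row)
```

Every resulting row holds only atomic values, at the cost of repeating City and Code for each part; that redundancy is what the later normal forms address.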
The file after normalization (1NF)

SNR  City      Code  PNR  Qty
S1   Athens    10    ---  ---
S2   Toronto   30    P1   200
S2   Toronto   30    P2   100
S4   Kingston  20    P5   200
S4   Kingston  20    P8   100
S5   Toronto   30    P1   50
S5   Toronto   30    P3   500
S5   Toronto   30    P4   800
S5   Toronto   30    P5   500
S5   Toronto   30    P8   1000

Functional Dependency

Assume that X and Y are fields in the same record (for example SC, the sales code, and City).

Field Y is said to be functionally dependent on field X if and only if, for all pairs of records in the file, whenever two records have the same value in field X they also have the same value in field Y.

Field Y is said to be fully functionally dependent on X if Y is functionally dependent on X and NOT functionally dependent on any subset of X's possible subfields.

A field on which another field is fully functionally dependent is called the determinant for that field.

Field Y is said to be transitively functionally dependent on X if there is some other field Z such that X determines the value of Z and Z determines the value of Y.
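The definitions above can be tried out on the 1NF table. The sketch below is only an illustration: the function names `depends` and `fully_depends` are made up for this example, and checking a sample of rows can only show that a dependency is not contradicted by the data, not that it holds for the enterprise in general.

```python
from itertools import combinations

COLUMNS = ("SNR", "City", "Code", "PNR", "Qty")
ROWS = [  # the 1NF sales person table
    ("S1", "Athens", 10, None, None),
    ("S2", "Toronto", 30, "P1", 200),
    ("S2", "Toronto", 30, "P2", 100),
    ("S4", "Kingston", 20, "P5", 200),
    ("S4", "Kingston", 20, "P8", 100),
    ("S5", "Toronto", 30, "P1", 50),
    ("S5", "Toronto", 30, "P3", 500),
    ("S5", "Toronto", 30, "P4", 800),
    ("S5", "Toronto", 30, "P5", 500),
    ("S5", "Toronto", 30, "P8", 1000),
]


def project(row, names):
    """Pick the named fields out of one row."""
    return tuple(row[COLUMNS.index(n)] for n in names)


def depends(x, y, rows=ROWS):
    """True if equal X values always come with equal Y values."""
    seen = {}
    for row in rows:
        y_value = project(row, y)
        if seen.setdefault(project(row, x), y_value) != y_value:
            return False
    return True


def fully_depends(x, y, rows=ROWS):
    """True if Y depends on X but on no proper, non-empty subset of X."""
    if not depends(x, y, rows):
        return False
    return not any(
        depends(subset, y, rows)
        for size in range(1, len(x))
        for subset in combinations(x, size)
    )


print(depends(("SNR",), ("City",)))               # True
print(depends(("PNR",), ("Qty",)))                # False: P1 has Qty 200 and 50
print(fully_depends(("SNR", "PNR"), ("Qty",)))    # True
print(fully_depends(("SNR", "PNR"), ("City",)))   # False: SNR alone determines City
# Transitive chain in the sample data: SNR -> Code and Code -> City
print(depends(("SNR",), ("Code",)) and depends(("Code",), ("City",)))  # True
```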
2NF (Second Normal Form - SNF)

A relation is in 2NF if it is in 1NF and if all non-identifier fields are fully functionally dependent on the record's identifier.

Example: given with SC and City, or with SNR and Name.

SNR  Name
S1   CB
S2   BB
S4   RR
S5   PO

3NF (Third Normal Form)

A relation is in 3NF if it is in 2NF and the record's non-identifier fields are NOT transitively dependent on the record's id field.

The non-key fields must contain data that is attached to the identifier field, to the entire identifier, and to nothing but the identifier field. (The last condition is omitted from the 2NF.)

Example

The sales person table is represented as two tables satisfying the 3NF, provided that SC (Sales Code) and City are independent.

a)
SNR  PNR  Qty
S2   P1   200
S2   P2   100
S4   P5   200
S4   P8   100
S5   P1   50
S5   P3   500
S5   P4   800
S5   P5   500
S5   P8   1000

b)
SNR  SC  City
S1   10  Athens
S2   30  Toronto
S4   20  Kingston
S5   30  Toronto
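The split into tables a) and b) can be reproduced mechanically from the 1NF rows. The following is a hedged sketch, not the course's procedure: the variable names are invented here, and the re-join at the end only checks that no order information is lost by the split.

```python
# 1NF rows as (SNR, City, SC, PNR, Qty); None marks the empty S1 entry.
rows_1nf = [
    ("S1", "Athens", 10, None, None),
    ("S2", "Toronto", 30, "P1", 200),
    ("S2", "Toronto", 30, "P2", 100),
    ("S4", "Kingston", 20, "P5", 200),
    ("S4", "Kingston", 20, "P8", 100),
    ("S5", "Toronto", 30, "P1", 50),
    ("S5", "Toronto", 30, "P3", 500),
    ("S5", "Toronto", 30, "P4", 800),
    ("S5", "Toronto", 30, "P5", 500),
    ("S5", "Toronto", 30, "P8", 1000),
]

# Table a): one row per (SNR, PNR) with its quantity.
orders = [(snr, pnr, qty) for snr, _, _, pnr, qty in rows_1nf if pnr is not None]

# Table b): one row per sales person; the set removes the repeated copies.
salespersons = sorted({(snr, sc, city) for snr, city, sc, _, _ in rows_1nf})

# Joining a) and b) on SNR gives back every original order row.
by_snr = {snr: (sc, city) for snr, sc, city in salespersons}
rejoined = [
    (snr, by_snr[snr][1], by_snr[snr][0], pnr, qty) for snr, pnr, qty in orders
]

print(len(orders), len(salespersons))
print(set(rejoined) == {row for row in rows_1nf if row[3] is not None})
```

City and SC are now stored once per sales person, so changing the city of S5 touches one row instead of five.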
3.5NF (Boyce/Codd Normal Form, B/C NF)

This NF makes use of the determinant. A field is the determinant for other fields in the tuple if all the fields make up a description of a type of object or concept from the external world in such a way that the determinant can be used as the id key field for the new type of object or concept.

Now we can define a relation in B/C NF as one in which any determinant attribute can be used as the tuple's id field.

Note: The 3NF contains a constraint which the B/C NF does not. A relation in B/C normal form may well contain several different unique id fields, which the mathematical derivation of the 3NF does not allow. The 3.5NF is more practical than the 3NF.

4NF (Fourth Normal Form)

Multi-valued dependency (MD) between attributes: it holds between two attributes in a table if the second attribute can assume several different values for a given value of the first attribute.

If a relation contains two multi-valued dependencies, these may either depend on each other or be independent of each other.

A relationship of order n (see the order 3 case in the E-R diagram example) will usually include many MDs. If these are independent of each other, the relationship of order n can be reduced to 1:n relationships and n:m relationships.

A table satisfies the 4NF if it satisfies the 3NF and any several MDs it contains are dependent on each other (that is, it does not contain two independent MDs).

If a table does not satisfy the 4NF because it contains two independent MDs, then the table can be normalized by splitting it up into two different tables, each of which contains one of the MDs.

Note: if the MDs are dependent on each other, the table normally cannot be split. This causes problems with maintenance and updating.

5NF (Fifth Normal Form)

Of mainly theoretical value. A relation R is in 5NF (also called projection-join normal form) if and only if every join dependency in R is implied by the candidate keys of R. It makes it possible to reduce a relation R containing two MDs which are dependent on each other.
Example: Relationship of order 3

Assume that a database contains a relation for each one of: vehicle dealers, vehicle manufacturers, vehicle types.

Note: There is an N:M relationship between dealers and manufacturers, and an N:M relationship between dealers and types.

a) Show that C.B. and M.M. each sell Ford and GM products.
b) Show that C.B. sells cars and buses and M.M. sells cars and trucks.

a) MD between dealer and manufacturer.     b) MD between dealer and type.

Dealer  Manufacturer                       Dealer  Type
C.B.    Ford                               C.B.    Car
C.B.    GM                                 C.B.    Bus
M.M.    Ford                               M.M.    Car
M.M.    GM                                 M.M.    Truck

The 2 tables can be implemented as a single table with three columns. The MDs are then in the same table.

The 2 MDs are independent of each other if the "contents" of the first row mean that C.B. sells only Ford cars.
- If the MDs are independent, it is possible that C.B. doesn't sell Ford cars, only Ford buses.
- If the MDs are dependent, C.B. sells Ford cars.

4NF

Dealer  Manufacturer  Type
C.B.    Ford          Car
C.B.    Ford          Bus
C.B.    GM            Car
C.B.    GM            Bus
M.M.    Ford          Car
M.M.    Ford          Truck
M.M.    GM            Car
M.M.    GM            Truck

5NF

Dealer  Manufacturer     Dealer  Type      Manufacturer  Type
C.B.    Ford             C.B.    Car       Ford          Car
C.B.    GM               C.B.    Bus       Ford          Bus
M.M.    Ford             M.M.    Truck     GM            Car
                                           GM            Truck
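A small Python sketch can make the independent/dependent distinction concrete. It joins tables a) and b) as given in the exercise (M.M. sells cars and trucks) into one three-column table and then checks whether splitting loses information. The "dependent" variant, in which C.B. sells Ford buses but no Ford cars, is a hypothetical modification taken from the remark above; all function and variable names are assumptions for illustration.

```python
dealer_manufacturer = [
    ("C.B.", "Ford"), ("C.B.", "GM"), ("M.M.", "Ford"), ("M.M.", "GM"),
]
dealer_type = [
    ("C.B.", "Car"), ("C.B.", "Bus"), ("M.M.", "Car"), ("M.M.", "Truck"),
]


def join(dm, dt):
    """Natural join of the two tables on Dealer."""
    return {(d, m, t) for d, m in dm for d2, t in dt if d == d2}


def split(table):
    """Project a three-column table onto (Dealer, Manufacturer) and (Dealer, Type)."""
    return {(d, m) for d, m, _ in table}, {(d, t) for d, _, t in table}


# Independent MDs: every manufacturer/type combination appears per dealer,
# so the table can be split and rebuilt without loss.
independent = join(dealer_manufacturer, dealer_type)
print(len(independent))                        # 8 rows
print(join(*split(independent)) == independent)  # True

# Dependent MDs (hypothetical): C.B. sells Ford buses but no Ford cars.
dependent = independent - {("C.B.", "Ford", "Car")}
print(join(*split(dependent)) == dependent)    # False: the join re-adds the row
```

In the dependent case the re-joined table claims that C.B. sells Ford cars, which is why a table with dependent MDs normally cannot be split into two.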
Data Definition in DB2

The principal data definition statements are:

CREATE TABLE    ALTER TABLE    DROP TABLE
CREATE VIEW                    DROP VIEW
CREATE INDEX                   DROP INDEX

CREATE TABLE

There are two formats for the CREATE TABLE statement:

1.  CREATE TABLE table-name
        ( column-definition [, column-definition ]...
          [, primary-key-definition ]
          [, alternate-key-definition [, alternate-key-definition ]... ]
          [, foreign-key-definition [, foreign-key-definition ]... ] )
        [ other parameters ];

    where a column-definition is

        column data-type [ NOT NULL [ WITH DEFAULT | UNIQUE ] ]

    e.g.
        CREATE TABLE S
            ( S#     CHAR(5)  NOT NULL,
              SNAME  CHAR(20) NOT NULL WITH DEFAULT,
              STATUS SMALLINT NOT NULL WITH DEFAULT,
              CITY   CHAR(15) NOT NULL WITH DEFAULT,
              PRIMARY KEY (S#) );

2.  CREATE TABLE table-name LIKE table [ other parameters ];

    This format allows the user to create a table with the same "shape" as another. The new table inherits only the column definitions from the old one.

    e.g.
        CREATE TABLE SCOPY LIKE S;

    This would generate a table identical to a table generated with the following CREATE TABLE statement:
        CREATE TABLE SCOPY
            ( S#     CHAR(5)  NOT NULL,
              SNAME  CHAR(20) NOT NULL WITH DEFAULT,
              STATUS SMALLINT NOT NULL WITH DEFAULT,
              CITY   CHAR(15) NOT NULL WITH DEFAULT );

    Note that the new table does not inherit any primary, alternate, or foreign key definitions. Nor would it inherit any UNIQUE specifications. DB2 does not allow any such specifications to be stated explicitly either.

ALTER TABLE

A new column can be added to a table at any time using the ALTER TABLE command:

    ALTER TABLE table-name ADD column data-type [ NOT NULL WITH DEFAULT ];

    e.g.
        ALTER TABLE S ADD DISCOUNT SMALLINT;

DROP TABLE

An existing table can be destroyed at any time by means of the DROP TABLE statement:

    DROP TABLE table-name;

Foreign Key and Referential Integrity in DB2

The referential integrity rule states that the database must not contain any unmatched foreign key values. That is, non-null foreign key values for which there does not exist a matching value of the corresponding primary key are not allowed.

The syntax of a foreign key definition is as follows:

    FOREIGN KEY [ foreign-key ] ( column [, column ]... )
        REFERENCES table
        [ ON DELETE effect ]

where effect is RESTRICT, CASCADE, or SET NULL.
e.g.
    CREATE TABLE SP
        ( S#  CHAR(5) NOT NULL,
          P#  CHAR(6) NOT NULL,
          QTY INTEGER,
          PRIMARY KEY (S#, P#),
          FOREIGN KEY SFK (S#) REFERENCES S ON DELETE CASCADE,
          FOREIGN KEY PFK (P#) REFERENCES P ON DELETE RESTRICT );

The ON DELETE clause defines the delete rule for the target table with respect to this foreign key; that is, it defines what happens if an attempt is made to delete a row from the target table. In the rules below, T2 is the table that contains the foreign key (SP in the example) and the target table is the one it references (S or P).

RESTRICT: the delete is restricted to the case where there are no matching rows in table T2 (it is rejected if any such rows exist).

CASCADE: the delete cascades to delete all matching rows in table T2 also.
Note: if the key in T2 is in turn referenced by yet another table T3, the delete rule for that key is applied as well. That is, a single delete statement can cascade through a large number of tables if you are not careful.

SET NULL: the foreign key must have NULLs allowed. The target row is deleted and the foreign key is set to NULL in all matching rows in table T2.

INDEXES

The CREATE INDEX statement takes the general form:

    CREATE [UNIQUE] INDEX index
        ON table-name ( column [order] [, column [order] ]... )
        [ other parameters ];

    e.g.
        CREATE INDEX X ON T (P, Q DESC, R);

This creates an index called X on table T in which entries are ordered by ascending R-value within descending Q-value within ascending P-value. The columns P, Q, and R need not be contiguous, nor need they all be of the same data type, nor need they all be fixed length or all varying length.

The UNIQUE option specifies that no two rows in the indexed table will be allowed to take on the same value for the indexed column or column combination at the same time.

Indexes can be dropped by issuing a DROP INDEX command.

    e.g.
        DROP INDEX X;
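The RESTRICT and CASCADE delete rules described above can be tried out without a DB2 installation. The sketch below uses Python's built-in `sqlite3` module, so it shows SQLite's behaviour, which is assumed here to be close enough to illustrate the rules; it is not DB2 syntax. The column names SNO and PNO stand in for S# and P#, the named-foreign-key form (SFK, PFK) is dropped, and the sample rows are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE S  (SNO CHAR(5) NOT NULL PRIMARY KEY, SNAME CHAR(20));
CREATE TABLE P  (PNO CHAR(6) NOT NULL PRIMARY KEY, PNAME CHAR(20));
CREATE TABLE SP (
    SNO CHAR(5) NOT NULL,
    PNO CHAR(6) NOT NULL,
    QTY INTEGER,
    PRIMARY KEY (SNO, PNO),
    FOREIGN KEY (SNO) REFERENCES S (SNO) ON DELETE CASCADE,
    FOREIGN KEY (PNO) REFERENCES P (PNO) ON DELETE RESTRICT
);
INSERT INTO S  VALUES ('S1', 'Smith');   -- invented sample rows
INSERT INTO P  VALUES ('P1', 'Nut');
INSERT INTO SP VALUES ('S1', 'P1', 300);
""")

# RESTRICT: deleting part P1 is rejected while SP still refers to it.
try:
    conn.execute("DELETE FROM P WHERE PNO = 'P1'")
    restricted = False
except sqlite3.IntegrityError:
    restricted = True

# CASCADE: deleting supplier S1 also deletes its matching rows in SP.
conn.execute("DELETE FROM S WHERE SNO = 'S1'")
remaining = conn.execute("SELECT COUNT(*) FROM SP").fetchone()[0]

print("delete from P rejected:", restricted)
print("rows left in SP after deleting S1:", remaining)
```

Once the cascade has emptied SP, the delete on P would no longer be restricted, which shows how the outcome of one delete rule can depend on the order of operations.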
Physical Database Structures

File Organizations

Sequential Organization: Records are stored according to a fixed sequence.
Random Organization: Records are retrieved by transforming the id field (key) to a block address.
Index Organization: Records can be searched through an index that contains references to the records in a file.
List Organization: There are various forms. Usually, records are chained together by pointer fields.

Note: There are primary and secondary file organizations. A primary organization is based on the physical storage of the individual records; a secondary organization is not.

Keys

Super Key: An attribute (or combination of attributes) that uniquely identifies each entity.
Candidate Key: A minimal super key, i.e. one that does not contain a subset of attributes that is itself a super key.
Primary Key: A candidate key selected (by the DB designer) to uniquely identify all other attribute values in any given row. It cannot contain null values.
Secondary Key: An attribute (or combination of attributes) used strictly for data retrieval purposes.
Foreign Key: An attribute (or combination of attributes) in a table whose value must either match the primary key in another table or be null.

Entity Sets

Weak: An entity set that does not have sufficient attributes to form a primary key.
Strong: An entity set that has a primary key.

Integrity Rules

Entity Integrity: No null values are allowed in the primary key. This guarantees that each entity will have a unique identity.
Referential Integrity: A foreign key must either match a primary key value in the referenced table or be null. This makes it possible for an attribute NOT to have a corresponding value, while it is still impossible to have an invalid entry.
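To illustrate the difference between super keys and candidate keys, the sketch below enumerates both for the sales person table b) (SNR, SC, City) from the 3NF example. It is an assumption-laden illustration: the function name `is_super_key` is invented, and testing uniqueness on four sample rows only shows which attribute sets are unique in this data, whereas a real key is a design decision about all possible rows.

```python
from itertools import combinations

COLUMNS = ("SNR", "SC", "City")
ROWS = [
    ("S1", 10, "Athens"),
    ("S2", 30, "Toronto"),
    ("S4", 20, "Kingston"),
    ("S5", 30, "Toronto"),
]


def is_super_key(attrs):
    """True if the values of attrs are unique across the sample rows."""
    indexes = [COLUMNS.index(a) for a in attrs]
    projected = {tuple(row[i] for i in indexes) for row in ROWS}
    return len(projected) == len(ROWS)


# Every attribute combination that identifies each row uniquely.
super_keys = [
    set(combo)
    for size in range(1, len(COLUMNS) + 1)
    for combo in combinations(COLUMNS, size)
    if is_super_key(combo)
]

# A candidate key is a super key with no smaller super key inside it.
candidate_keys = [
    key for key in super_keys if not any(other < key for other in super_keys)
]

print(super_keys)
print(candidate_keys)
```

On this data every super key contains SNR, and SNR alone is the only minimal one, so it is the natural choice of primary key.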