Handout 6 CS-605 Spring 18 Page 1 of 7

Handout 6: Physical Database Modeling

Purpose: translate the logical description of data into the technical specifications for storing and retrieving data.

Goal: create a design for storing data that will provide
- data integrity
- adequate performance
- security

Inputs:
- Normalized relations
- Volume estimates
- Attribute definitions
- Response time expectations
- Data security needs
- Backup/recovery needs
- Integrity expectations
- DBMS technology used

Leads to decisions on:
- Attribute data types
- Physical record descriptions (these don't always match the logical design)
- File organizations
- Indexes and database architectures
- Query optimization
Technical specification example: SQL for creating tables

Relations:
    Employee(emp_no, SSN, first_name, last_name, street, city, state, date_hired, hourly_wage)
    EmployeeHours(emp_no, week_ending, hours_worked)

    CREATE TABLE EMPLOYEE (
        emp_no      varchar(3)  NOT NULL,
        SSN         char(9)     NOT NULL,
        first_name  varchar(15) NOT NULL,
        last_name   varchar(20) NOT NULL,
        street      varchar(30),
        city        varchar(15),
        state       varchar(2)  NOT NULL,
        date_hired  date,
        hourly_wage number(5,2) NOT NULL,
        CONSTRAINT constr_emp_pk     PRIMARY KEY (emp_no),
        CONSTRAINT constr_state_ch   CHECK (state IN ('NY', 'RI', 'NJ')),
        CONSTRAINT constr_hourly_ch  CHECK (hourly_wage >= 7.50 AND hourly_wage <= 50),
        CONSTRAINT constr_unique_ssn UNIQUE (SSN)
    );

    CREATE TABLE EMPLOYEE_HOURS (
        emp_no       varchar(3)  NOT NULL,
        week_ending  date        NOT NULL,
        hours_worked number(3,0) DEFAULT 0,
        CONSTRAINT hours_pk PRIMARY KEY (emp_no, week_ending),
        CONSTRAINT emp_fk   FOREIGN KEY (emp_no) REFERENCES EMPLOYEE (emp_no) ON DELETE CASCADE
    );

Concerns:

1. Choosing the field data type. Seek to:
- Minimize storage space, e.g., Varchar vs. Char, Number vs. Number(5,2)
- Represent all possible values, e.g., floating point vs. integer value
See the Oracle documentation on built-in data types:
http://download.oracle.com/docs/cd/b14117_01/server.101/b10759/sql_elements001.htm#sthref51

2. Enforcing integrity constraints
See the Oracle documentation on constraints.
Descriptions of constraints:
http://download.oracle.com/docs/cd/b14117_01/server.101/b10759/clauses002.htm#g1053592
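The constraints declared on the EMPLOYEE and EMPLOYEE_HOURS tables above can be tried out in a small sketch. It makes several assumptions: it uses SQLite through Python's standard sqlite3 module rather than Oracle (so type names are only loosely checked), it keeps only a subset of the columns, and all sample rows and the helper `rejected` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite (unlike Oracle) enforces foreign keys only when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE employee (
    emp_no      VARCHAR(3)   NOT NULL,
    ssn         CHAR(9)      NOT NULL,
    last_name   VARCHAR(20)  NOT NULL,
    state       VARCHAR(2)   NOT NULL,
    hourly_wage NUMERIC(5,2) NOT NULL,
    CONSTRAINT constr_emp_pk     PRIMARY KEY (emp_no),
    CONSTRAINT constr_state_ch   CHECK (state IN ('NY', 'RI', 'NJ')),
    CONSTRAINT constr_hourly_ch  CHECK (hourly_wage >= 7.50 AND hourly_wage <= 50),
    CONSTRAINT constr_unique_ssn UNIQUE (ssn)
);
CREATE TABLE employee_hours (
    emp_no       VARCHAR(3)   NOT NULL,
    week_ending  DATE         NOT NULL,
    hours_worked NUMERIC(3,0) DEFAULT 0,
    CONSTRAINT hours_pk PRIMARY KEY (emp_no, week_ending),
    CONSTRAINT emp_fk   FOREIGN KEY (emp_no)
        REFERENCES employee (emp_no) ON DELETE CASCADE
);
""")


def rejected(sql, params):
    """Return True if the statement fails with an integrity error."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


INSERT_EMP = "INSERT INTO employee VALUES (?, ?, ?, ?, ?)"
conn.execute(INSERT_EMP, ("001", "123456789", "Smith", "NY", 12.50))  # valid row

bad_state = rejected(INSERT_EMP, ("002", "222222222", "Jones", "MA", 12.50))  # CHECK on state
bad_wage = rejected(INSERT_EMP, ("003", "333333333", "Lee", "NJ", 5.00))      # CHECK on wage
dup_ssn = rejected(INSERT_EMP, ("004", "123456789", "Diaz", "RI", 20.00))     # UNIQUE on ssn

# hours_worked is omitted, so the DEFAULT value is stored.
conn.execute(
    "INSERT INTO employee_hours (emp_no, week_ending) VALUES (?, ?)",
    ("001", "2018-03-02"),
)
default_hours = conn.execute("SELECT hours_worked FROM employee_hours").fetchone()[0]

# ON DELETE CASCADE: deleting the employee also deletes the referring hours rows.
conn.execute("DELETE FROM employee WHERE emp_no = ?", ("001",))
remaining_hours = conn.execute("SELECT COUNT(*) FROM employee_hours").fetchone()[0]

print(bad_state, bad_wage, dup_ssn, default_hours, remaining_hours)
```

Each rejected insert corresponds to one named constraint, which is why naming constraints (constr_state_ch, constr_unique_ssn, and so on) makes error messages easier to trace.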
Examples of constraints:
http://download.oracle.com/docs/cd/b14117_01/server.101/b10759/clauses002.htm#g1063310

- Range control: allowable value limitations (constraints or validation rules), enforced with a CHECK constraint
  e.g., value <= 100 for the Test_Score field
- Uniqueness of certain attributes or attribute combinations (UNIQUE constraint)
  e.g., when replacing a composite natural key with a surrogate key, you still want to enforce the uniqueness of the natural key values
- Default value: the value assumed if no explicit value is given (DEFAULT)
  e.g., value 'MA' for the State field
- Null value control: allowing or prohibiting empty fields (NOT NULL)
  e.g., prohibit leaving the Date_of_Birth field blank
- Referential integrity: range control (and null value allowances) for foreign-key to primary-key match-ups (FOREIGN KEY)
  ON DELETE clause:
  - Default behavior: reject deletion from the primary table if there are records that refer to it via a foreign key constraint
  - Other options: ON DELETE CASCADE (deletes all referring records as well), ON DELETE SET NULL (sets the foreign key values to NULL)

Ensuring Field Data Efficiency and Integrity: Coding values
e.g., BOS (Logan Airport), ORD (Chicago-O'Hare). Implement this by creating a look-up table that must be stored and accessed to look up the code value.

Example: instead of the single table Product(Product_No, Description, FinishValue), have two tables, Product and Finish, so that the finish values are coded.

    CREATE TABLE Finish (
        Code  char(1) PRIMARY KEY,
        Value varchar(20) UNIQUE
    );

    CREATE TABLE Product (
        ProductNO       char(5) PRIMARY KEY,
        ProdDescription varchar(15) NOT NULL,
        ProdFinish      char(1),
        CONSTRAINT Finish_FK FOREIGN KEY (ProdFinish) REFERENCES Finish (Code)
    );

3. Performance: minimizing data access time
Predicting/analyzing access: composite usage maps (Fig 5.1 from MDM) show
- approximate data volumes (number of records)
- approximate access frequencies (per hour)

Goal: minimize the number of page retrievals (I/O operations) per query.

Layout of Physical Records:
- Physical Record: a group of fields stored in adjacent memory locations and retrieved together as a unit.
- Page: the amount of data read or written in one I/O operation.
- Blocking Factor: the number of physical records per page.

General rule of thumb: store data that gets accessed together physically close together.
- attributes: order of fields and denormalization, vertical partitioning
- rows: horizontal partitioning
- tables: clustering

Denormalization
Transforming normalized relations into unnormalized physical record specifications to minimize access time by avoiding table joins.

Benefits:
- Can dramatically improve performance (speed) by reducing the number of table lookups (i.e., reducing the number of necessary table joins)

Costs (due to data duplication):
- Wasted storage space
- Data integrity/consistency threats from anomalies
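The denormalization trade-off above can be illustrated with a minimal sketch. It assumes SQLite through Python's sqlite3 module rather than Oracle, reuses the idea of the Product and Finish tables from the coded-values example, and invents the table name product_denorm and all sample rows purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized design: finish descriptions live in a look-up table.
CREATE TABLE finish  (code CHAR(1) PRIMARY KEY, value VARCHAR(20) UNIQUE);
CREATE TABLE product (product_no  CHAR(5) PRIMARY KEY,
                      description VARCHAR(15) NOT NULL,
                      finish_code CHAR(1) REFERENCES finish (code));
INSERT INTO finish  VALUES ('A', 'Ash'), ('B', 'Birch');
INSERT INTO product VALUES ('P0001', 'Desk', 'A'),
                           ('P0002', 'Chair', 'A'),
                           ('P0003', 'Shelf', 'B');

-- Denormalized design: the finish description is copied into every product row.
CREATE TABLE product_denorm (product_no   CHAR(5) PRIMARY KEY,
                             description  VARCHAR(15) NOT NULL,
                             finish_value VARCHAR(20));
INSERT INTO product_denorm
    SELECT p.product_no, p.description, f.value
    FROM product p JOIN finish f ON f.code = p.finish_code;
""")

# Benefit: the denormalized table answers the same question without a join.
joined = conn.execute("""
    SELECT p.product_no, f.value
    FROM product p JOIN finish f ON f.code = p.finish_code
    ORDER BY p.product_no
""").fetchall()
flat = conn.execute("""
    SELECT product_no, finish_value FROM product_denorm ORDER BY product_no
""").fetchall()

# Cost: renaming one finish touches one row in the normalized design,
# but every duplicated copy in the denormalized design.
rows_normalized = conn.execute(
    "UPDATE finish SET value = 'White Ash' WHERE code = 'A'"
).rowcount
rows_denormalized = conn.execute(
    "UPDATE product_denorm SET finish_value = 'White Ash' WHERE finish_value = 'Ash'"
).rowcount

print(joined == flat, rows_normalized, rows_denormalized)
```

If the second UPDATE were forgotten or only partly applied, the duplicated finish values would disagree with each other, which is the kind of update anomaly listed under costs.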
Common denormalization opportunities:
- One-to-one relationship (Fig 5.3)
- Reference data (1:N relationship where the 1-side has data not used in any other relationship)
- Many-to-many relationship with attributes

Privacy and Security: Views
Views are virtual tables dynamically composed from the actual tables. They can
- combine information from multiple tables
- restrict access to data to certain tables/columns
- be read-only
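A view that restricts access to certain columns can be sketched as follows. This assumes SQLite through Python's sqlite3 module, where views happen to be read-only; Oracle's rules for updating through views differ. The view name employee_public and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_no      TEXT PRIMARY KEY,
                       last_name   TEXT NOT NULL,
                       state       TEXT NOT NULL,
                       hourly_wage REAL NOT NULL);
INSERT INTO employee VALUES ('001', 'Smith', 'NY', 12.50),
                            ('002', 'Jones', 'RI', 30.00);

-- The view exposes everything except the wage column.
CREATE VIEW employee_public AS
    SELECT emp_no, last_name, state FROM employee;
""")

# Column names visible through the view.
view_columns = [d[0] for d in conn.execute("SELECT * FROM employee_public").description]

# The view is composed dynamically: a new base-table row shows up immediately.
conn.execute("INSERT INTO employee VALUES ('003', 'Lee', 'NJ', 18.00)")
view_row_count = conn.execute("SELECT COUNT(*) FROM employee_public").fetchone()[0]

# In SQLite, writing through a view is rejected.
try:
    conn.execute("INSERT INTO employee_public VALUES ('004', 'Diaz', 'NY')")
    view_is_writable = True
except sqlite3.Error:
    view_is_writable = False

print(view_columns, view_row_count, view_is_writable)
```

Granting users access to the view instead of the base table is what turns this into a privacy mechanism; the permission step itself is DBMS-specific and is not shown here.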
Privacy and Security, Performance: Partitioning

Horizontal partitioning
- Distributing the rows of a table into two or more separate files based on the value of a certain attribute
- e.g., the Customer table is partitioned into four separate files, one for each geographical region

Vertical partitioning
- Distributing the columns of a table into two or more separate files
- e.g., the Employee table is partitioned into a public file (name, office, extension, etc.) and a private file (salary, health history, etc.)
- Note: the primary key is repeated in each file

Benefits of partitioning:
- Records used together are grouped together
- Each partition can be optimized for performance
- Security and recovery: each partition can be made available only to relevant users
- Partitions stored on different disks: less contention
- Parallel processing capability

Disadvantages of partitioning:
- Slower retrievals when a query spans partitions
- Inconsistent access speed across partitions

Table clustering
Records from more than one table can be placed physically next to each other in a file to facilitate fast retrieval of relevant fields from those tables.

Performance: Indexing
The index concept is like the index in a book. An index is a separate structure that contains references to records for quick retrieval.
See example: it is easier to locate all records with a given last name using the index.
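The book-index analogy can be made concrete with a small in-memory sketch. This is only an illustration of the concept, not how a DBMS stores its indexes (those are typically tree or hash structures on disk); the sample records and the function names build_index, scan, and lookup are hypothetical.

```python
from collections import defaultdict

# The "file": records stored in arrival order (emp_no, last_name, first_name).
records = [
    (101, "Smith", "Ann"),
    (102, "Jones", "Bob"),
    (103, "Smith", "Carl"),
    (104, "Lee", "Dana"),
]


def build_index(rows, field_pos):
    """Build a separate structure mapping a field value to record positions."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[field_pos]].append(position)
    return index


def scan(rows, last_name):
    """Without an index: examine every record."""
    return [row for row in rows if row[1] == last_name]


def lookup(rows, index, last_name):
    """With an index: jump straight to the referenced positions."""
    return [rows[pos] for pos in index.get(last_name, [])]


last_name_index = build_index(records, 1)
smiths_by_scan = scan(records, "Smith")
smiths_by_index = lookup(records, last_name_index, "Smith")

print(last_name_index["Smith"], smiths_by_index)
```

Both functions return the same records; the difference is that the scan reads all of them, while the lookup reads only the positions listed under the key.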
[Figure: example showing two different indexes]

Primary keys are automatically indexed. Oracle has a CREATE INDEX operation, and MS Access allows indexes to be created for most field types.

Adding a record requires at least two disk accesses:
- Update the file
- Update the index!

Trade-off:
- Faster queries, but
- Slower maintenance (additions, deletions, and updates of records)
=> Thus, more static databases benefit more overall

What indexes to create?
- Indexes are most useful on larger tables
- Index the primary key of each table (may be automatic)
- Indexes are useful on foreign keys and on search fields (WHERE)
- Indexes are also useful on fields used for sorting (ORDER BY) and categorizing (GROUP BY)
- Indexing a field is most useful when the field has many different values (more than 100 distinct values: index it; fewer than 30: do not index)

Concerns:
- Indexes can slow down insertion/deletion/update operations, so create indexes wisely.
- Depending on the DBMS, NULL values may not be referenced from an index (thus, rows with a null value in the indexed field may not be found by a search using the index)
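The effect of CREATE INDEX on a WHERE search can be observed with a short sketch. It assumes SQLite through Python's sqlite3 module, whose EXPLAIN QUERY PLAN statement stands in for Oracle's own plan tools; the table contents, the index name employee_last_name_idx, and the helper plan are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, last_name TEXT, state TEXT)"
)
# 5000 rows with 500 distinct last names: a field with many different values.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(i, f"Name{i % 500}", "NY") for i in range(5000)],
)

QUERY = "SELECT emp_no FROM employee WHERE last_name = ?"


def plan(sql, params):
    """Return SQLite's textual query plan (last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[-1] for row in rows)


plan_before = plan(QUERY, ("Name42",))   # no index yet: full table scan

conn.execute("CREATE INDEX employee_last_name_idx ON employee (last_name)")
plan_after = plan(QUERY, ("Name42",))    # the search field is now indexed

matches = conn.execute(QUERY, ("Name42",)).fetchall()

print(plan_before)
print(plan_after)
print(len(matches))
```

The query returns the same rows either way; only the access path changes. From this point on, every insert, delete, or update of last_name must also maintain the index, which is the maintenance cost described above.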