Physical Database Design January 2007 Yunmook Nah Department of Electronics and Computer Engineering Dankook University
Physical Database Design Methodology - for Relational Databases - Chapter 17 Connolly & Begg
Steps for Physical Database Design
3. Translate logical data model for target DBMS
   3.1 Design base relations
   3.2 Design representation of derived data
   3.3 Design general constraints
4. Design file organizations and indexes
   4.1 Analyze transactions
   4.2 Choose file organizations
   4.3 Choose indexes
   4.4 Estimate disk space requirements
5. Design user views
6. Design security mechanisms
7. Consider the introduction of controlled redundancy
8. Monitor and tune the operational system
3. Translate logical data model for target DBMS 3.1 Design base relations Implement base relations Document design of base relations 3.2 Design representation of derived data Derived or calculated attributes The number of staff who work in a particular branch The number of properties that a member of staff handles Document design of derived data 3.3 Design general constraints The remaining general constraints DreamHome has a rule that prevents a member of staff from managing more than 100 properties Document design of general constraints
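As a minimal sketch of steps 3.2 and 3.3, the example below uses Python's built-in `sqlite3` module rather than any particular target DBMS. The table layouts, the trigger name `staff_property_limit`, and the sample identifiers are assumptions made for illustration. The derived attribute (number of staff per branch) is computed by a query instead of being stored, and the 100-property rule is enforced with a trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (staffno TEXT PRIMARY KEY, branchno TEXT NOT NULL);
CREATE TABLE PropertyForRent (
    propertyno TEXT PRIMARY KEY,
    staffno    TEXT REFERENCES Staff(staffno)
);
-- General constraint (hypothetical trigger name): reject an insert that would
-- give a member of staff more than 100 properties. Only INSERT is covered here;
-- a full design would also guard UPDATE of staffno.
CREATE TRIGGER staff_property_limit
BEFORE INSERT ON PropertyForRent
WHEN (SELECT COUNT(*) FROM PropertyForRent WHERE staffno = NEW.staffno) >= 100
BEGIN
    SELECT RAISE(ABORT, 'staff member already manages 100 properties');
END;
""")

conn.execute("INSERT INTO Staff VALUES ('SG14', 'B003')")
conn.executemany(
    "INSERT INTO PropertyForRent VALUES (?, 'SG14')",
    [(f"PG{i}",) for i in range(100)],
)

# The 101st property for the same member of staff should be rejected.
try:
    conn.execute("INSERT INTO PropertyForRent VALUES ('PG100', 'SG14')")
    limit_enforced = False
except sqlite3.DatabaseError:
    limit_enforced = True

# Derived data computed on demand: number of staff who work in each branch.
staff_per_branch = dict(
    conn.execute("SELECT branchno, COUNT(*) FROM Staff GROUP BY branchno")
)
managed = conn.execute(
    "SELECT COUNT(*) FROM PropertyForRent WHERE staffno = 'SG14'"
).fetchone()[0]
```

Whether a derived attribute is stored or computed on each access is a design trade-off: storing it speeds up reads but adds the cost of keeping it consistent.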
4. Design file organizations and indexes 4.1 Analyze transactions Performance criteria: the transactions that run frequently; the transactions that are critical; the times during the day/week when there will be a high demand. At least investigate the most important ones. Map all transaction paths to relations (Table 17.1: Transaction/relation cross-reference matrix). Determine which relations are most frequently accessed by transactions (Figure 17.3: Transaction usage map).
Analyze the data usage of selected transactions that involve these relations. For each transaction, we should determine: the relations and attributes accessed by the transaction and the type of access; the attributes used in any predicates; for a query, the attributes that are involved in the join of two or more relations; the expected frequency at which the transaction will run; the performance goals for the transaction.
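The cross-reference matrix and the "most frequently accessed" step can be sketched in a few lines of Python. The three transactions, their daily frequencies, and the access codes below are invented for illustration and are not taken from Table 17.1.

```python
from collections import Counter

# Hypothetical transactions: (name, runs per day, {relation: access types}).
# Access codes: I = insert, R = read, U = update, D = delete.
transactions = [
    ("A: insert new property", 50, {"PropertyForRent": "I", "Staff": "R"}),
    ("B: list properties by branch", 400, {"PropertyForRent": "R", "Branch": "R"}),
    ("C: update rent", 20, {"PropertyForRent": "RU"}),
]

relations = sorted({rel for _, _, access in transactions for rel in access})

# Cross-reference matrix: one row per transaction, one column per relation.
matrix = {
    name: {rel: access.get(rel, "") for rel in relations}
    for name, _, access in transactions
}

# Daily access load per relation, weighted by transaction frequency.
load = Counter()
for _, runs_per_day, access in transactions:
    for rel in access:
        load[rel] += runs_per_day

most_accessed = load.most_common(1)[0][0]
```

Relations with the highest load are the ones whose file organization and indexes deserve the closest analysis.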
4.2 Choose file organizations Selecting a file organization (if possible) Heap Hash Indexed sequential access method (ISAM) B+-tree Clusters
4.3 Choose indexes Specifying indexes CREATE [UNIQUE] INDEX Choosing secondary indexes The PropertyForRent relation Primary index: propertyno Secondary index: rent attribute Guidelines for choosing a wish-list of indexes (pp.509-510) Do not index small relations Avoid indexing an attribute or relation that is frequently updated Avoid indexing attributes that consist of long character strings
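A minimal sketch of the two indexes named above, written against SQLite through Python's `sqlite3` module. The index names `propertyno_ind` and `rent_ind` and the sample rows are assumptions; the exact `CREATE INDEX` options available depend on the target DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PropertyForRent (
    propertyno TEXT NOT NULL,
    city       TEXT,
    rent       INTEGER
);
-- Unique index on the key attribute (hypothetical index names).
CREATE UNIQUE INDEX propertyno_ind ON PropertyForRent (propertyno);
-- Secondary index on an attribute used in search predicates.
CREATE INDEX rent_ind ON PropertyForRent (rent);
""")

conn.execute("INSERT INTO PropertyForRent VALUES ('PG4', 'Glasgow', 350)")

# The unique index rejects a second row with the same propertyno.
try:
    conn.execute("INSERT INTO PropertyForRent VALUES ('PG4', 'Glasgow', 400)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# List the indexes defined on the relation (column 1 of the pragma is the name).
index_names = sorted(
    row[1] for row in conn.execute("PRAGMA index_list(PropertyForRent)")
)
```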
Removing indexes from the wish-list: consider the impact of each of these on update transactions. Some systems allow users to inspect the optimizer's strategy for executing a particular query or update, sometimes called the Query Execution Plan (QEP). Access: Performance Analyzer; Oracle: EXPLAIN PLAN diagnostic utility; DB2: EXPLAIN utility; INGRES: online QEP-viewing utility. When a query runs slower than expected, it is worth using such a facility to determine the reason for the slowness. Updating the database statistics. Document choice of indexes.
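SQLite offers a comparable facility, `EXPLAIN QUERY PLAN`, which is used here only as a stand-in for the utilities listed above. The sketch assumes a simplified `PropertyForRent` table, generated sample data, and a hypothetical index name `rent_ind`. It compares the plan before and after the index is created, and runs `ANALYZE` to refresh the optimizer statistics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PropertyForRent (propertyno TEXT PRIMARY KEY, rent INTEGER)"
)
conn.executemany(
    "INSERT INTO PropertyForRent VALUES (?, ?)",
    [(f"P{i}", 300 + i % 500) for i in range(2000)],
)

query = "SELECT propertyno FROM PropertyForRent WHERE rent = 450"

def plan(sql):
    """Return the optimizer's plan as text (column 3 holds the detail string)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

plan_before = plan(query)  # no index on rent yet, so the table is scanned

conn.execute("CREATE INDEX rent_ind ON PropertyForRent (rent)")
conn.execute("ANALYZE")    # update the database statistics

plan_after = plan(query)   # the plan should now mention rent_ind

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan text differs between SQLite versions, so the useful signal is whether the index name appears in it.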
File organizations and indexes for DreamHome with Microsoft Office Access (pp.511-513) Table 17.3 File organizations and indexes for DreamHome with Oracle (pp.513-514) Table 17.4
4.4 Estimate disk space requirements Depends heavily on the target DBMS and the hardware used to support the database. Based on the size of each tuple and the number of tuples in the relation.
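A rough estimate can be sketched from tuple size and tuple count. The page size, fill factor, tuple size, and row count below are all assumed values; a real DBMS adds page headers, row overhead, and index space that this sketch ignores.

```python
import math

def estimate_relation_bytes(tuple_bytes, num_tuples, page_bytes=4096, fill_factor=0.8):
    """Estimate storage for one relation, assuming whole tuples per page.

    page_bytes and fill_factor are illustrative assumptions, not DBMS defaults.
    """
    usable_bytes = int(page_bytes * fill_factor)
    tuples_per_page = max(1, usable_bytes // tuple_bytes)
    pages = math.ceil(num_tuples / tuples_per_page)
    return pages * page_bytes

# Hypothetical relation: 100,000 tuples of about 200 bytes each.
estimate = estimate_relation_bytes(tuple_bytes=200, num_tuples=100_000)
print(f"{estimate / 1_000_000:.1f} MB")
```

With these assumed figures, 16 tuples fit on a page, so 6,250 pages are needed.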
5. Design user views CREATE VIEW Document design of user views 6. Design security mechanisms System security vs data security GRANT, REVOKE Document design of security measures
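A minimal `CREATE VIEW` sketch using SQLite through Python. The view name `StaffB003`, the column list, and the sample rows are assumptions. SQLite has no `GRANT` or `REVOKE`, so only the view is shown; in a multi-user DBMS, privileges would typically be granted on the view rather than on the base relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (
    staffno  TEXT PRIMARY KEY,
    lname    TEXT,
    salary   INTEGER,
    branchno TEXT
);
INSERT INTO Staff VALUES ('SG14', 'Ford', 18000, 'B003');
INSERT INTO Staff VALUES ('SL21', 'White', 30000, 'B005');

-- Hypothetical user view: staff at one branch, with salary hidden.
CREATE VIEW StaffB003 AS
    SELECT staffno, lname FROM Staff WHERE branchno = 'B003';
""")

rows = conn.execute("SELECT * FROM StaffB003").fetchall()
```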
Monitoring and Tuning the Operational System Chapter 18 Connolly & Begg
7. Consider the introduction of controlled redundancy 8. Monitor and tune the operational system
7. Consider the introduction of controlled redundancy Denormalization: speeds up retrievals but slows down updates. Example: Branch (branchno, street, city, postcode, mgrstaffno) is the denormalized form; the normalized form is Branch (branchno, street, postcode, mgrstaffno), Postcode (postcode, city). Consider duplicating certain attributes or joining relations together to reduce the number of joins required to perform a query.
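The Branch example can be sketched with SQLite through Python. The table names `BranchNorm` and `BranchDenorm` and the sample rows are assumptions made so that both designs can sit in one database. The same question (which city is each branch in?) needs a join in the normalized design and a single-table read in the denormalized one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized design: city is stored once per postcode.
CREATE TABLE Postcode (postcode TEXT PRIMARY KEY, city TEXT);
CREATE TABLE BranchNorm (
    branchno TEXT PRIMARY KEY, street TEXT,
    postcode TEXT REFERENCES Postcode(postcode), mgrstaffno TEXT
);
-- Denormalized design: city is repeated in every Branch tuple.
CREATE TABLE BranchDenorm (
    branchno TEXT PRIMARY KEY, street TEXT, city TEXT,
    postcode TEXT, mgrstaffno TEXT
);

INSERT INTO Postcode VALUES ('SW1 4EH', 'London');
INSERT INTO Postcode VALUES ('G11 9QX', 'Glasgow');
INSERT INTO BranchNorm VALUES ('B005', '22 Deer Rd', 'SW1 4EH', 'SL21');
INSERT INTO BranchNorm VALUES ('B003', '163 Main St', 'G11 9QX', 'SG5');
INSERT INTO BranchDenorm VALUES ('B005', '22 Deer Rd', 'London', 'SW1 4EH', 'SL21');
INSERT INTO BranchDenorm VALUES ('B003', '163 Main St', 'Glasgow', 'G11 9QX', 'SG5');
""")

# Normalized: a join is required to reach the city.
with_join = conn.execute("""
    SELECT b.branchno, p.city
    FROM BranchNorm b JOIN Postcode p ON p.postcode = b.postcode
    ORDER BY b.branchno
""").fetchall()

# Denormalized: the same answer comes from one relation.
without_join = conn.execute(
    "SELECT branchno, city FROM BranchDenorm ORDER BY branchno"
).fetchall()
```

The price of the simpler query is that a change to a postcode's city must now be applied to every Branch tuple that repeats it.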
7. Consider the introduction of controlled redundancy
   7.1 Combining 1:1 relationships
   7.2 Duplicating non-key attributes in 1:* relationships to reduce joins
   7.3 Duplicating FK attributes in 1:* relationships to reduce joins
   7.4 Duplicating attributes in *:* relationships to reduce joins
   7.5 Introducing repeating groups
   7.6 Creating extract tables
   7.7 Partitioning relations
7. Consider the introduction of controlled redundancy Example relation and data: Figure 18.1 7.1 Combining 1:1 relationships Combined Client and Interview: Figure 18.2 There will be a significant number of nulls 7.2 Duplicating non-key attributes in 1:* relationships to reduce joins Include lname of PrivateOwner in the PropertyForRent relation: Figure 18.3 Needs update propagation Increases storage space
A special case of 1:* relationship [pp.524-525] Lookup table (reference table, pick list, code table) Contains a code and a description Figure 18.4: PropertyType (type, description) Advantages Reduction in the relation size Easier to change the description Lookup table can be used to validate user input If the lookup table is used in frequent or critical queries, and the description is unlikely to change, consideration should be given to duplicating the description attribute Figure 18.5
7.3 Duplicating FK attributes in 1:* relationships to reduce joins Q: List all the private property owners at a branch Duplicating the FK branchno in the PrivateOwner relation: Figure 18.6 If an owner could rent properties through many branches, the above change would not work Necessary to model a *:* relationship between Branch and PrivateOwner
7.4 Duplicating attributes in *:* relationships to reduce joins N:M -> need three way join It may be possible to reduce the number of relations to be joined e.g., duplicate the street attribute in the intermediate Viewing relation [p.527] Figure 18.7
7.5 Introducing repeating groups Reintroducing repeating groups By introducing multiple attributes Figure 18.8: Branch(..., telno1, telno2, telno3) 7.6 Creating extract tables Create and populate the tables (for reports) in an overnight batch run DW
7.7 Partitioning relations Decompose very large relations (and indexes) into a number of smaller and more manageable pieces called partitions. Horizontal, vertical: Figure 18.9 Example: ArchivedPropertyForRent relation with several hundred thousand tuples; hash partition in Oracle: Figure 18.10 Partition types: Hash; Range: based on a range of values; List: based on a list of values; Composite: range-hash, list-hash
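The idea behind hash and range partitioning can be sketched in plain Python, independent of any DBMS. The function names, the CRC32 hash, the partition count, and the rent boundaries are assumptions for illustration; a DBMS such as Oracle uses its own internal hash function and declarative partition clauses.

```python
import bisect
import zlib

def hash_partition(key, num_partitions):
    """Map a key to a partition with a stable hash (CRC32 is an arbitrary choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def range_partition(value, upper_bounds):
    """Return the first partition whose exclusive upper bound exceeds value.

    Values at or above the last bound go to a final overflow partition.
    """
    return bisect.bisect_right(upper_bounds, value)

# Hash partitioning: spread hypothetical property numbers over 4 partitions.
property_numbers = [f"PG{i}" for i in range(1, 21)]
partitions = {p: [] for p in range(4)}
for propertyno in property_numbers:
    partitions[hash_partition(propertyno, 4)].append(propertyno)

# Range partitioning on rent: below 300, 300 to 599, 600 and above.
rent_bounds = [300, 600]
rent_partition = range_partition(350, rent_bounds)
```

Hash partitioning aims for an even spread when no natural ranges exist, while range partitioning keeps related values together so that a query on a range touches fewer partitions.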
Advantages Improved load balancing Improved performance Increased availability Improved recovery Security Disadvantages Complexity Reduced performance Duplication
7. Consider the introduction of controlled redundancy Consider implications of denormalization How data integrity will be maintained (after denormalization or duplication) Triggers: the best solution Transactions Batch reconciliation Advantages and disadvantages of denormalization Table 18.1 Document introduction of redundancy
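As a sketch of the trigger approach, the example below returns to the duplicated `lname` attribute from step 7.2 and keeps the copy in `PropertyForRent` in step with `PrivateOwner`. It uses SQLite through Python; the trigger name `propagate_owner_lname`, the reduced column lists, and the sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PrivateOwner (ownerno TEXT PRIMARY KEY, lname TEXT);
CREATE TABLE PropertyForRent (
    propertyno TEXT PRIMARY KEY,
    ownerno    TEXT REFERENCES PrivateOwner(ownerno),
    lname      TEXT  -- duplicated from PrivateOwner to avoid a join
);

-- Hypothetical trigger: propagate a changed owner name to every copy.
CREATE TRIGGER propagate_owner_lname
AFTER UPDATE OF lname ON PrivateOwner
BEGIN
    UPDATE PropertyForRent SET lname = NEW.lname WHERE ownerno = NEW.ownerno;
END;

INSERT INTO PrivateOwner VALUES ('CO46', 'Keogh');
INSERT INTO PropertyForRent VALUES ('PA14', 'CO46', 'Keogh');
""")

conn.execute("UPDATE PrivateOwner SET lname = 'Smith' WHERE ownerno = 'CO46'")

copied_lname = conn.execute(
    "SELECT lname FROM PropertyForRent WHERE propertyno = 'PA14'"
).fetchone()[0]
```

The trigger makes the propagation automatic, at the cost of extra work on every update of the source attribute.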
8. Monitor and tune the operational system Factors to measure efficiency: transaction throughput, response time, disk storage. Benefits from tuning: avoid the procurement of additional hardware; possible to downsize the HW configuration; faster response time and better throughput.
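Throughput and response time can be measured with a small harness. The sketch below is an assumption-laden illustration: the `measure` helper, the sample table, and the query are hypothetical, and an in-memory SQLite database says little about a production system. The same pattern applies when the callable wraps a real transaction.

```python
import sqlite3
import time

def measure(run_transaction, runs):
    """Run a transaction repeatedly and report throughput and response times."""
    latencies = []
    started = time.perf_counter()
    for _ in range(runs):
        t0 = time.perf_counter()
        run_transaction()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - started
    return {
        "throughput_per_s": runs / elapsed,        # transactions per second
        "mean_response_s": sum(latencies) / runs,  # average elapsed time per run
        "max_response_s": max(latencies),          # worst case observed
    }

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PropertyForRent (propertyno TEXT PRIMARY KEY, rent INTEGER)"
)
conn.executemany(
    "INSERT INTO PropertyForRent VALUES (?, ?)",
    [(f"P{i}", 300 + i % 500) for i in range(5000)],
)

def list_cheap_properties():
    # Hypothetical read transaction used as the workload.
    return conn.execute(
        "SELECT propertyno FROM PropertyForRent WHERE rent < 400"
    ).fetchall()

stats = measure(list_cheap_properties, runs=200)
print(stats)
```

Recording these figures before and after each tuning change gives a baseline for judging whether the change helped.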
Understanding system resources Main memory CPU Disk I/O Network Document tuning activity New requirements for DreamHome Necessary to handle changing requirements Ability to hold pictures of the properties for rent: Figure 18.12 Ability to publish a report describing properties available for rent on the Web