Physical Database Design

Physical Database Design
January 2007
Yunmook Nah
Department of Electronics and Computer Engineering, Dankook University

Physical Database Design Methodology for Relational Databases (Chapter 17, Connolly & Begg)

Steps for Physical Database Design
3. Translate logical data model for target DBMS
   3.1 Design base relations
   3.2 Design representation of derived data
   3.3 Design general constraints
4. Design file organizations and indexes
   4.1 Analyze transactions
   4.2 Choose file organizations
   4.3 Choose indexes
   4.4 Estimate disk space requirements
5. Design user views
6. Design security mechanisms
7. Consider the introduction of controlled redundancy
8. Monitor and tune the operational system

3. Translate logical data model for target DBMS
   3.1 Design base relations
       - Implement base relations
       - Document design of base relations
   3.2 Design representation of derived data
       - Derived or calculated attributes, e.g.:
         - the number of staff who work in a particular branch
         - the number of properties that a member of staff handles
       - Document design of derived data
   3.3 Design general constraints
       - The remaining general constraints, e.g. DreamHome has a rule that prevents a member of staff from managing more than 100 properties
       - Document design of general constraints
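A minimal sketch of steps 3.2 and 3.3 follows, using Python's built-in sqlite3 module. The schema is a simplified, hypothetical slice of DreamHome that keeps only the columns needed here. The derived attribute is calculated on demand by a view instead of being stored. SQLite has no CREATE ASSERTION, so the 100-property rule is enforced with triggers; other DBMSs may offer different mechanisms.

```python
import sqlite3

MAX_PROPERTIES = 100  # DreamHome rule: a member of staff manages at most 100 properties

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
CREATE TABLE Staff (staffno TEXT PRIMARY KEY, branchno TEXT NOT NULL);
CREATE TABLE PropertyForRent (
    propertyno TEXT PRIMARY KEY,
    rent       INTEGER,
    staffno    TEXT REFERENCES Staff(staffno)
);

-- 3.2 Derived data: the number of properties each member of staff handles,
-- calculated when queried instead of being stored as a column.
CREATE VIEW StaffPropertyCount AS
SELECT s.staffno, COUNT(p.propertyno) AS nproperties
FROM Staff s LEFT JOIN PropertyForRent p ON p.staffno = s.staffno
GROUP BY s.staffno;

-- 3.3 General constraint: reject an insert or reassignment that would exceed the limit.
CREATE TRIGGER staff_limit_insert BEFORE INSERT ON PropertyForRent
WHEN (SELECT COUNT(*) FROM PropertyForRent WHERE staffno = NEW.staffno) >= {MAX_PROPERTIES}
BEGIN
    SELECT RAISE(ABORT, 'member of staff already manages the maximum number of properties');
END;

CREATE TRIGGER staff_limit_update BEFORE UPDATE OF staffno ON PropertyForRent
WHEN NEW.staffno IS NOT OLD.staffno
 AND (SELECT COUNT(*) FROM PropertyForRent WHERE staffno = NEW.staffno) >= {MAX_PROPERTIES}
BEGIN
    SELECT RAISE(ABORT, 'member of staff already manages the maximum number of properties');
END;
""")

# Usage: assign 100 properties to one member of staff, then try a 101st.
conn.execute("INSERT INTO Staff VALUES ('SG14', 'B003')")
for i in range(MAX_PROPERTIES):
    conn.execute("INSERT INTO PropertyForRent VALUES (?, 500, 'SG14')", (f"PG{i}",))

try:
    conn.execute("INSERT INTO PropertyForRent VALUES ('PG_extra', 500, 'SG14')")
    rejected = False
except sqlite3.IntegrityError as exc:  # RAISE(ABORT, ...) surfaces as a constraint error
    rejected = True
    print("rejected:", exc)

print(conn.execute("SELECT * FROM StaffPropertyCount").fetchall())  # [('SG14', 100)]
```

Computing the count in a view keeps it correct automatically. Storing it as a column would make reads cheaper, but every insert or delete on PropertyForRent would then have to maintain it.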

4. Design file organizations and indexes
   4.1 Analyze transactions
       - Performance criteria to identify:
         - the transactions that run frequently
         - the transactions that are critical
         - the times during the day/week when there will be a high demand
       - At least investigate the most important transactions
       - Map all transaction paths to relations (Table 17.1: Transaction/relation cross-reference matrix)
       - Determine which relations are most frequently accessed by transactions (Figure 17.3: Transaction usage map)
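The cross-reference matrix and the "most frequently accessed relations" step can be sketched as plain data processing. The four transactions are a hypothetical subset invented for illustration, not the contents of Table 17.1. The access codes I, R, U and D stand for insert, read, update and delete.

```python
from collections import Counter

# Hypothetical transactions mapped to the relations they touch and how (I/R/U/D).
TRANSACTIONS = {
    "A": {"PropertyForRent": "I", "PrivateOwner": "R", "Staff": "R"},  # enter a new property
    "B": {"PropertyForRent": "UD"},                                    # update/delete a property
    "C": {"Staff": "R", "Branch": "R"},                                # list staff at a branch
    "D": {"PropertyForRent": "R", "Branch": "R", "Staff": "R"},        # list properties at a branch
}

def cross_reference(transactions):
    """Build the matrix as {relation: {transaction: access codes}}."""
    matrix = {}
    for txn, accesses in transactions.items():
        for relation, codes in accesses.items():
            matrix.setdefault(relation, {})[txn] = codes
    return matrix

def access_counts(transactions):
    """Number of transactions that access each relation, most accessed first."""
    return Counter(rel for accesses in transactions.values() for rel in accesses).most_common()

matrix = cross_reference(TRANSACTIONS)
txns = sorted(TRANSACTIONS)
print(f"{'':16}" + "".join(f"{t:>4}" for t in txns))
for relation in sorted(matrix):
    print(f"{relation:16}" + "".join(f"{matrix[relation].get(t, ''):>4}" for t in txns))
print(access_counts(TRANSACTIONS))
```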

       - Analyze the data usage of selected transactions that involve these relations. For each transaction, determine:
         - the relations and attributes accessed by the transaction and the type of access (insert, read, update, delete)
         - the attributes used in any predicates
         - for a query, the attributes involved in the join of two or more relations
         - the expected frequency at which the transaction will run
         - the performance goals for the transaction
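One way to combine the per-transaction findings is to weight each predicate or join attribute by how often its transactions run. This is a minimal sketch under stated assumptions: the usage profile and frequencies are hypothetical, and summing daily executions is a simple heuristic chosen for illustration, not a rule from the methodology. The attributes with the highest weights are natural first candidates for the index wish-list in step 4.3.

```python
from collections import defaultdict

# Hypothetical usage profile: expected runs per day, plus the (relation, attribute)
# pairs each transaction uses in search predicates and in join conditions.
USAGE = [
    {"txn": "list properties in a rent range", "per_day": 500,
     "predicates": [("PropertyForRent", "rent")], "joins": []},
    {"txn": "list properties at a branch", "per_day": 50,
     "predicates": [("Branch", "branchno")],
     "joins": [("PropertyForRent", "branchno"), ("Branch", "branchno")]},
    {"txn": "show the owner of a property", "per_day": 200,
     "predicates": [("PropertyForRent", "propertyno")],
     "joins": [("PropertyForRent", "ownerno"), ("PrivateOwner", "ownerno")]},
]

def attribute_weights(usage):
    """Total expected daily executions per attribute used in a predicate or join,
    highest first; an attribute used twice by one transaction is counted once."""
    weights = defaultdict(int)
    for t in usage:
        for attr in set(t["predicates"]) | set(t["joins"]):
            weights[attr] += t["per_day"]
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)

for (relation, attribute), weight in attribute_weights(USAGE):
    print(f"{relation}.{attribute}: {weight} accesses/day")
```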

   4.2 Choose file organizations
       - Select a file organization for each relation (if the target DBMS allows it):
         - Heap
         - Hash
         - Indexed sequential access method (ISAM)
         - B+-tree
         - Clusters
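The difference between two of these organizations can be illustrated with a toy in-memory model. This is an assumption-laden sketch, not a real storage engine: records stand in for pages, and the bucket count is arbitrary. An equality search on a heap must scan records in insertion order. A hash organization only reads the one bucket that the key hashes to.

```python
class HeapFile:
    """Records kept in insertion order; an equality search scans until it finds the key."""
    def __init__(self):
        self.records = []

    def insert(self, key, record):
        self.records.append((key, record))

    def find(self, key):
        examined = 0
        for k, rec in self.records:
            examined += 1
            if k == key:
                return rec, examined
        return None, examined


class HashFile:
    """Records placed in buckets by hash(key) mod n; a search reads only one bucket."""
    def __init__(self, nbuckets=64):
        self.buckets = [[] for _ in range(nbuckets)]

    def insert(self, key, record):
        self.buckets[hash(key) % len(self.buckets)].append((key, record))

    def find(self, key):
        examined = 0
        for k, rec in self.buckets[hash(key) % len(self.buckets)]:
            examined += 1
            if k == key:
                return rec, examined
        return None, examined


heap, hashed = HeapFile(), HashFile()
for i in range(10_000):
    key = f"PG{i}"
    record = {"propertyno": key, "rent": 300 + i % 700}
    heap.insert(key, record)
    hashed.insert(key, record)

print(heap.find("PG9999")[1])    # 10000 records examined (the key was inserted last)
print(hashed.find("PG9999")[1])  # at most one bucket's worth of records
```

The same model shows the trade-off behind the list above. Hashing helps exact-match lookups on the hash key, but a range condition such as rent between 400 and 500 would still have to read every bucket. Ordered organizations such as ISAM or a B+-tree suit range conditions better.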

   4.3 Choose indexes
       - Specifying indexes: CREATE [UNIQUE] INDEX
       - Choosing secondary indexes, e.g. for the PropertyForRent relation:
         - primary index: propertyno
         - secondary index: rent
       - Guidelines for choosing a wish-list of indexes (pp.509-510), including:
         - Do not index small relations
         - Avoid indexing an attribute or relation that is frequently updated
         - Avoid indexing attributes that consist of long character strings
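The three guidelines can be applied mechanically to a candidate wish-list. This is only a sketch. The guidelines say "small", "frequently updated" and "long" but give no numbers, so the thresholds below are illustrative assumptions. The candidate entries, including the PropertyType relation's use here and the comments attribute, are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cut-offs; real values depend on the DBMS, hardware and workload.
SMALL_RELATION_ROWS = 1_000
MAX_UPDATE_RATIO = 0.5     # updates per read of the attribute
MAX_INDEXED_LENGTH = 64    # characters

@dataclass
class Candidate:
    relation: str
    attribute: str
    relation_rows: int
    update_ratio: float
    max_length: int

def prune_wish_list(candidates):
    """Apply the 'do not index' guidelines; return (kept, [(rejected, reason), ...])."""
    kept, rejected = [], []
    for c in candidates:
        if c.relation_rows < SMALL_RELATION_ROWS:
            rejected.append((c, "small relation"))
        elif c.update_ratio > MAX_UPDATE_RATIO:
            rejected.append((c, "frequently updated"))
        elif c.max_length > MAX_INDEXED_LENGTH:
            rejected.append((c, "long character string"))
        else:
            kept.append(c)
    return kept, rejected

wish_list = [
    Candidate("PropertyForRent", "rent", 100_000, 0.05, 8),
    Candidate("PropertyType", "type", 5, 0.0, 1),
    Candidate("PropertyForRent", "comments", 100_000, 0.01, 500),
    Candidate("Staff", "salary", 5_000, 0.9, 8),
]
kept, rejected = prune_wish_list(wish_list)
print([f"{c.relation}.{c.attribute}" for c in kept])  # ['PropertyForRent.rent']
print([(c.attribute, why) for c, why in rejected])
```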

       - Removing indexes from the wish-list: consider the impact of each index on update transactions
       - Some systems let users inspect the optimizer's strategy for executing a particular query or update, called the Query Execution Plan (QEP):
         - Access: Performance Analyzer
         - Oracle: EXPLAIN PLAN diagnostic utility
         - DB2: EXPLAIN utility
         - INGRES: online QEP-viewing utility
       - When a query runs slower than expected, it is worth using such a facility to determine the reason for the slowness
       - Update the database statistics
       - Document choice of indexes

       - File organizations and indexes for DreamHome with Microsoft Office Access (pp.511-513): Table 17.3
       - File organizations and indexes for DreamHome with Oracle (pp.513-514): Table 17.4

   4.4 Estimate disk space requirements
       - Highly dependent on the target DBMS and the hardware used to support the database
       - Based on the size of each tuple and the number of tuples in the relation
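The estimate reduces to simple arithmetic: tuples per block = floor(usable block size / tuple size), and blocks = ceil(number of tuples / tuples per block). The sketch below assumes fixed-size tuples that never span blocks. It ignores block headers and per-row overhead, which real DBMSs add. The 4 KB block size and the PropertyForRent figures are hypothetical.

```python
import math

def estimate_blocks(n_tuples, tuple_bytes, block_bytes=4096, fill_factor=1.0):
    """Blocks needed for a relation of fixed-size tuples that do not span blocks.

    fill_factor < 1.0 models free space deliberately left in each block for later inserts.
    """
    tuples_per_block = math.floor(block_bytes * fill_factor / tuple_bytes)
    if tuples_per_block == 0:
        raise ValueError("tuple is larger than the usable space in a block")
    return math.ceil(n_tuples / tuples_per_block)

# Hypothetical figures: 100,000 PropertyForRent tuples of 100 bytes in 4 KB blocks.
blocks = estimate_blocks(100_000, 100)
print(blocks, "blocks =", blocks * 4096 / 2**20, "MB")  # 2500 blocks = 9.765625 MB
```

With fill_factor=0.8, only 32 tuples fit per block, so the same relation needs 3125 blocks. This shows how reserving free space raises the estimate.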

5. Design user views
   - CREATE VIEW
   - Document design of user views
6. Design security mechanisms
   - System security vs. data security
   - GRANT, REVOKE
   - Document design of security measures
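A user view can restrict both rows (horizontally) and columns (vertically). The minimal sketch below uses Python's sqlite3 module with hypothetical sample rows and a hypothetical view name, Staff3. The view shows only branch B003's staff and hides salaries. SQLite has no GRANT or REVOKE, so the security step appears only as a comment for a server DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (staffno TEXT PRIMARY KEY, lname TEXT, position TEXT,
                    salary INTEGER, branchno TEXT);
INSERT INTO Staff VALUES
    ('SG37', 'Beech', 'Assistant',  12000, 'B003'),
    ('SG14', 'Ford',  'Supervisor', 18000, 'B003'),
    ('SL21', 'White', 'Manager',    30000, 'B005');

-- Horizontal restriction (branch B003 only) combined with a vertical one (no salary).
CREATE VIEW Staff3 AS
SELECT staffno, lname, position
FROM Staff
WHERE branchno = 'B003';
""")

# In a server DBMS, access would then be given through the view, not the base table, e.g.
#   GRANT SELECT ON Staff3 TO some_role;   -- some_role is hypothetical; not run in SQLite
rows = conn.execute("SELECT * FROM Staff3 ORDER BY staffno").fetchall()
print(rows)  # [('SG14', 'Ford', 'Supervisor'), ('SG37', 'Beech', 'Assistant')]
```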

Monitoring and Tuning the Operational System (Chapter 18, Connolly & Begg)

7. Consider the introduction of controlled redundancy
8. Monitor and tune the operational system

7. Consider the introduction of controlled redundancy
   - Denormalization speeds up retrievals but slows down updates
   - Example: Branch (branchno, street, city, postcode, mgrstaffno) is not strictly in 3NF, because postcode determines city. The normalized design is Branch (branchno, street, postcode, mgrstaffno) plus Postcode (postcode, city). Keeping the single Branch relation avoids a join whenever a full address is needed.
   - Consider duplicating certain attributes or joining relations together to reduce the number of joins required to perform a query

7. Consider the introduction of controlled redundancy
   7.1 Combining 1:1 relationships
   7.2 Duplicating non-key attributes in 1:* relationships to reduce joins
   7.3 Duplicating FK attributes in 1:* relationships to reduce joins
   7.4 Duplicating attributes in *:* relationships to reduce joins
   7.5 Introducing repeating groups
   7.6 Creating extract tables
   7.7 Partitioning relations

7. Consider the introduction of controlled redundancy (example relations and data: Figure 18.1)
   7.1 Combining 1:1 relationships
       - Combined Client and Interview: Figure 18.2
       - There may be a significant number of nulls
   7.2 Duplicating non-key attributes in 1:* relationships to reduce joins
       - Include lname of PrivateOwner in the PropertyForRent relation: Figure 18.3
       - Requires update propagation
       - Increases storage space
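The update propagation that 7.2 requires can be delegated to triggers. In this sketch, the column name ownerlname, the trigger names and the sample rows are hypothetical, and the DBMS is SQLite via Python's sqlite3 module. The triggers fill in the duplicated owner surname when a property is added or moves to another owner. They also copy any change to PrivateOwner.lname into every duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PrivateOwner (ownerno TEXT PRIMARY KEY, lname TEXT);
CREATE TABLE PropertyForRent (
    propertyno TEXT PRIMARY KEY,
    rent       INTEGER,
    ownerno    TEXT REFERENCES PrivateOwner(ownerno),
    ownerlname TEXT              -- redundant copy of PrivateOwner.lname
);

-- Fill in the copy when a property is added or assigned to another owner ...
CREATE TRIGGER prop_copy_on_insert AFTER INSERT ON PropertyForRent
BEGIN
    UPDATE PropertyForRent
    SET ownerlname = (SELECT lname FROM PrivateOwner WHERE ownerno = NEW.ownerno)
    WHERE propertyno = NEW.propertyno;
END;
CREATE TRIGGER prop_copy_on_owner_change AFTER UPDATE OF ownerno ON PropertyForRent
BEGIN
    UPDATE PropertyForRent
    SET ownerlname = (SELECT lname FROM PrivateOwner WHERE ownerno = NEW.ownerno)
    WHERE propertyno = NEW.propertyno;
END;

-- ... and propagate a change to the owner surname to every copy.
CREATE TRIGGER owner_propagate_lname AFTER UPDATE OF lname ON PrivateOwner
BEGIN
    UPDATE PropertyForRent SET ownerlname = NEW.lname WHERE ownerno = NEW.ownerno;
END;
""")

conn.execute("INSERT INTO PrivateOwner VALUES ('CO40', 'Murphy')")
conn.execute("INSERT INTO PropertyForRent (propertyno, rent, ownerno) VALUES ('PA14', 650, 'CO40')")
conn.execute("UPDATE PrivateOwner SET lname = 'Murphy-Smith' WHERE ownerno = 'CO40'")

# The retrieval that needed PropertyForRent JOIN PrivateOwner now reads one relation.
print(conn.execute("SELECT propertyno, ownerlname FROM PropertyForRent").fetchall())
# [('PA14', 'Murphy-Smith')]
```

Each extra trigger is the "slows down updates" side of the trade-off. Updating one owner's name now also rewrites every property row for that owner.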

   - A special case of a 1:* relationship [pp.524-525]: the lookup table (also called a reference table, pick list, or code table)
     - Contains a code and a description, e.g. Figure 18.4: PropertyType (type, description)
     - Advantages:
       - Reduction in the size of the child relation
       - Easier to change the description
       - The lookup table can be used to validate user input
     - If the lookup table is used in frequent or critical queries and the description is unlikely to change, consider duplicating the description attribute in the child relation: Figure 18.5
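The lookup-table advantages can be seen in a few lines of SQLite via Python's sqlite3 module. The codes, descriptions and rows below are hypothetical sample data. A foreign key to PropertyType rejects unknown codes, which is the input validation. A description is changed by updating a single row. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is enabled
conn.executescript("""
CREATE TABLE PropertyType (type TEXT PRIMARY KEY, description TEXT NOT NULL);
INSERT INTO PropertyType VALUES ('H', 'House'), ('F', 'Flat');

CREATE TABLE PropertyForRent (
    propertyno TEXT PRIMARY KEY,
    type       TEXT NOT NULL REFERENCES PropertyType(type)  -- short code, not the description
);
""")

conn.execute("INSERT INTO PropertyForRent VALUES ('PG4', 'F')")
try:
    conn.execute("INSERT INTO PropertyForRent VALUES ('PG99', 'X')")  # code not in the lookup table
    accepted = True
except sqlite3.IntegrityError:
    accepted = False

# Changing a description touches one row of the lookup table, not every property.
conn.execute("UPDATE PropertyType SET description = 'Apartment' WHERE type = 'F'")
result = conn.execute("""SELECT p.propertyno, t.description
                         FROM PropertyForRent p JOIN PropertyType t ON t.type = p.type""").fetchall()
print(accepted, result)  # False [('PG4', 'Apartment')]
```

The join in the last query is the cost that the final bullet above proposes to remove by duplicating the description, when that description rarely changes.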

   7.3 Duplicating FK attributes in 1:* relationships to reduce joins
       - Query: list all the private property owners at a branch
       - Duplicate the FK branchno in the PrivateOwner relation: Figure 18.6
       - If an owner could rent properties through many branches, this change would not work; it would then be necessary to model a *:* relationship between Branch and PrivateOwner

   7.4 Duplicating attributes in *:* relationships to reduce joins
       - A *:* relationship is implemented with an intermediate relation, so querying it needs a three-way join
       - It may be possible to reduce the number of relations to be joined, e.g. by duplicating the street attribute of PropertyForRent in the intermediate Viewing relation [p.527]: Figure 18.7

   7.5 Introducing repeating groups
       - Reintroduce repeating groups by introducing multiple attributes, e.g. Figure 18.8: Branch (…, telno1, telno2, telno3)
   7.6 Creating extract tables
       - Create and populate the tables used for reports in an overnight batch run (an approach related to data warehousing, DW)
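An extract table for reports can be rebuilt in one batch job. This is a minimal sketch using SQLite's CREATE TABLE … AS SELECT through Python's sqlite3 module. The report table BranchRentSummary and the sample rows are hypothetical. Wrapping the rebuild in BEGIN/COMMIT makes it atomic, so report queries do not see a half-built table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PropertyForRent (propertyno TEXT PRIMARY KEY, branchno TEXT, rent INTEGER);
INSERT INTO PropertyForRent VALUES
    ('PA14', 'B007', 650), ('PG4', 'B003', 350), ('PG36', 'B003', 375), ('PG16', 'B003', 450);
""")

def refresh_extract(conn):
    """Rebuild the report table from the base relation (e.g. in an overnight batch run)."""
    conn.executescript("""
    BEGIN;
    DROP TABLE IF EXISTS BranchRentSummary;
    CREATE TABLE BranchRentSummary AS
        SELECT branchno, COUNT(*) AS nproperties, AVG(rent) AS avgrent
        FROM PropertyForRent
        GROUP BY branchno;
    COMMIT;
    """)

refresh_extract(conn)
# Daytime reports read the small extract instead of re-aggregating the base relation.
print(conn.execute("SELECT * FROM BranchRentSummary ORDER BY branchno").fetchall())
```

The extract is only as current as its last refresh. That is acceptable for reports that tolerate day-old figures, which is the situation this step describes.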

   7.7 Partitioning relations
       - Decompose very large relations (and indexes) into a number of smaller, more manageable pieces called partitions
       - Horizontal and vertical partitioning: Figure 18.9
       - Example: an ArchivedPropertyForRent relation with several hundred thousand tuples; hash partition in Oracle: Figure 18.10
       - Partition types:
         - Hash
         - Range: based on a range of values
         - List: based on a list of values
         - Composite: range-hash, list-hash
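The partition types differ only in how a row is routed to a partition. A DBMS such as Oracle does this internally when a table is declared with hash, range or list partitioning. The toy routing functions below are only for illustration. The partition counts, rent boundaries and city lists are hypothetical. crc32 is used instead of Python's str hash() because it gives the same result on every run.

```python
import zlib
from bisect import bisect_right

def hash_partition(propertyno, n_partitions=4):
    """Hash partitioning: spread rows evenly over n partitions."""
    return zlib.crc32(propertyno.encode("utf-8")) % n_partitions

RENT_BOUNDS = [400, 600, 800]  # partitions: <400, 400-599, 600-799, >=800

def range_partition(rent):
    """Range partitioning: index of the first bound greater than rent."""
    return bisect_right(RENT_BOUNDS, rent)

CITY_LISTS = {"north": {"Glasgow", "Aberdeen"}, "south": {"London", "Bristol"}}

def list_partition(city):
    """List partitioning: an explicit list of values per partition, plus a default."""
    for name, cities in CITY_LISTS.items():
        if city in cities:
            return name
    return "default"

def range_hash_partition(rent, propertyno, n_subpartitions=2):
    """Composite range-hash: range on rent, then hash on propertyno within that range."""
    return range_partition(rent), hash_partition(propertyno, n_subpartitions)

rows = [("PA14", 650, "Aberdeen"), ("PG4", 350, "Glasgow"),
        ("PL94", 400, "London"), ("PG21", 600, "Glasgow")]
for propertyno, rent, city in rows:
    print(propertyno, hash_partition(propertyno), range_partition(rent),
          list_partition(city), range_hash_partition(rent, propertyno))
```

A query such as "rent below 400" only needs range partition 0. This kind of partition pruning is a source of the improved performance listed below.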

       - Advantages: improved load balancing, improved performance, increased availability, improved recovery, security
       - Disadvantages: complexity, reduced performance (for queries that combine data from more than one partition), duplication

7. Consider the introduction of controlled redundancy (continued)
   - Consider the implications of denormalization: how will data integrity be maintained after denormalization or duplication?
     - Triggers: the best solution
     - Transactions
     - Batch reconciliation
   - Advantages and disadvantages of denormalization: Table 18.1
   - Document introduction of redundancy
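Batch reconciliation periodically compares duplicated values with their source and repairs any that have drifted. In this sketch, the duplicated column is the owner surname copied into PropertyForRent, as in 7.2. The column name ownerlname and the sample rows are hypothetical, and the DBMS is SQLite via Python's sqlite3 module. The job reports stale copies and then fixes them in one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PrivateOwner (ownerno TEXT PRIMARY KEY, lname TEXT);
CREATE TABLE PropertyForRent (propertyno TEXT PRIMARY KEY, ownerno TEXT, ownerlname TEXT);
INSERT INTO PrivateOwner VALUES ('CO40', 'Murphy'), ('CO87', 'Shaw');
-- The copy for PG36 has drifted out of step with its base value.
INSERT INTO PropertyForRent VALUES ('PA14', 'CO40', 'Murphy'), ('PG36', 'CO87', 'Shaw-Old');
""")

def reconcile(conn):
    """Report duplicated values that disagree with the base relation, then repair them."""
    stale = conn.execute("""
        SELECT p.propertyno, p.ownerlname, o.lname
        FROM PropertyForRent p JOIN PrivateOwner o ON o.ownerno = p.ownerno
        WHERE p.ownerlname IS NOT o.lname""").fetchall()
    with conn:  # commit the repair as one transaction
        conn.execute("""
            UPDATE PropertyForRent
            SET ownerlname = (SELECT lname FROM PrivateOwner o
                              WHERE o.ownerno = PropertyForRent.ownerno)
            WHERE ownerno IN (SELECT ownerno FROM PrivateOwner)""")
    return stale

print(reconcile(conn))  # [('PG36', 'Shaw-Old', 'Shaw')]
```

Unlike a trigger, this approach lets the copies be wrong between runs. That is why triggers are preferred when the duplicated value must always be correct.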

8. Monitor and tune the operational system
   - Factors used to measure efficiency:
     - Transaction throughput
     - Response time
     - Disk storage
   - Benefits from tuning:
     - May avoid the procurement of additional hardware
     - May make it possible to downsize the hardware configuration
     - Faster response time and better throughput
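Throughput and response time can be measured by timing a repeatable workload before and after a tuning change. This sketch uses Python's time.perf_counter and statistics modules against an in-memory SQLite table. The table, the workload and the choice of a 95th-percentile figure are illustrative assumptions. Absolute numbers depend entirely on the machine, so only compare before/after runs made on the same system.

```python
import sqlite3
import statistics
import time

def measure(conn, sql, params_list):
    """Run sql once per parameter tuple and return (throughput in statements/s,
    mean response time in ms, 95th-percentile response time in ms)."""
    latencies_ms = []
    start = time.perf_counter()
    for params in params_list:
        t0 = time.perf_counter()
        conn.execute(sql, params).fetchall()
        latencies_ms.append((time.perf_counter() - t0) * 1000)
    elapsed = time.perf_counter() - start
    p95 = statistics.quantiles(latencies_ms, n=20)[-1]  # last of 19 cut points = 95th percentile
    return len(params_list) / elapsed, statistics.mean(latencies_ms), p95

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PropertyForRent (propertyno TEXT, rent INTEGER)")
conn.executemany("INSERT INTO PropertyForRent VALUES (?, ?)",
                 [(f"PG{i}", i % 1000) for i in range(20_000)])
conn.commit()

probes = [(rent,) for rent in range(0, 1000, 10)]  # 100 point queries on rent
SQL = "SELECT propertyno FROM PropertyForRent WHERE rent = ?"

before = measure(conn, SQL, probes)
conn.execute("CREATE INDEX idx_rent ON PropertyForRent(rent)")  # the tuning change
after = measure(conn, SQL, probes)
for label, (tput, mean_ms, p95_ms) in (("before", before), ("after", after)):
    print(f"{label:6} {tput:10.0f} q/s  mean {mean_ms:.3f} ms  p95 {p95_ms:.3f} ms")
```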

   - Understanding system resources: main memory, CPU, disk I/O, network
   - Document tuning activity
   - New requirements for DreamHome (the system must be able to handle changing requirements):
     - Ability to hold pictures of the properties for rent: Figure 18.12
     - Ability to publish a report describing properties available for rent on the Web