COMP102: Introduction to Databases, 14

COMP102: Introduction to Databases, 14 Dr Muhammad Sulaiman Khan Department of Computer Science University of Liverpool U.K. 8 March, 2011

Physical Database Design: Some Aspects

Specific topics for today: Purpose of physical database design. How to map the logical database design to a physical database design. How to design base tables for target DBMS. How to design representation of derived data. How to design business rules for target DBMS. How to analyze users' transactions to determine characteristics that may impact performance. How to select appropriate file organizations based on analysis of transactions. When to select indexes to improve performance.

Logical vs Physical Database Design
Logical DB design is independent of implementation details, such as functionality of target DBMS. Logical DB design is concerned with the what; physical database design is concerned with the how. Sources of information for physical design include the logical data model and the data dictionary.

Physical Database Design
Process of producing a description of the implementation of the database on secondary storage. It describes: base tables, file organizations, and indexes used to achieve efficient access to the data, and any associated integrity constraints and security restrictions.

Overview of Physical Database Design Methodology
Step 3: Translate logical database design for target DBMS.
Step 4: Choose file organizations and indexes.
Step 5: Design user views.
Step 6: Design security mechanisms.
Step 7: Consider introduction of controlled redundancy.
Step 8: Monitor and tune operational system.

Step 3: Translate logical database design for target DBMS
To produce a basic working relational database from the logical data model. Consists of the following steps:
Step 3.1: Design base tables.
Step 3.2: Design representation of derived data.
Step 3.3: Design remaining business rules.
Need to know functionality of target DBMS, such as how to create base tables and whether DBMS supports the definition of: primary keys (PKs), foreign keys (FKs), and alternate keys (AKs); required data, i.e. whether system supports NOT NULL; domains; relational integrity rules; business rules.

Step 3.1: Design base tables
To decide how to represent base tables identified in logical model in target DBMS. Need to collate and assimilate the information about tables produced during logical database design (from data dictionary and tables defined in DBDL). For each table, need to define: name of the table; list of simple columns in brackets; PK and, where appropriate, AKs and FKs; referential integrity constraints for any FKs identified. For each column, need to define: its domain, consisting of a data type, length, and any constraints on the domain; an optional default value for the column; whether the column can hold nulls.

Example: DBDL for the Branch table
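The items listed for Step 3.1 can be sketched as table definitions. This is a minimal illustration using Python's standard `sqlite3` module, not the DBDL from the slide: the Staff table, every column name, the default city, and the four-character rule for branchNo are assumptions made for the example.

```python
import sqlite3

def create_branch_schema(conn):
    """Create two illustrative base tables; all column names are assumptions."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("""
        CREATE TABLE Staff (
            staffNo TEXT PRIMARY KEY,
            name    TEXT NOT NULL
        )""")
    conn.execute("""
        CREATE TABLE Branch (
            branchNo   TEXT PRIMARY KEY,                   -- PK
            street     TEXT NOT NULL,                      -- required data
            city       TEXT NOT NULL DEFAULT 'Liverpool',  -- optional default value
            postcode   TEXT NOT NULL UNIQUE,               -- alternate key (AK)
            mgrStaffNo TEXT NOT NULL
                       REFERENCES Staff(staffNo)           -- FK with referential actions
                       ON UPDATE CASCADE ON DELETE NO ACTION,
            CHECK (length(branchNo) = 4)                   -- constraint on the domain
        )""")

conn = sqlite3.connect(":memory:")
create_branch_schema(conn)
```

Other DBMSs offer richer domain support (for example named domains), so the CHECK clause here is only one possible way to express a domain constraint.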

Step 3.2: Design representation of derived data
To design the representation of derived data in the database. Produce list of all derived columns from logical data model and data dictionary. Derived column can be stored in database or calculated every time it is needed. Option selected is based on: additional cost to store the derived data and keep it consistent with data from which it is derived; cost to calculate it each time it is required. Less expensive option is chosen subject to performance constraints.

Example: RentalAgreement and Member with derived column noofrentals:

Step 3.3: Design remaining business rules
To design the remaining business rules for the target DBMS. Some DBMSs provide more facilities than others for defining business rules. Example: when we create table RentalAgreement we can define:
CONSTRAINT membernotrentingtoomany
  CHECK (NOT EXISTS (SELECT memberno FROM RentalAgreement GROUP BY memberno HAVING COUNT(*) > 10))
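Some DBMSs, SQLite among them, reject subqueries inside CHECK constraints, so the rule above may have to be expressed differently. The following sketch enforces the same limit with a trigger, using Python's `sqlite3` module. It is an alternative technique, not the slide's method; the column rentalNo and the helper `try_rent` are assumptions.

```python
import sqlite3

MAX_RENTALS = 10  # limit taken from the CHECK constraint in the slide

def create_rental_table(conn):
    conn.executescript(f"""
        CREATE TABLE RentalAgreement (
            rentalNo INTEGER PRIMARY KEY,
            memberNo INTEGER NOT NULL
        );
        -- Reject the insert if the member already holds the maximum number.
        CREATE TRIGGER membernotrentingtoomany
        BEFORE INSERT ON RentalAgreement
        WHEN (SELECT COUNT(*) FROM RentalAgreement
              WHERE memberNo = NEW.memberNo) >= {MAX_RENTALS}
        BEGIN
            SELECT RAISE(ABORT, 'member is renting too many videos');
        END;
    """)

def try_rent(conn, member_no):
    """Return True if the rental was recorded, False if the rule blocked it."""
    try:
        conn.execute("INSERT INTO RentalAgreement (memberNo) VALUES (?)",
                     (member_no,))
        return True
    except sqlite3.DatabaseError:
        return False
```

With this trigger, ten rentals for one member succeed and the eleventh is refused, while other members are unaffected.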

Step 4: Choose file organizations and indexes
Determine optimal file organizations to store the base tables, and the indexes required to achieve acceptable performance. Consists of the following steps:
Step 4.1: Analyze transactions.
Step 4.2: Choose file organizations.
Step 4.3: Choose indexes.

Step 4.1: Analyze transactions
To understand functionality of the transactions and to analyze the important ones. Identify performance criteria, such as: transactions that run frequently and will have a significant impact on performance; transactions that are critical to the business; times during the day/week when there will be a high demand made on the database (called the peak load). Use this information to identify the parts of the database that may cause performance problems. To select appropriate file organizations and indexes, also need to know high-level functionality of the transactions, such as: columns that are updated in an update transaction; criteria used to restrict records that are retrieved in a query.

Step 4.1: Analyze transactions
Often not possible to analyze all expected transactions, so investigate most important ones. To help identify which transactions to investigate, can use: transaction/table cross-reference matrix, showing tables that each transaction accesses, and/or transaction usage map, indicating which tables are potentially heavily used. To focus on areas that may be problematic: (1) Map all transaction paths to tables. (2) Determine which tables are most frequently accessed by transactions. (3) Analyze the data usage of selected transactions that involve these tables.

Example: Cross-referencing transactions and tables
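A cross-reference matrix is easy to build once each transaction's table accesses are written down. The sketch below is a plain-Python illustration; the transaction labels, the sample tables, and the access codes (I = insert, R = read, U = update, D = delete) are assumptions, not data from the slide.

```python
def cross_reference(transactions):
    """Return (tables, matrix): one row per transaction, one column per table."""
    tables = sorted({t for accesses in transactions.values() for t in accesses})
    matrix = {
        txn: [accesses.get(table, "") for table in tables]
        for txn, accesses in transactions.items()
    }
    return tables, matrix

def most_accessed(transactions):
    """Rank tables by the number of transactions that touch them."""
    counts = {}
    for accesses in transactions.values():
        for table in accesses:
            counts[table] = counts.get(table, 0) + 1
    # Highest count first; ties broken alphabetically for a stable result.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical transactions and the tables they access.
sample = {
    "A": {"Member": "I"},
    "B": {"Member": "R", "RentalAgreement": "I"},
    "C": {"RentalAgreement": "R", "Video": "R"},
}
tables, matrix = cross_reference(sample)
```

The ranking from `most_accessed` corresponds to step (2) above: it points at the tables whose data usage deserves closer analysis.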

Example: Transaction usage map for some sample transactions showing expected occurrences:

Step 4.1: Analyze transactions
Data usage analysis. For each transaction determine: (a) Tables and columns accessed and type of access. (b) Columns used in any search conditions. (c) For a query, columns involved in joins. (d) Expected frequency of transaction. (e) Performance goals of transaction.

Example: Transaction analysis form:
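The items (a) to (e) of the data usage analysis can be recorded in a small structure and then used to shortlist columns for indexing. This is only a sketch: the field names, the sample transactions, and the idea of weighting columns by daily frequency are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class TransactionAnalysis:
    name: str
    accesses: dict               # (a) table -> type of access, e.g. "R"
    search_columns: list         # (b) columns used in search conditions
    join_columns: list           # (c) columns involved in joins
    frequency_per_day: int       # (d) expected frequency
    max_response_seconds: float  # (e) performance goal

def candidate_index_columns(forms):
    """Weight each searched or joined column by how often it is used."""
    weights = {}
    for form in forms:
        # A column counts once per transaction, even if searched and joined.
        for col in set(form.search_columns) | set(form.join_columns):
            weights[col] = weights.get(col, 0) + form.frequency_per_day
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical analysis forms.
forms = [
    TransactionAnalysis("List videos by title", {"Video": "R"},
                        ["title"], [], 500, 1.0),
    TransactionAnalysis("Rentals per member",
                        {"Member": "R", "RentalAgreement": "R"},
                        ["memberNo"], ["memberNo"], 50, 2.0),
]
```

The resulting list is only a starting point; the final choice of indexes still depends on update costs and on the performance goals recorded in each form.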

Step 4.2: Choose file organizations
To determine an efficient file organization for each base table. File organizations include: Heap, Hash, Indexed Sequential Access Method (ISAM), B+-Tree, and Clusters. Some DBMSs (particularly PC-based DBMSs) have a fixed file organization that you cannot alter.

Indexes: motivating example
Linear search: If records are not ordered, then to find a specific record we potentially need to search all the records. If the total number of records is n, then this takes time proportional to n. If records are ordered (sorted) by the key value, like alphabetically sorted words in a dictionary, search can be much more efficient.

Binary search: If records are sorted by the key value, and we search for a specific record with key K, we first look at the middle record with key, say, K'. Compare K with K'. If K = K' then we are done. If not, then if K < K' we reduce the search to only the left half of the records; if K > K' we reduce the search to only the right half of the records. If the total number of records is n, then binary search takes time proportional to log2(n) (logarithm base 2 of n).

Suppose that the total number of records is n = 1024. If one comparison operation, e.g., an operation like K < K', K = K', or K > K', takes time, say, 0.1 second, then linear search takes time roughly n/10 seconds = 102.4 seconds (almost 2 minutes), but binary search takes time roughly log2(n)/10 seconds = log2(1024)/10 seconds = 1 second! To compute log2(1024), we just count how many times 1024 = 2^10 should be divided by 2 until we get 1. Thus log2(1024) = 10.
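The two searches can be sketched in a few lines of Python that also count the work done. The function names and the choice of integer keys 0 to 1023 are assumptions for illustration. Note that binary search may need up to floor(log2(n)) + 1 probes, i.e. 11 for n = 1024, which agrees with the rough estimate of log2(n) = 10 used above.

```python
def linear_search(keys, target):
    """Scan unordered keys; return (position, comparisons made)."""
    comparisons = 0
    for position, key in enumerate(keys):
        comparisons += 1
        if key == target:
            return position, comparisons
    return -1, comparisons

def binary_search(sorted_keys, target):
    """Halve the search range each step; return (position, probes made)."""
    low, high = 0, len(sorted_keys) - 1
    probes = 0
    while low <= high:
        middle = (low + high) // 2
        probes += 1
        if sorted_keys[middle] == target:
            return middle, probes
        if target < sorted_keys[middle]:
            high = middle - 1      # continue in the left half
        else:
            low = middle + 1       # continue in the right half
    return -1, probes

keys = list(range(1024))           # n = 1024 sorted keys
worst_linear = linear_search(keys, 1023)[1]
worst_binary = max(binary_search(keys, k)[1] for k in keys)
```

For this input the linear search needs 1024 comparisons in the worst case, while no binary search needs more than 11 probes.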

Step 4.3: Choose indexes
Determine whether adding indexes will improve the performance of the system. One approach is to keep records unordered and create as many secondary indexes as necessary. Or could order records in table by specifying a primary or clustering index. In this case, choose the column for ordering or clustering the records as: column that is used most often for join operations (this makes the join operation more efficient), or column that is used most often to access the records in a table in order of that column.

Step 4.3: Choose indexes
If ordering column chosen is key of table, index will be a primary index; otherwise, index will be a clustering index. Each table can only have either a primary index or a clustering index. Secondary indexes provide additional keys for a base table that can be used to retrieve data more efficiently. Have to balance overhead in maintenance and use of secondary indexes against performance improvement gained when retrieving data. This includes: adding an index record to every secondary index whenever record is inserted; updating a secondary index when corresponding record is updated; increase in disk space needed to store the secondary index; possible performance degradation during query optimization to consider all secondary indexes.
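The maintenance overhead described above can be made visible with a toy in-memory table. This is a sketch only: the class `IndexedTable` is hypothetical, and it uses Python dictionaries as stand-ins for secondary indexes rather than the B+-tree or hash structures a real DBMS would use.

```python
class IndexedTable:
    """Unordered records plus one dictionary per indexed column."""

    def __init__(self, indexed_columns):
        self.rows = []                                   # heap-style storage
        self.indexes = {col: {} for col in indexed_columns}
        self.index_writes = 0                            # maintenance work done

    def insert(self, row):
        row_id = len(self.rows)
        self.rows.append(row)
        # Every secondary index needs a new entry for every inserted record.
        for col, index in self.indexes.items():
            index.setdefault(row[col], []).append(row_id)
            self.index_writes += 1
        return row_id

    def find(self, col, value):
        if col in self.indexes:                          # indexed lookup
            return [self.rows[i] for i in self.indexes[col].get(value, [])]
        return [r for r in self.rows if r[col] == value] # full scan

table = IndexedTable(["title", "category"])
table.insert({"catalogNo": "C1", "title": "Alien", "category": "SciFi"})
table.insert({"catalogNo": "C2", "title": "Heat", "category": "Action"})
table.insert({"catalogNo": "C3", "title": "Solaris", "category": "SciFi"})
```

Three inserts with two indexes cost six index writes; each additional index adds one more write per insert, which is the trade-off to weigh against faster retrieval.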

Step 4.3: Choose indexes
Guidelines for choosing a "wish-list" of indexes:
(1) Do not index small tables.
(2) Index PK of a table if it is not a key of the file organization.
(3) Add secondary index to any column that is heavily used as a secondary key.
(4) Add secondary index to a FK if it is frequently accessed.
(5) Add secondary index on columns that are involved in: selection or join criteria; ORDER BY; GROUP BY; and other operations involving sorting (such as UNION or DISTINCT).
(6) Add secondary index on columns involved in built-in functions.
(7) Add secondary index on columns that could result in an index-only plan.
(8) Avoid indexing a column or table that is frequently updated.
(9) Avoid indexing a column if the query will retrieve a significant proportion of the records in the table.
(10) Avoid indexing columns that consist of long character strings.

Creating indexes in SQL
To create a primary index on the Video table based on column catalogNo:
CREATE UNIQUE INDEX catalognoprimaryindex ON Video(catalogNo);
To create a secondary index on the Video table based on column title:
CREATE INDEX catalognosecondaryindex ON Video(title);
To create a clustering index on the VideoForRent table based on column catalogNo:
CREATE INDEX catalognoclusteringindex ON VideoForRent(catalogNo) CLUSTER;
To drop (delete) an index:
DROP INDEX catalognoprimaryindex;