Step 4: Choose file organizations and indexes


Step 4: Choose file organizations and indexes
Asst. Prof. Dr. Kanda Saikaew (krunapon@kku.ac.th)
Dept of Computer Engineering, Khon Kaen University

Overview
- How to analyze users' transactions to determine characteristics that may impact performance.
- How to select appropriate file organizations based on analysis of transactions.
- When to select indexes to improve performance.

Step 4: Choose file organizations and indexes
Determine optimal file organizations to store the base tables, and the indexes required to achieve acceptable performance.
Consists of the following steps:
- Step 4.1 Analyze transactions
- Step 4.2 Choose file organizations
- Step 4.3 Choose indexes

Step 4.1 Analyze transactions
To understand the functionality of the transactions and to analyze the important ones.
Identify performance criteria, such as:
- transactions that run frequently and will have a significant impact on performance;
- transactions that are critical to the business;
- times during the day/week when there will be a high demand made on the database (called the peak load).

Use this information to identify the parts of the database that may cause performance problems.
To select appropriate file organizations and indexes, you also need to know the high-level functionality of the transactions, such as:
- columns that are updated in an update transaction;
- criteria used to restrict the records that are retrieved in a query.

It is often not possible to analyze all expected transactions, so investigate the most important ones.
To help identify which transactions to investigate, you can use:
- a transaction/table cross-reference matrix, showing the tables that each transaction accesses, and/or
- a transaction usage map, indicating which tables are potentially heavily used.

Step 4.1 Analyze transactions
To focus on areas that may be problematic:
(1) Map all transaction paths to tables.
(2) Determine which tables are most frequently accessed by transactions.
(3) Analyze the data usage of selected transactions that involve these tables.

[Figure: Cross-referencing transactions and tables. Pearson Education Limited, 2004]

[Figure: Transaction usage map for some sample transactions, showing expected occurrences]
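The first two steps above can be sketched in a few lines of Python. This is only an illustration: the transaction names, table names, and hourly frequencies are invented here and do not come from the slides' figures.

```python
from collections import Counter

# Hypothetical transactions, the tables each one touches, and an assumed
# number of executions per hour (illustrative figures only).
transactions = {
    "A": {"tables": ["Staff", "Branch"], "per_hour": 50},
    "B": {"tables": ["Property", "Client"], "per_hour": 10},
    "C": {"tables": ["Staff", "Property"], "per_hour": 100},
}

def cross_reference(transactions):
    """Step (1): map each table to the set of transactions that access it."""
    matrix = {}
    for name, info in transactions.items():
        for table in info["tables"]:
            matrix.setdefault(table, set()).add(name)
    return matrix

def table_load(transactions):
    """Step (2): expected accesses per hour for each table."""
    load = Counter()
    for info in transactions.values():
        for table in info["tables"]:
            load[table] += info["per_hour"]
    return load

matrix = cross_reference(transactions)
load = table_load(transactions)
# The two most heavily used tables are the candidates for step (3).
busiest = [table for table, _ in load.most_common(2)]
```

With these made-up numbers, Staff and Property carry most of the load, so their transactions would be analyzed in detail first.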

Step 4.1 Analyze transactions: data usage analysis
For each transaction, determine:
(a) Tables and columns accessed, and the type of access.
(b) Columns used in any search conditions.
(c) For a query, columns involved in joins.
(d) Expected frequency of the transaction.
(e) Performance goals of the transaction.

[Figure: Example transaction analysis form. Pearson Education Limited, 2004]

Step 4.2 Choose file organizations
To determine an efficient file organization for each base table.
- File organizations include Heap, Hash, Indexed Sequential Access Method (ISAM), B+-Tree, and Clusters.
- Some DBMSs (particularly PC-based DBMSs) have a fixed file organization that you cannot alter.

Data on External Storage
- Disks: can retrieve a random page at fixed cost, but reading several consecutive pages is much cheaper than reading them in random order.
- Tapes: can only read pages in sequence. Cheaper than disks; used for archival storage.

Data on External Storage
- File organization: method of arranging a file of records on external storage.
  - Record id (rid) is sufficient to physically locate a record.
  - Indexes are data structures that allow us to find the record ids of records with given values in the index search key fields.
- Architecture: the buffer manager stages pages from external storage to the main memory buffer pool. File and index layers make calls to the buffer manager.

Alternative File Organizations
Many alternatives exist, each ideal for some situations and not so good in others:
- Heap (random order) files: suitable when typical access is a file scan retrieving all records.
- Sorted files: best if records must be retrieved in some order, or only a 'range' of records is needed.
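To make the heap versus sorted trade-off concrete, here is a minimal sketch that counts page reads for an equality search. The page size, file size, and the way the heap file is shuffled are assumptions chosen for illustration; real systems also account for sequential versus random I/O cost.

```python
PAGE_SIZE = 4       # records per page (assumed)
NUM_RECORDS = 1024  # keys 0..1023 (assumed)

def paginate(records):
    """Split a list of records into fixed-size pages."""
    return [records[i:i + PAGE_SIZE] for i in range(0, len(records), PAGE_SIZE)]

def heap_search(file_pages, key):
    """Scan pages in storage order; return (found, pages_read)."""
    for pages_read, page in enumerate(file_pages, start=1):
        if key in page:
            return True, pages_read
    return False, len(file_pages)

def sorted_search(file_pages, key):
    """Binary search over the pages of a sorted file; return (found, pages_read)."""
    lo, hi, pages_read = 0, len(file_pages) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        page = file_pages[mid]
        pages_read += 1
        if key < page[0]:
            hi = mid - 1
        elif key > page[-1]:
            lo = mid + 1
        else:
            return key in page, pages_read
    return False, pages_read

# Heap file: a fixed pseudo-random permutation of the keys (37 is odd, so
# k -> 37*k mod 1024 visits every key exactly once).
heap_file = paginate([(k * 37) % NUM_RECORDS for k in range(NUM_RECORDS)])
sorted_file = paginate(list(range(NUM_RECORDS)))

heap_miss = heap_search(heap_file, 5000)        # key not present
sorted_miss = sorted_search(sorted_file, 5000)  # key not present
```

For a key that is absent, the heap search has to read all 256 pages, while the sorted file needs only a logarithmic number of page reads. The heap file still wins when the workload is mostly full scans and inserts, which is the point the slide makes.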

Alternative File Organizations
- Indexes: data structures to organize records via trees or hashing.
  - Like sorted files, they speed up searches for a subset of records, based on values in certain ("search key") fields.
  - Updates are much faster than in sorted files.

Indexes (1/2)
- An index on a file speeds up selections on the search key fields for the index.
- Any subset of the fields of a relation can be the search key for an index on the relation.
- Search key is not the same as key (a minimal set of fields that uniquely identifies a record in a relation).

Indexes (2/2)
- An index contains a collection of data entries, and supports efficient retrieval of all data entries k* with a given key value k.
- Given data entry k*, we can find the record with key k in at most one disk I/O. (Details soon.)
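A small sketch may help separate "search key" from "key". It models an index as a mapping from a search-key value k to the rids of matching records; the Emp rows and the use of list positions as rids are assumptions made for this example.

```python
from collections import defaultdict

# A tiny heap file: the list position plays the role of the record id (rid).
# The Emp columns and values are made up for illustration.
emp = [
    {"eid": 1, "dno": 10, "age": 25},
    {"eid": 2, "dno": 20, "age": 30},
    {"eid": 3, "dno": 10, "age": 30},
    {"eid": 4, "dno": 30, "age": 25},
]

def build_index(records, search_key):
    """Map each search-key value k to the rids of the records with that value."""
    index = defaultdict(list)
    for rid, record in enumerate(records):
        k = tuple(record[field] for field in search_key)
        index[k].append(rid)
    return index

def lookup(index, records, k):
    """Follow the rids stored under k to fetch the matching records."""
    return [records[rid] for rid in index.get(k, [])]

age_index = build_index(emp, ("age",))  # search key that is NOT a key of Emp
eid_index = build_index(emp, ("eid",))  # search key that happens to be a key

thirty = lookup(age_index, emp, (30,))  # several records share age 30
one = lookup(eid_index, emp, (4,))      # at most one record per eid
```

Because age is not a key of Emp, one search-key value can lead to several rids, whereas the eid index yields at most one.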

B+ Tree Indexes
- Leaf pages contain data entries, sorted by search key, and are chained (prev & next).
- Non-leaf pages have index entries, which are only used to direct searches.
- Index entry layout: P0 | K1 | P1 | K2 | P2 | ... | Km | Pm

Example B+ Tree
Note how data entries in the leaf level are sorted.

Root:      17            (left subtree: entries <= 17; right subtree: entries > 17)
Non-leaf:  [5 13]   [27 30]
Leaves:    2* 3* | 5* 7* 8* | 14* 16* | 22* 24* | 27* 29* | 33* 34* 38* 39*

- Find 28*? 29*? All > 15* and < 30*?
- Insert/delete: find the data entry in a leaf, then change it. The parent sometimes needs adjusting, and the change sometimes bubbles up the tree.

Hash-Based Indexes (1/2)
- Good for equality selections.
- Index is a collection of buckets. Bucket = primary page plus zero or more overflow pages.
- Buckets contain data entries.
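The three search questions on the example B+ tree can be traced with a short sketch. The class names and the rule for choosing a child (a key equal to a separator goes to the right child) are assumptions of this sketch; the key values are the ones on the slide.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, entries):
        self.entries = entries  # sorted data entries
        self.next = None        # right sibling in the leaf chain

class Node:
    def __init__(self, keys, children):
        self.keys = keys          # separator keys K1..Km
        self.children = children  # pointers P0..Pm

def build_example():
    """Build the example tree from the slide (root 17, then [5 13] and [27 30])."""
    leaves = [Leaf(e) for e in ([2, 3], [5, 7, 8], [14, 16],
                                [22, 24], [27, 29], [33, 34, 38, 39])]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return Node([17], [Node([5, 13], leaves[:3]), Node([27, 30], leaves[3:])])

def find_leaf(node, k):
    """Descend from the root; non-leaf pages only direct the search."""
    while isinstance(node, Node):
        node = node.children[bisect_right(node.keys, k)]
    return node

def search(root, k):
    """Equality search: is data entry k* present?"""
    return k in find_leaf(root, k).entries

def range_search(root, low, high):
    """Return all entries k with low < k < high by walking the leaf chain."""
    leaf = find_leaf(root, low)
    result = []
    while leaf is not None:
        for k in leaf.entries:
            if k >= high:
                return result
            if k > low:
                result.append(k)
        leaf = leaf.next
    return result

root = build_example()
has_28 = search(root, 28)
has_29 = search(root, 29)
between = range_search(root, 15, 30)
```

The range search descends once to the leaf where 15 would live and then follows the sibling pointers, which is why chained, sorted leaves make range queries cheap.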

Hash-Based Indexes (2/2)
- Hashing function h: h(r) = bucket in which (the data entry for) record r belongs. h looks at the search key fields of r.
- No need for "index entries" in this scheme.

Index Classification
- Primary vs. secondary: if the search key contains the primary key, it is called a primary index.
  - Unique index: the search key contains a candidate key.
- Clustered vs. unclustered: if the order of data records is the same as, or 'close to', the order of data entries, it is called a clustered index.

Clustered vs. Unclustered Index
- A file can be clustered on at most one search key.
- The cost of retrieving data records through an index varies greatly based on whether the index is clustered or not!
- Secondary indexes provide additional keys for a base table that can be used to retrieve data more efficiently.
- If the ordering column chosen is a key of the table, the index will be a primary index; otherwise, the index will be a clustering index.
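The cost difference between clustered and unclustered indexes can be illustrated by counting the data pages touched by a range query. Everything numeric here is assumed for the sketch: the page size, the file size, and the scattered layout used for the unclustered case. Counting distinct pages also assumes a page stays buffered once fetched, which is optimistic for the unclustered case.

```python
PAGE_SIZE = 10      # records per data page (assumed)
NUM_RECORDS = 1000  # keys 0..999 (assumed)

def clustered_position(k):
    """Clustered: data records are stored in search-key order."""
    return k

def unclustered_position(k):
    """Unclustered: record placement is unrelated to key order.
    37 shares no factor with 1000, so this is a permutation of 0..999."""
    return (k * 37) % NUM_RECORDS

def data_pages_read(position_of, low, high):
    """Distinct data pages fetched for low <= k < high, following the
    index's data entries in key order."""
    return len({position_of(k) // PAGE_SIZE for k in range(low, high)})

clustered_cost = data_pages_read(clustered_position, 100, 200)
unclustered_cost = data_pages_read(unclustered_position, 100, 200)
```

The same 100 matching records fit on 10 adjacent pages when the file is clustered on the search key, but are spread over many more pages otherwise.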

Step 4.3 Choose indexes
Determine whether adding indexes will improve the performance of the system.

Approach 1: Keep records unordered and create as many secondary indexes as necessary.
You have to balance the overhead in maintenance and use of secondary indexes against the performance improvement gained when retrieving data. This overhead includes:
- adding an index record to every secondary index whenever a record is inserted;
- updating a secondary index when the corresponding record is updated;
- the increase in disk space needed to store the secondary index;
- possible performance degradation during query optimization, which must consider all secondary indexes.

Approach 2: Order records in the table by specifying a primary or clustering index.
In this case, choose the column for ordering or clustering the records as:
- the column that is used most often for join operations, as this makes the join operation more efficient, or
- the column that is used most often to access the records in the table in order of that column.
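The maintenance overhead listed under Approach 1 can be made visible with a toy table that counts index writes. The Table class, its counter, and the column names are hypothetical; a real DBMS does this work inside its storage engine.

```python
from collections import defaultdict

class Table:
    """Toy unordered table with secondary indexes on chosen columns."""

    def __init__(self, indexed_columns):
        self.rows = []
        self.indexes = {col: defaultdict(set) for col in indexed_columns}
        self.index_writes = 0  # how many index entries were added or removed

    def insert(self, row):
        """Every secondary index gets a new entry for the inserted record."""
        rid = len(self.rows)
        self.rows.append(dict(row))
        for col, index in self.indexes.items():
            index[row[col]].add(rid)
            self.index_writes += 1
        return rid

    def update(self, rid, col, value):
        """Only indexes on the updated column need maintenance."""
        old = self.rows[rid][col]
        self.rows[rid][col] = value
        if col in self.indexes and old != value:
            self.indexes[col][old].discard(rid)
            self.indexes[col][value].add(rid)
            self.index_writes += 2  # remove old entry, add new entry

emp = Table(indexed_columns=["dno", "age"])
rid = emp.insert({"eid": 1, "dno": 10, "age": 25})
writes_after_insert = emp.index_writes
emp.update(rid, "eid", 5)   # eid is not indexed: no index work
emp.update(rid, "age", 26)  # age is indexed: old entry out, new entry in
writes_after_updates = emp.index_writes
```

Each additional secondary index adds one write per insert, which is the cost to weigh against faster retrieval.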

Step 4.3 Choose indexes: guidelines for choosing a 'wish-list'
(1) Do not index small tables.
(2) Index the PK of a table if it is not a key of the file organization.
(3) Add a secondary index to any column that is heavily used as a secondary key.
(4) Add a secondary index to a FK if it is frequently accessed.
(5) Add a secondary index on columns that are involved in selection or join criteria (the WHERE clause) and in sorting (such as ORDER BY, GROUP BY, UNION, DISTINCT).
(6) Add a secondary index on columns involved in built-in functions.
(7) Add a secondary index on columns that could result in an index-only plan.
(8) Avoid indexing a column or table that is frequently updated.
(9) Avoid indexing a column if the query will retrieve a significant proportion of the records in the table.
(10) Avoid indexing columns that consist of long character strings.

Index-Only Plans (1/3)
A number of queries can be answered without retrieving any tuples from one or more of the relations involved if a suitable index is available.

Index on <E.dno>:
    SELECT E.dno, COUNT(*) FROM Emp E GROUP BY E.dno
Tree index on <E.dno,E.sal>:
    SELECT E.dno, MIN(E.sal) FROM Emp E GROUP BY E.dno
Tree index on <E.age,E.sal> or <E.sal,E.age>:
    SELECT AVG(E.sal) FROM Emp E WHERE E.age=25 AND E.sal BETWEEN 3000 AND 5000
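One way to see an index-only plan in practice is to ask a DBMS for its query plan. The sketch below uses Python's built-in sqlite3 module; the choice of SQLite, the Emp column types, and the index names are assumptions, and SQLite reports such plans with the phrase "COVERING INDEX". Other systems use different wording.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema for the Emp relation used on the slides.
conn.execute(
    "CREATE TABLE Emp (eid INTEGER PRIMARY KEY, dno INTEGER, sal INTEGER, age INTEGER)"
)
conn.execute("CREATE INDEX emp_dno_sal ON Emp (dno, sal)")
conn.execute("CREATE INDEX emp_age_sal ON Emp (age, sal)")

def plan(sql):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text

# Both queries only need columns that are stored in one of the indexes.
group_plan = plan("SELECT E.dno, MIN(E.sal) FROM Emp E GROUP BY E.dno")
range_plan = plan(
    "SELECT AVG(E.sal) FROM Emp E WHERE E.age = 25 AND E.sal BETWEEN 3000 AND 5000"
)
# This query also needs dno, which emp_age_sal does not contain, so the
# table itself has to be visited.
full_plan = plan("SELECT E.eid, E.dno, E.sal, E.age FROM Emp E WHERE E.age = 25")

print(group_plan)
print(range_plan)
print(full_plan)
```

The first two plans should mention a covering index, meaning the Emp tuples themselves are never fetched; the third should not.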

Index-Only Plans (2/3)
Index-only plans are possible if the key is <dno,age> or we have a tree index with key <age,dno>.
- Which is better?
- What if we consider the second query?

    SELECT E.dno, COUNT(*) FROM Emp E WHERE E.age=30 GROUP BY E.dno
    SELECT E.dno, COUNT(*) FROM Emp E WHERE E.age>30 GROUP BY E.dno

Index-Only Plans (3/3)
Index-only plans can also be found for queries involving more than one table; more on this later.

Index on <E.dno>:
    SELECT D.mgr FROM Dept D, Emp E WHERE D.dno=E.dno
Index on <E.dno,E.eid>:
    SELECT D.mgr, E.eid FROM Dept D, Emp E WHERE D.dno=E.dno

SQL commands related to indexes
- To create an index:
    CREATE INDEX <indexname> ON <tablename> (<column>, <column>...);
- To enforce unique values, add the UNIQUE keyword:
    CREATE UNIQUE INDEX <indexname> ON <tablename> (<column>, <column>...);
- To specify sort order, add the keyword ASC or DESC after each column name.
- To remove an index:
    DROP INDEX <indexname>;
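The commands above can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the Emp schema and index names are invented, and exact syntax differs between systems (some, for example, require a table name in DROP INDEX).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (eid INTEGER, dno INTEGER, sal INTEGER)")

# Composite index with an explicit sort order per column.
conn.execute("CREATE INDEX emp_dno_sal ON Emp (dno ASC, sal DESC)")
# Unique index: the DBMS now rejects duplicate eid values.
conn.execute("CREATE UNIQUE INDEX emp_eid ON Emp (eid)")

conn.execute("INSERT INTO Emp VALUES (1, 10, 3000)")
try:
    conn.execute("INSERT INTO Emp VALUES (1, 20, 4000)")  # same eid again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

def index_names():
    """List the indexes currently defined, from SQLite's catalog table."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name"
    )
    return [name for (name,) in rows]

before = index_names()
conn.execute("DROP INDEX emp_dno_sal")
after = index_names()
```

The unique index turns the second insert into an error, and dropping an index removes only the index, not the table's rows.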

Summary
Step 4: Choose file organizations and indexes
- Step 4.1 Analyze transactions
- Step 4.2 Choose file organizations
- Step 4.3 Choose indexes
  - Index the PK and FKs.
  - Index columns that are involved in selection or join criteria (the WHERE clause) and in sorting (such as ORDER BY, GROUP BY, UNION, DISTINCT).
  - Index columns that could result in an index-only plan.

References
- Connolly and Begg, Database Systems: A Practical Approach to Design, Implementation and Management, Pearson, 2004.
- Ramakrishnan and Gehrke, Database Management Systems, McGraw-Hill Science/Engineering/Math, 2003.