Relational Database Systems 2 3. Indexing and Access Paths

Size: px

Start display at page:

Download "Relational Database Systems 2 3. Indexing and Access Paths"

Daniel May
6 years ago
Views:

1 Relational Database Systems 2 3. Indexing and Access Paths Silke Eckstein Simon Barthel Institut für Informationssysteme Technische Universität Braunschweig

2 Storage Types Explain the main characteristics of offline, nearline and online storage and give an example XML Databases Silke Eckstein Institut für Informationssysteme TU Braunschweig 2

3 Storage Types Why is data not stored in primary storages by default? Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 3

4 Storage Types Why is sequential access so much faster on hard drives than random access? * ~ 382 / 273 Ran R/W (MB/s) ~ 1.7 / 1.5 * IOPS: random reads / writes of 4KB Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 4

5 Reliability What are common measures to express the reliability of a hard drive? Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 5

6 Reliablity How and why does the reliablity differ between hard drives used for desktops and servers? Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 6

7 Reliablity Briefly explain the bathtub curve Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 7

8 RAID 4 hard disks 500 GB capacity 80 MB/sec transfer rate For RAID 0, 1, 5, 6 and 1+0 What is the... maximum capacity? maximum transfer rate for reading a single block? maximum transfer rate for writing two blocks concurrently? Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 8

9 RAID Assuming a RAID 5 and a RAID 6 array with the same idealized MTBF. Why is a RAID 6 still considered more secure than the more efficient RAID 5? Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 9

10 RAID RAID 5 (4+1) D 1 = D 2 = D 3 = D 4 = P = Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 10

11 NAS Server Network storage Contrast the characteristics of a SAN to those of a NAS Application Software Application Software File System Network SAN File System Logical Storage Logical Storage Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 11

12 2 Storage Each DBMS uses several types of storage Storage hierarchy Predominant storage type is the hard disk Hard disks Slow random access Fast sequential transfer Unreliable & failure prone Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 12

13 3 Indexing and Access Paths 3.1 Introduction to access paths 3.2 Files and blocks 3.3 Indexing Single-level indexes Multi-level indexes Hash indexes 3.4 Physical Tuning Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 13

3.1 Introduction to Access Paths Databases persistently store data Major problem: efficiency of data access Obviously depends on the hardware used But also depends to a large degree On the allocation

14 3.1 Introduction to Access Paths Databases persistently store data Major problem: efficiency of data access Obviously depends on the hardware used But also depends to a large degree On the allocation of data on the disks On intelligent buffer management On creating access paths and indexing data Physical tuning of the database is a main task of database administrators Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 14

3.1 Introduction to Access Paths For persistent storage databases are mapped into a number of files Located in specially protected parts of the file system (tablespaces, etc.

15 3.1 Introduction to Access Paths For persistent storage databases are mapped into a number of files Located in specially protected parts of the file system (tablespaces, etc.) Actually maintained by the operating system Each file is partitioned into fixed length blocks (pages) Smallest unit transferred from/to storage Block size is specified on DB creation Is a multiple of the OS block size A block usually contains several data records SKS 10.5 Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 15

3.1 Sectors and Blocks Harddisk Sectors are

abstracts FS blocks to DBMS blocks DISK FS

FS Block 3 FS Block 4 FS Block 5 FS Block 6

16 3.1 Sectors and Blocks Harddisk Sectors are abstracted by the file system to blocks DBMS abstracts FS blocks to DBMS blocks DISK FS Block 1 DBMS Block 1 FS Block 2 DBMS Block 2 FS Block 3 FS Block 4 FS Block 5 FS Block 6 DBMS Block 3 DBMS Block 4 DBMS Block 5 DBMS Block 6 Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 16

17 3.1 Buffer Management For data access always complete blocks (pages) are transferred from disk into the main memory The part used for holding copies of blocks is referred to as DB buffer and managed by the buffer manager Optimized (pre-)fetching of blocks greatly improves performance If a data record is requested by the DB If the block is buffered, return main memory address Else allocate space in the buffer, fetch the block from disk, and return main memory address SKS 10.5 Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 17

3.1 Buffer Management Once new blocks are fetched, generally currently buffered blocks have to be evicted If blocks have been modified, they must be written back to disk (which is not always

18 3.1 Buffer Management Once new blocks are fetched, generally currently buffered blocks have to be evicted If blocks have been modified, they must be written back to disk (which is not always possible) Writing of blocks depends on the recovery strategy: Blocks that cannot yet be written are called pinned blocks Before checkpoints there might be a forced output of blocks Several buffer replacement strategies can be applied SKS 10.5 Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 18

3.1 Replacement Strategies Least recently used (LRU): the oldest block is replaced Usually used in OS buffer management Simple, yet effective strategy Does not use semantics of the data Toss

19 3.1 Replacement Strategies Least recently used (LRU): the oldest block is replaced Usually used in OS buffer management Simple, yet effective strategy Does not use semantics of the data Toss immediate: After the DB has finished to process a block, the block is immediately replaced Example: block-nested loop join between tables R, S Take one block from R and compare against all blocks in S The block from R can be thrown out, but no block from S SKS 10.5 Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 19

20 3.1 Replacement Strategies Expected Reuse: statistics about query frequencies assess how useful a buffered block is The block with least expected usefulness is evicted Example: index blocks are more often addressed than data blocks In any case, whatever the strategy: Once a block has been modified and has not yet been written to disk, it cannot be evicted! Note: recovery subsystem has to agree before writing a block to disk SKS 10.5 Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 20

21 3.2 Blocks (Pages) Overhead & Payload (e.g., ORACLE data blocks) Header contains general block information like block address, type of segment (data, index, ) Table directory contains tables that have rows in the block Row directory contains information about the actual rows in the block (IDs, addresses, ) Row data is actual data Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 21

22 3.2 Extents and Segments An Extent is a logical collection of blocks (usually adjacent) Fixed size, once more data space is needed for rows a new extent is allocated Remember: reading adjacent blocks improves access time Segments are collections of logically connected extents For instance tables, index segments, or rollback segments Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 22

23 3.2 Tablespaces A tablespace is the logical storage space needed for the data in a table A grouping of multiple files allocated on disk Example: Data1.ora, system.ora, test.dbf Good practice to have one tablespace for tables, and a different one for indexes CREATE TABLESPACE user_data DATAFILE udata.ora SIZE 10M EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 23

24 3.2 File Organization Records of different relations should be stored in individual files Storing related records in the same block minimizes disk accesses Multitable clustering file organization may store records of different tables in the same block Good e.g. for some often occurring joins Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 24

25 3.2 File Organization Files reserve space in the file system File size can be specified or change dynamically Default values strongly differ (e.g., DB2 allocates 100 MB) Files related to table content_type_lecture Schema file Data file with data blocks Index file with index blocks Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 25

26 3.2 File Organization Organization of records in the file Heap a record can be placed anywhere in the file where there is space Sequential store records in sequential order, based on the value of the search key of each record Hashing a hash function is computed on some attribute of each record. The result specifies in which block of the file the record should be placed Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 26

27 3.2 File Organization Data records have to be written in a file such that the entire record can be accessed with minimal disk accesses Fixed length records (easy to implement) Variable length records (storage space efficient) TYPE deposit = RECORD END name : CHAR(22); accno : CHAR(10); balance : REAL; If each char is 1 byte and a real 8 bytes, a record takes 40 bytes record Name AccNo Balance 0 Burg ,55 1 Myers ,45 2 Smith ,75 SKS 10.6 Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 27

28 3.2 File Organization Fixed length records 40 bytes for the first record, 40 bytes for the second, Problem: if the block size is not a multiple of 40, records may cross boundaries Problem: deleting a record can be either done by marking it as deleted, or by replacing it with some other record of the file Keeping deleted items? Reading slow! Move up all other items? Efficiency! Fill space with next inserted item? Changes sequence! Pointer lists? Wastes space in each record! record Name AccNo Balance 0 Burg ,55 1 Myers ,45 2 Smith ,75 SKS 10.6 Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 28

3.2 File Organization Variable length records Necessary for multi-table files or records that allow variable length attributes Problem: How to know when a record ends?

29 3.2 File Organization Variable length records Necessary for multi-table files or records that allow variable length attributes Problem: How to know when a record ends? Introduce end-of-record symbols or store record length at beginning of the record. But what if a record is updated? Slotted page structure: header contains number of entries, end of free space and pointers to location/size of entries. Records are moved to use up space: no fragmentation! SKS 10.6 Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 29

3.2 File Organization Typical database records like in our banking example easily fit into one block Name, account number, balance, Usually multiple records per block Unspanned record organization

30 3.2 File Organization Typical database records like in our banking example easily fit into one block Name, account number, balance, Usually multiple records per block Unspanned record organization fills blocks only with complete records Spanned organization uses pointers to divide records i i+1 i record 1 record 2 record 3 record 4 record 5 record 1 record 2 record 3 4 p i+1 4 record 5 EN 4.4 Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 30

sequence of bytes If an object is bigger than a block, contiguous pages of the buffer pool have to be allocated for storage

31 3.2 Non-Standard Blocks Necessary for large objects: binary large objects (blobs) and character large objects (clobs) Large objects are not interpreted in databases Text documents, images, audio and video data Need to be stored in a contiguous sequence of bytes If an object is bigger than a block, contiguous pages of the buffer pool have to be allocated for storage Sometimes preferable to disallow direct access, but only allow access through file-system-like API to allow for fragmentation 31

3.3 Indexing Indexes help to locate records in a DB file Creation of indexes is part of the physical tuning task of database administrators Indexes often influence the actual location of storage for

32 3.3 Indexing Indexes help to locate records in a DB file Creation of indexes is part of the physical tuning task of database administrators Indexes often influence the actual location of storage for a record Example: sequential storage, storage via a hash function If the location is determined by the index Not all attributes can be directly indexed (but secondary access paths may be used) Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 32

3.3 Basic Concepts When items have to be

33 3.3 Basic Concepts When items have to be found quickly, indexing mechanisms are used to speed up access Alphabetical author catalog in libraries Lexicographic ordering Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 33

34 3.3 Basic Concepts Ordered by indexing field (search key) Attribute(s) used to look up records in a file An index file consists of index entries Records of the form Two basic kinds of queries Exact matches Locating a record(s) with a certain value Range queries Locating all records in a certain value range search key record location(s) Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 34

35 3.3 Basic Concepts Two basic kinds of indexes Ordered indexes: search keys are stored in sorted order Single-level ordered indexes Multi-level ordered indexes Hash indexes: search keys are distributed uniformly across blocks using some hash function Hash function distributes records uniformly over blocks Hashed value of search key decides for storage block Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 35

36 3.3 Single-Level Ordered Indexes Index links indexing fields to logical file/block locations Dense indexes index all items, sparse indexes only selected ones Much smaller file than the data file Hence, a binary search on the index file requires fewer block accesses than a binary search on the data file Works exactly like a book s index Only few pages at beginning/end, alphabetically ordered, references to respective page(s) Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 36

37 3.3 Single-Level Ordered Indexes Primary indexes Order data by some usually unique attribute as indexing field (primary key), store database records in this order Index record contains pointer to the respective storage place (block address) AccNo 1 1, Adams, $ 887,00 To save entries usually there is only a single index entry for each block (block anchor) AccNo 4 AccNo 7 But sparse indexes do not directly show, whether some record is in the DB 2, Bertram, $19,99 3, Behaim, $ 167,00 4, Cesar, $ 1866,00 5, Miller, $179,99 6, Naders, $ 682,56 7, Ruth, $ 8642,78 8, Smith, $675,99 9, Tarrens, $ 99,00 EN 14.1 Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 37

38 3.3 Single-Level Ordered Indexes Advantages Number of blocks needed for storing the index is small compared to data Not all records are indexed (non-dense) Index entries are smaller than data records Can often be kept in buffer Disadvantages Insertions and Deletions need to move data in storage and to update index entries affected by the shifts Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 38

39 3.3 Single-Level Ordered Indexes Clustering indexes store data records in the order of a non-unique indexing field One entry in the index per distinct value of the indexing field Search keys are linked to the block address of the containing the search key Is a sparse index But existence of records with a certain key can be assessed by index look-up Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 39

40 3.3 Single-Level Ordered Indexes Still problems with insertion/deletion Often one or more complete blocks are reserved for each single search key to allow for block anchors Blocks do not have to be adjacent, but can use block pointers, if more space is needed Structurally very similar to a hash index Adams 1, Adams, $ 887,00 Miller Tarrans 2, Adams, $19,99 4, Miller, $ 1866,00 5, Miller, $179,99 6, Miller, $ 682,56 7, Miller, $ 8642,78 8, Tarrens, $ 856,78 Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 40

41 3.3 Single-Level Ordered Indexes Secondary indexes point to locations of records regarding a non-ordering attribute Indexing does not affect the storage order There can be multiple secondary indexes for the same DB file Secondary indexes are usually dense Objects with same or adjacent values are usually not adjacent on disk If the indexing field has unique values (secondary key) definitely all records have to be indexed Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 41

42 3.3 Single-Level Ordered Indexes If the indexing field is not unique there are several possibilities to create a secondary index Create a dense index by including duplicate search keys (one for each record) Use variable-length index entries, where each search key is assigned a list of pointers Keep fixed-length index entries, but point to a block containing (multiple) pointers to the actual records Introduces a level of indirection, but allows for a sparse index Usually used in practice Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 42

43 3.3 Single-Level Ordered Indexes Dense secondary index or sparse with indirection Adams Cesar Miller Miller Orwell 1, Cesar, $ 887,00 2, Tarrens, $19,99 3, Miller, $19,99 4, Orwell, $ 1866,00 5, Miller, $179,99 Adams Cesar Miller Orwell p p p p p 1, Cesar, $ 887,00 2, Tarrens, $19,99 3, Miller, $19,99 4, Orwell, $ 1866,00 5, Miller, $ 179,99 Post Snyder Tarrens Tarrens 6, Tarrens, $ 682,56 7, Post, $ 8642,78 8, Adams, $ 856,78 9, Snyder, $ 856,78 Post Snyder Tarrens p p p p 6, Tarrens, $ 682,56 7, Post, $ 8642,78 8, Adams, $ 856,78 9, Snyder, $ 856,78 Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 43

44 3.3 Single-Level Ordered Indexes Characteristics of secondary indexes Speeds up retrieval, because if it does not exist, the entire file would have to be scanned linearly For non-existent primary indexes files can still be scanned in a binary search fashion Uses more search time and space, because it is dense Secondary indexes provides a logical ordering Accessing records in that order might not be the most efficient way regarding block accesses Each record access may fetch a new block into the buffer Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 44

45 3.3 Single-Level Ordered Indexes Improving secondary indexes for range queries Same block may be (unnecesarrily) accessed multiple times Simple Example: Cesar-Orwell (buffer holds only 1 block) Cesar: fetch first block Miller: evict first block, fetch second block Miller: evict second block, fetch first block Orwell: evict first block, fetch second block Adams Cesar Miller Miller Orwell Post Snyder Tarrens Tarrens 1, Cesar, $ 887,00 2, Tarrens, $19,99 3, Miller, $19,99 4, Orwell, $ 1866,00 5, Miller, $179,99 6, Tarrens, $ 682,56 7, Post, $ 8642,78 8, Adams, $ 856,78 9, Snyder, $ 856,78 Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 45

46 3.3 Single-Level Ordered Indexes Improving secondary indexes for range queries Remember: working on blocks in main memory is cheap, fetching blocks from disk is expensive! Improved Example: Cesar-Orwell Fetch Cesar and scan whole block Output 1 & 3 Mark block as completed and evict Fetch Miller and scan whole block Output 4 & 5 Mark block as completed and evict Skip second Miller Skip Orwell Adams Cesar Miller Miller Orwell Post Snyder Tarrens Tarrens 1, Cesar, $ 887,00 2, Tarrens, $19,99 3, Miller, $19,99 4, Orwell, $ 1866,00 5, Miller, $179,99 6, Tarrens, $ 682,56 7, Post, $ 8642,78 8, Adams, $ 856,78 9, Snyder, $ 856,78 Done Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 46

47 3.3 Single-Level Ordered Indexes Short Summary Primary and cluster indexes affect storage order, secondary indexes don t Primary and clustering indexes may be sparse, secondary indexes are usually dense Primary and clustering indexes can use block anchors Index sizes Primary index number of blocks Clustering index number of distinct search keys Secondary index up to number of records in table Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 47

48 3.3 Multi-Level Ordered Indexes Example: What if a primary index does not fit in main memory? Bad look-up efficiency! Simple multi-level solution Treat primary index on disk as a sequential file and construct a sparse index on it outer index sparse index of primary index inner index primary index file If even outer index is too large to fit in main memory, yet another level of index can be created, and so on Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 48

49 3.3 Multi-Level Ordered Indexes Multi-level indexes can further improve search speed In single level indexes a binary search can be applied to locate pointers to blocks Efficiency of log 2 N Multi-level indexes allow for higher search efficiency Fan-out of the index should be higher than 2 Efficiency of log fan-out N EN 14.2 Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 49

50 3.3 Multi-Level Ordered Indexes Concept of Multi-Level Indexes can be applied to primary, clustering and secondary indexes as long as values are distinct Example: 2-Level Primary Index Level 1 is Primary Index Level 2 is Primary Index of level 1 Efficiency of log fan-out N fan-out: Number of 1 st level blocks per 2 nd level block , Adams, $ 887,00 2, Bertram, $19,99 3, Behaim, $ 167,00 4, Cesar, $ 1866,00 5, Miller, $179,99 6, Naders, $ 682,56 7, Ruth, $ 8642,78 8, Smith, $675,99 Level , Tarrens, $ 99,00 10, Arnold, $ 3442,76 EN 14.2 Level 1 11, Marks, $435,19 12, Black, $ 319,44 Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 50

51 3.3 Multi-Level Ordered Indexes There is a huge amount of different tree structures for multi-level indexing Classical structures B-tree, B*-tree, etc. Multidimensional structures R-trees, Quad-trees, etc. Lots of literature Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 51

52 3.3 Hash Indexes Index Structure based on Hashing Idealized Access Speed: O(1) Basic Idea: Fixed-Size directly addressable Index Space [0..M] containing M+1 buckets Buckets contain links to data Single link in internal hashing (i.e. in memory) Multiple links for external hashing (i.e. on harddisk) Hash Function h: Z [0..M] Maps any number to [0..M] Should be uniform (i.e.: same probability for any x [0..M]) Especially, should also be surjective (i.e. use the full range of [0..M]) Needs to be deterministic (i.e.: same input same output) Should be simple and fast EN Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 52

53 3.3 Hash Indexes Basic Idea Convert value which is to be indexed to numeric representation Hash the value Store block anchor of value in the bucket with computed hash index Example: M=8 Miller Cesar Snyder h(miller) = 4 h(cesar) = 7 H(Snyder) = , Snyder, $ , Miller, $ , Cesar, $ EN Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 53

54 3.3 Hash Indexes Problem: Collision Hash functions are not injective collision may happen Block pointers for full blocks (overflow list) Probability of collision increases with load factor Deteriorates to linear behavior in worst case Rogers h(rogers) = 1 Miller h(miller) = 4 Cesar h(cesar) = 1 Snyder h(snyder) = 1 Smith h(smith) = , Rogers, $ , Cesar, $ , Snyder, $ , Smith, $ , Miller, $ EN Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 54

55 3.3 Hash Indexes Problem: Static Address Range Addressable range is fixed to [0..M] What happens if address range is exhausted? E.g., load factor too high, too many collisions occur Possible solutions Rehashing : Create a new larger hashmap and rehash all values For example, java hashmap follow that policy Extendible Hashing: Uses dynamic directory to double and half number of buckets Linear Hashing: Buckets are split into two one after another and are individually rehashed with an additional hash-function Not in this lecture EN Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 55

56 3.3 Hash Indexes Extendible Hashing Hash values are positive integers between [0..2 n ] Numbers can be represented in binary Uses a so-called trie structure Choose large n Create a directory of depth d with 2 d entries for the first d bits of values d is global depth of directory Directory cells link to buckets containing links to data with hash value starting with given bit pattern Adjacent cells may link to the same bucket depending on local depth of bucket EN Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 56

57 3.3 Hash Indexes Extendible Hashing (Bucket size 3) Global Depth = 3 0_0010 0_0100 0_ _000 10_101 10_ _ _ _ _ _ _111 Local depth = 1 All hash values starting with 0 Local depth = 2 All hash values starting with 10 Local depth = 3 All hash values starting with 110 Local depth = 3 All hash values starting with 111 EN Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 57

58 3.3 Hash Indexes Insert Split this bucket, add Global Depth = 3 0_0010 0_0100 0_ _000 10_101 10_ _ _ _ _ _ _111 Local depth = 1 All hash values starting with 0 Local depth = 2 All hash values starting with 10 Local depth = 3 All hash values starting with 110 Local depth = 3 All hash values starting with 111 EN Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 58

3.3 Hash Indexes 00_010 00_100 Local depth = 2 All hash values starting with 00 000 001 010 011 100 101 110 111 Global Depth = 3 01_000 01_010 10_000 10_101 10_110 110_000 110_010 110_100 111_001

59 3.3 Hash Indexes 00_010 00_100 Local depth = 2 All hash values starting with Global Depth = 3 01_000 01_010 10_000 10_101 10_ _ _ _ _ _ _111 Local depth = 2 All hash values starting with 01 Local depth = 2 All hash values starting with 10 Local depth = 3 All hash values starting with 110 Local depth = 3 All hash values starting with 111 EN Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 59

60 3.3 Hash Indexes Extendible Hashing Allows to dynamically split and coalesce buckets as database grows and shrinks If bucket is full and local depth is lower than global depth: More than one cell links to this bucket Bucket is split into two with increased local depth and values are distributed accordingly Links in cells are adjusted accordingly If local depth already equals global depth: Global depth is increased by one and thus the size of the directory is doubled Each cell is replaced by two cells, both of which contain the same pointer as the original cell EN Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 60

61 3.3 Hash Indexes Summary Hash Indexes: May perform in O(1) But: Management overhead can be quite large for growing data collections Overflow lists have small management overhead, but may decrease access performance to O(n) Rehashing is very expensive for each growth stage Extendible hashing has a smaller overhead and is especially suitable for external storage hashing EN 13.8 Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 61

62 3.3 Indexing Why not simply index every attribute? Physical index on primary key, logical indexes on every other attribute Results in good read efficiency But whenever a DB file is modified, every index on the file has to be updated Updating indices imposes overhead on database modification (insert, delete, update) Especially for multi-level indexes all levels may have to be updated Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 62

3.4 Physical Tuning One of the tasks of the database administrator Black Magic Get DB performance statistics Try something that seems sensible Get new performance statistics If

63 3.4 Physical Tuning One of the tasks of the database administrator Black Magic Get DB performance statistics Try something that seems sensible Get new performance statistics If it did not improve, reset it to previous configuration Try something different until satisfied... Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 63

3.4 Physical Tuning Black Magic today needs advanced

Snapshot Monitors Continuously collect cumulative

Monitors Hook into certain events and analyze them

64 3.4 Physical Tuning Black Magic today needs advanced tools to operate correctly For database tuning, you need Snapshot Monitors Continuously collect cumulative statistics for the DB within a given time span Event Monitors Hook into certain events and analyze them Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 64

65 3.4 Diagnostic Database Tools Snapshot Monitors Monitor collects cumulative statistics Internal counters set at global or session levels Collected statistics can be evaluated afterwards for system tuning on general level Data collection is explicitly enabled or disabled Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 65

66 3.4 Diagnostic Database Tools Useful for collecting point-in-time information regarding overall DB/DBMS behavior Locking lists all locks & types held by all applications Bufferpools cumulative stats for memory, physical & logical I/O s, synch/asynch Sorts provides detailed statistics regard sort-heap usage, overflows, # active etc. Tablespaces detailed I/O and activity statistics for a given tablespace UOW displays status of application Unit of Work at point-in-time Dynamic SQL shows contents of package cache and related stats at point-in-time Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 66

67 3.4 Diagnostic Database Tools Snapshot Monitors Overhead continuously is in the 3% - 5% range But no I/O since counters are usually RAM resident, not written to disk Quick list of useful metrics to identify particular problems include: bufferpool hit ratios sort overflows/problems effectiveness of prefetch need for indexing transaction logging issues single query/application resource domination, etc. Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 67

68 3.4 Diagnostic Database Tools Snapshot Monitor: IBM DB2 Performance Expert Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 68

69 3.4 Diagnostic Database Tools Event Monitors Collects information regarding events or chains of events No continuously collected data as with snapshots Must be explicitly created and activated via DBMS commands or API s Are the best way to effectively diagnose and resolve transaction problems (e.g., deadlock issues) Output may be directed to a database table and results then analyzed using SQL Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 69

70 3.4 Diagnostic Database Tools Event Monitor can be used to Identify and rank most frequently executed or most time consuming SQL Track the sequence of statements leading up to a lock timeout or deadlock Track connections to the database Breakdown overall SQL statement execution time into sub-operations such as sort time and use or system CPU time Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 70

71 3.4 Diagnostic Database Tools Event Monitor: DBI Brother Panther Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 71

again If results better : Great! Continue tuning! If result worse : Bad! Undo everything you did and try something else.

72 3.4 The Black Art of Physical Tuning Physical DB Tuning is a complicated and intransparent task Usually trial and error Measure some hopefully meaningful metrics of you DB usage statistics Adjust around on something Mostly indexes and tablespace properties Measure again If results better : Great! Continue tuning! If result worse : Bad! Undo everything you did and try something else. But what to measure and what to do? Use best practices or try yourself! Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 72

73 3.4 Physical Tuning What to measure? First, what kind of database you have? OLTP (Online Transaction Processing) : Database with many small, concurrent read / write queries Optimize for fast read write accesses Keep an eye on real-time response times Optimize for concurrency Warehouse (OLAP Online Analytical Processing): Using larger and more complex queries for primary retrieving data Optimize for read accesses Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 73

3.4 Tuning Metrics There is a huge amount of knobs and screws to turn DB planning and tuning is for professionals Special certificates by each database vendor IBM DB2 IBM Certified Database Associate

74 3.4 Tuning Metrics There is a huge amount of knobs and screws to turn DB planning and tuning is for professionals Special certificates by each database vendor IBM DB2 IBM Certified Database Associate IBM Certified Database Administrator IBM Certified Application Developer DB2 IBM Certified Advanced Database Administrator Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 74

3.4 Tuning Metrics Scott Hayes President & CEO of Database-Brothers Inc (DBI) If I only had five minutes to assess the status of a database, I would look at Average Result Set Size Index

75 3.4 Tuning Metrics Scott Hayes President & CEO of Database-Brothers Inc (DBI) If I only had five minutes to assess the status of a database, I would look at Average Result Set Size Index Read Efficiency Synchronous Read Percentage Average Rows Read per DB Transaction per Table Average Read Time Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 75

76 3.4 Tuning Metrics ARSS - Average Result Set Size ARSS = RowsSelected / NumSelectQueries Rule of Thumb: ARSS 10 indicates a OLTP database If you think you have OLTP and ARSS>>10, there is something wrong ARSS > 10 indicates a warehouse database (OLAP) Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 76

3.4 Tuning Metrics IREF - Index Read Efficiency How many rows have to be read to retrieve one row? i.e. How much overhead is within the where predicates?

77 3.4 Tuning Metrics IREF - Index Read Efficiency How many rows have to be read to retrieve one row? i.e. How much overhead is within the where predicates? IREF = RowsRead / RowsReturned Well-designed indexes may evaluate a where-clause without reading additional rows! Rule of Thumb: IREF should be less than 10 for OLTP Databases May be higher for Warehouses (>100) What to do? Introduce more efficient indexes! User simpler queries! Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 77

3.4 Tuning Metrics SRP Synchronous Read Percentage Synchronous Read means the DBMS reads just the index file and the required data page (good) Asynchronous Read means the DBMS employs prefetching to

78 3.4 Tuning Metrics SRP Synchronous Read Percentage Synchronous Read means the DBMS reads just the index file and the required data page (good) Asynchronous Read means the DBMS employs prefetching to scan for an index or data (BAD) Rule of Thumb: >90% for OLTP (>50% for warehouses) : Good 80%-90% (25%-50%) : Ok 50%-80% (<25%) Bad What to do? Improve index structures! Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 78

10-100 : Ok >100 : Bad >1000 : Horrific What to do? Improve index structures!

79 3.4 Tuning Metrics TBRRTX Average Rows Read per DB Transaction per Table TBRRTX table = rowsread / (attemptedcommits + attemptedrollbacks) Rule of Thumb: <10 for OLTP : Good : Ok >100 : Bad >1000 : Horrific What to do? Improve index structures! Improve Queries Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 79

80 3.4 Tuning Metrics ART Average Read Time The average time per read RT = ReadTime / (NumDataRead + NumIndexRead) If you think you did everything within your application and indices, optimize read time Adjust size of tablespace containers Distribute containers across disks Avoid OS paging device Move highly access tablespaces to faster devices Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 80

3 Indexes Buffer Management is very important Holds DB blocks in primary memory DB block are made up of several FS blocks Find good strategies to have requested DB blocks available when needed Each

81 3 Indexes Buffer Management is very important Holds DB blocks in primary memory DB block are made up of several FS blocks Find good strategies to have requested DB blocks available when needed Each block holds some meta data and row data Indexes drastically speed up queries Less blocks need to be scanned Primary Index On primary key attribute, usually influences row storage order Secondary Index On any attribute, does not influence storage order Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 81

82 Next Lecture Tree-Index Structures Binary Tree Naïve Binary Tree Balancing Tree B-Trees B B+ B* Relational Database Systems 2 Wolf-Tilo Balke Institut für Informationssysteme 82

Relational Database Systems 2 3. Indexing and Access Paths

Relational Database Systems 2 3. Indexing and Access Paths Wolf-Tilo Balke Jan-Christoph Kalo Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 3 Indexing