Advanced Database Systems


1 Advanced Database Systems DBMS Architectures & Index Structure Hussam A. Halim Computer Science Department 2013

3 Outline: Data organizations for secondary storage (storage structures, indexing structures); Types of Single-level Ordered Indexes (Primary Indexes, Clustering Indexes, Secondary Indexes); Multilevel Indexes; Dynamic Multilevel Indexes Using B-Trees and B+-Trees; Indexes on Multiple Keys

4 Components of a DBMS

5 Components of a DBMS A DBMS must provide data management functions that are efficient, reliable, concurrent, and secure (protected). Each of these requirements is supported by specific components of the DBMS that together comprise the system architecture.

6 Components of a DBMS Efficiency. File system: it manages the allocation of disk space and the data structures used to represent the information stored on secondary storage. Buffer manager: it is responsible for transferring information between secondary storage and main memory. Query parser: it translates the DDL and DML commands into an internal format (parse tree). Optimizer: it generates optimal execution plans for queries.

7 Components of a DBMS Reliability. Recovery manager: it ensures that the DB is in a consistent state after system failures and application program errors. Concurrency. Concurrency controller: it ensures that interactions among concurrent application programs do not result in an inconsistent database state. Integrity. Integrity manager: it ensures that the integrity constraints are verified. Security. Authorization manager: it checks that accesses to the database are executed according to the authorizations.

8 Components of a DBMS A DBMS uses several data structures, including: the files storing the user data (that is, the files storing the DB); the files storing the system data (that is, the files storing the DBMS catalogs); indexes (like B-trees and hash tables); data statistics (for example, the cardinality of each relation) that are used to determine the optimal query execution strategy.

9 Efficiency So far we have seen high-level DBMS models, that is, we have talked about the logical level. This is the correct level for the DB users (end-users and applications). An important factor, however, is DBMS performance. DBMS performance depends on the efficiency of the data structures and on the efficiency of the DBMS when operating on them.

10 Efficiency Various data structures can be used to implement a data model. The choice of the structures to use depends on the type of accesses that are executed on the data. A DBMS has its own strategies for implementing the data model; the (expert) user may influence the choices made by the system (physical level) through some specialized commands (like commands for the creation of indexes and clusters).

11 Storage devices The data managed by a DBMS must be physically stored on a storage device. Main memory and other faster, smaller memories: very fast access to data; small storage capacity; volatile (the content is lost if the system crashes).

12 Storage devices Secondary storage: magnetic disks, optical disks. Very large capacities, low costs, slow access. Data must be transferred into main memory in order to be processed. Non-volatile. Databases are in most cases stored on secondary storage (magnetic disks): they are too large for the storage capacity of main memory devices, and secondary storage gives stronger guarantees of data persistence at very low cost.

13 Disks The information is stored on the disk surface on concentric circles, each with a different diameter, called tracks; tracks are subdivided into sectors. For disks with multiple platters, the set of tracks with the same diameter forms a cylinder. Data stored on the same cylinder can be retrieved much faster than data stored on different cylinders. The hardware mechanism for reading and writing is the head, which is connected to a mechanical arm.

14 Disks 14

15 Components of a Disk The platters spin. The arm assembly is moved in or out to position a head on a desired track. Tracks under heads make a cylinder. Only one head reads/writes at any one time. Block size is a multiple of sector size (which is fixed). (Figure labels: arm assembly, disk head, arm movement, spindle, tracks, sector, platters.)

16 Disks For multiple platter disks, there is a head for each platter. The read-write heads of all the tracks are mounted on a single assembly called a disk arm, and move together Multiple disk arms are moved as a unit by the actuator Each arm has two heads, to read disks above and below it 16

17 Performance measures of disks: access time, data transfer rate, reliability. Access time: the time from when a read or write request is issued to when data transfer begins. To access data on a given sector of a disk, the arm must first move so that it is positioned over the correct track, and then must wait for the sector to appear under it as the disk rotates. The time for repositioning the arm is called seek time, and it increases with the distance the arm must move. Typical seek times range from 2 to 30 milliseconds. Average seek time is the average of the seek times, measured over a sequence of (uniformly distributed) random requests, and it is about one third of the worst-case seek time. Once the seek has occurred, the time spent waiting for the sector to be accessed to appear under the head is called rotational latency time. Average rotational latency time is about half of the time for a full rotation of the disk. (Typical rotational speeds of disks range from 60 to 120 rotations per second.)

18 Performance measures of disks: access time, data transfer rate, reliability. Data transfer rate: the rate at which data can be retrieved from or stored to the disk. Current disk systems support transfer rates from 1 to 5 megabytes per second. Reliability: measured by the mean time to failure. The typical mean time to failure of disks today ranges from 30,000 to 800,000 hours (about 3.4 to 91 years).

19 Page I/O Average rotational delay (rd): rd = ½ * (1/p) min = (60*1000)/(2*p) msec, or equivalently rd = ½ * cost of one revolution = ½ * (60*1000/p) msec, where p is the speed of disk rotation in revolutions per minute (rpm). Example: the speed of disk rotation is p = 3600 rpm = 60 revolutions/sec, so 1 revolution = 1000/60 = 16.66 msec (1 second = 1000 msec) and rd = (1/2)*60*1000/3600 = 8.33 msec.
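The rotational-delay formula can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slide; the function name is an illustrative choice and does not come from the lecture.

```python
def avg_rotational_delay_ms(rpm: float) -> float:
    """Average rotational delay: half the time of one revolution, in msec."""
    ms_per_revolution = 60 * 1000 / rpm  # one minute = 60 * 1000 msec
    return ms_per_revolution / 2


# The slide's example: p = 3600 rpm.
print(round(avg_rotational_delay_ms(3600), 2))  # 8.33
```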

20 Page I/O Transfer rate (tr): tr = track size / cost of one revolution = track size / (60*1000/p), with the revolution time in msec, so tr is in bytes/msec. Bulk transfer rate (btr): btr = (B/(B+G)) * tr bytes/msec, where B is the block size in bytes and G is the interblock gap size in bytes. Block transfer time (btt): btt = B / tr, not taking G into account; btt = B / btr, taking G into account.

21 Page I/O Example: Track size = 50 KB and p = 3600 rpm Block size B = 3KB = 3000 bytes tr = (50*1000)/(60*1000/3600) = 3000 bytes/msec btt = B / tr = 3000/3000 = 1 msec 21
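The transfer-rate formulas and the example above can be sketched in Python as follows. The function names are illustrative, and the interblock gap of 300 bytes in the last call is a made-up value used only to show the effect of G; the slides do not give one.

```python
def transfer_rate(track_size_bytes: float, rpm: float) -> float:
    """tr in bytes/msec: one track is read in one revolution."""
    return track_size_bytes / (60 * 1000 / rpm)


def bulk_transfer_rate(block_bytes: float, gap_bytes: float, tr: float) -> float:
    """btr in bytes/msec: only B out of every B + G bytes are useful data."""
    return block_bytes / (block_bytes + gap_bytes) * tr


def block_transfer_time(block_bytes: float, rate: float) -> float:
    """btt in msec for a given rate (tr without gaps, btr with gaps)."""
    return block_bytes / rate


tr = transfer_rate(50 * 1000, 3600)            # about 3000 bytes/msec
btt = block_transfer_time(3000, tr)            # about 1 msec, ignoring gaps
btr = bulk_transfer_rate(3000, 300, tr)        # G = 300 bytes is an assumption
btt_with_gap = block_transfer_time(3000, btr)  # about 1.1 msec
print(round(tr), round(btt, 2), round(btt_with_gap, 2))
```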

22 Page I/O Average time for reading/writing n consecutive pages that are in the same track or cylinder = s + rd + n * btt, where s is the seek time. Average time for reading/writing n noncontiguous pages/blocks that are in the same cylinder = s + n * (rd + btt).
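A small Python sketch makes the difference between the two access patterns concrete. The input values (s = 25 msec, rd = 8.33 msec, btt = 1 msec, n = 10) are assumptions chosen for illustration, and the function names are hypothetical.

```python
def time_consecutive(s: float, rd: float, btt: float, n: int) -> float:
    """n consecutive pages: one seek, one rotational delay, n transfers."""
    return s + rd + n * btt


def time_noncontiguous(s: float, rd: float, btt: float, n: int) -> float:
    """n scattered pages in one cylinder: one seek, a rotational delay per page."""
    return s + n * (rd + btt)


# Assumed values: s = 25 msec, rd = 8.33 msec, btt = 1 msec, n = 10 pages.
print(round(time_consecutive(25, 8.33, 1, 10), 2))    # 43.33
print(round(time_noncontiguous(25, 8.33, 1, 10), 2))  # 118.3
```

With these inputs the scattered read costs almost three times as much, because the rotational delay is paid once per page instead of once overall.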

23 An Example Hard disk specifications: 4 platters, 8 surfaces, 3.5 inch diameter; 2^13 = 8192 tracks/surface; 2^8 = 256 sectors/track; 2^9 = 512 bytes/sector. Average seek time s = 25 ms. Rotation rate = 3600 rpm = 60 rps, so 1 rev. = 16.66 msec. Transfer rate tr = 1 KB in 0.117 ms; tr = 1 KB in 0.13 ms with gap.

24 An Example What is the total capacity of this disk? 8 GB (8 * 2^13 * 2^8 * 2^9 = 2^33 bytes). How many bytes does one track hold? 256 sectors/track * 512 bytes/sector = 128 KB. How many blocks per track? One block = 4096 bytes = 8 sectors (4096/512), so 256/8 = 32 blocks/track.

25 An Example How long does it take to access one block? One block = 4096 bytes = 8 sectors (4096/512). Rotation time r: 1 rev. = 16.66 msec. Time to access 1 sector = s + r/2 + tr/(sectors per KB) = 25 + (16.66/2) + 0.117/2 = 33.3885 ms. Time to access 1 block = time to access the first sector of the block + time to access the subsequent 7 sectors.

26 An Example T = 25 + (16.66/2) + (0.117/2) * 1 + (0.13/2) * 7 = 33.8435 ms ≈ 33.84 ms. Compare this to the one-sector access time of 33.3885 ms, even though a block is 8 sectors.
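The numbers in this disk example can be recomputed with a short Python sketch. It uses the slide's own inputs (25 msec seek, 16.66 msec per revolution, 0.117 msec per KB without gap, 0.13 msec per KB with gap); the variable names are illustrative.

```python
SURFACES = 8
TRACKS_PER_SURFACE = 2 ** 13
SECTORS_PER_TRACK = 2 ** 8
BYTES_PER_SECTOR = 2 ** 9
BLOCK_BYTES = 4096

capacity_bytes = SURFACES * TRACKS_PER_SURFACE * SECTORS_PER_TRACK * BYTES_PER_SECTOR
bytes_per_track = SECTORS_PER_TRACK * BYTES_PER_SECTOR
sectors_per_block = BLOCK_BYTES // BYTES_PER_SECTOR
blocks_per_track = SECTORS_PER_TRACK // sectors_per_block

SEEK_MS = 25.0
REVOLUTION_MS = 16.66
SECTOR_MS = 0.117 / 2          # one 512-byte sector, no gap
SECTOR_WITH_GAP_MS = 0.13 / 2  # one 512-byte sector plus its gap

# Seek + average rotational delay + transfer of the first sector.
one_sector_ms = SEEK_MS + REVOLUTION_MS / 2 + SECTOR_MS
# The remaining sectors of the block follow contiguously on the track.
one_block_ms = one_sector_ms + (sectors_per_block - 1) * SECTOR_WITH_GAP_MS

print(capacity_bytes == 2 ** 33, bytes_per_track // 1024, blocks_per_track)
print(round(one_sector_ms, 4), round(one_block_ms, 4))
```

The point of the comparison is that seek time and rotational delay dominate: reading eight sectors costs well under one millisecond more than reading one.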

27 Disks Data are transferred between the disk and main memory in units called blocks. A block is a contiguous sequence of bytes from a single track of one platter. Block sizes typically range from 512 to 4096 bytes. The lower levels of the file system manager convert block addresses into the hardware-level cylinder, surface and sector number. The block transfer time is the time taken by the head to transfer a block into the buffer, once the head is located at the beginning of the block. This time is much shorter than the time required to position the head at the beginning of the block (seek time).

28 Optimization of Disk-Block Access Access to data on disk is several orders of magnitude slower than access to data in main memory. Optimization techniques besides buffering of blocks in main memory: Scheduling: if several blocks from a cylinder need to be transferred, we may save time by requesting them in the order in which they pass under the heads. A commonly used disk-arm scheduling algorithm is the elevator algorithm. File organization: organize blocks on disk in a way that corresponds closely to the manner in which we expect data to be accessed. For example, store related information on the same track, on physically close tracks, or on adjacent cylinders in order to minimize seek time. Nonvolatile write buffers: use nonvolatile RAM (such as battery-backed RAM) to speed up disk writes drastically (first write to the nonvolatile RAM buffer and inform the OS that the writes have completed). Log disk: another approach to reducing write latency is to use a log disk, a disk devoted to writing a sequential log. All access to the log disk is sequential, essentially eliminating seek time, and several consecutive blocks can be written at once, making writes to the log disk several times faster than random writes.
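As an illustration of the scheduling idea, the following is a minimal Python sketch of elevator-style ordering of pending cylinder requests. It is a simplification under stated assumptions: the arm reverses at the last pending request rather than at the edge of the disk, no new requests arrive during the sweep, and the function and parameter names are hypothetical.

```python
def elevator_order(requests, head, direction=1):
    """Order pending cylinder requests for one sweep and its return sweep.

    direction > 0 means the arm is currently moving toward higher cylinders.
    """
    up = sorted(c for c in requests if c >= head)
    down = sorted((c for c in requests if c < head), reverse=True)
    return up + down if direction > 0 else down + up


# Head at cylinder 50, moving toward higher cylinder numbers.
print(elevator_order([95, 10, 60, 40, 70], head=50))  # [60, 70, 95, 40, 10]
```

Serving requests in this order keeps the arm moving in one direction as long as possible, which avoids the back-and-forth seeks of first-come-first-served ordering.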

29 File Organization Data are generally stored in records A record consists of a set of related values Each value is represented by one or more bytes and corresponds to a field of the record A record type consists of the names of the fields in the record together with the corresponding value types The number of bytes required for storing a value of a given type is fixed for each system 29

30 Pages and Blocks Data files are decomposed into pages (blocks): a page is a fixed-size piece of contiguous information in the file; sizes range from 512 bytes to several kilobytes; the block is the smallest unit for transferring data between main memory and the disk. Address of a page (block): (cylinder#, track# (within cylinder), sector# (within track)).

31 Pages and Blocks (Figure: one track divided into sectors separated by gaps; one page/block = 4 sectors.)

32 Blocking Factor Blocking factor (bfr): the number of records that can fit into a single block. bfr = ⌊B/R⌋, where B is the block size in bytes and R is the record size in bytes. Example: record size R = 100 bytes, block size B = 2,000 bytes; thus the blocking factor bfr = 2000/100 = 20. The number of blocks b needed to store a file of r records: b = ⌈r/bfr⌉ blocks.

33 Example Consider a disk with block size B = 1024 bytes. Suppose we want to construct a secondary index on field xyz of size 25 bytes. What is the index blocking factor if an index pointer is 7 bytes long? Solution (hint: first compute the size of each index entry Ri): Ri = V + P = 25 + 7 = 32 bytes; bfri = B/Ri = 1024/32 = 32.
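The blocking-factor formulas and both examples above can be reproduced with a short Python sketch. The function names are illustrative, and the file size of 1,000 records in the last call is an assumed value, since the slide does not give r for that example.

```python
import math


def blocking_factor(block_bytes: int, record_bytes: int) -> int:
    """bfr = floor(B / R): whole records that fit in one block (unspanned)."""
    return block_bytes // record_bytes


def blocks_needed(num_records: int, bfr: int) -> int:
    """b = ceil(r / bfr): blocks needed to hold r records."""
    return math.ceil(num_records / bfr)


print(blocking_factor(2000, 100))     # 20 records per data block
print(blocking_factor(1024, 25 + 7))  # 32 index entries per block
print(blocks_needed(1000, 20))        # 50 blocks for an assumed r = 1000
```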

34 File Organization A record of the following record type (Employee is the name of the record type; each entry gives a field name and its field type): type Employee = record Emp#: integer; Name: array[1..20] of char; Job: array[1..10] of char; HiringDate: date; Salary: integer; Bonus: integer; Dept#: integer; can be stored in 50 bytes (4*4 = 16 for the integers + 4 for the date + 30 for the strings).

35 Example Consider a relation emp with schema (ID, Name, Dept, Salary) and the following characteristics: |emp| = 1,000,000; the size of the attribute values is: ID = 4 bytes, Name = 50 bytes, Dept = 30 bytes, Salary = 6 bytes. The ID values are equally distributed between 1 and 10,000,000, and there are no two tuples with identical ID. A pointer occupies 6 bytes, and we assume a block size of 2,000 bytes. Further, we assume seek time = sec, latency = sec, and transfer time = 0.001 sec. a. Determine the number of blocks needed to store the relation emp if 10% of each data block is reserved for future insertions. b. Assume a B+-tree index on the attribute ID. Determine the number of blocks (= number of nodes) used for the B+-tree if index blocks are filled up to 75%.

36 Solution a. Data blocks: 0.9 * 2,000 = 1,800 bytes used in every data block; 1,800/(4 + 50 + 30 + 6) = 20 tuples/block; 1,000,000/20 = 50,000 data blocks needed for emp. b. Index blocks: 0.75 * 2,000 = 1,500 bytes used in every index block; 1,500/(4 + 6) = 150 index entries/block. Level 3 (leaf nodes): 1,000,000/150 = 6,667 blocks; level 2: 6,667/150 = 45 blocks; level 1: 45/150 = 1 block. In total 6,713 index blocks are needed.
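The solution can be verified with the following Python sketch. It follows the slide's simplification that every index level holds 150 entries per block; the helper name `level_sizes` and the variable names are illustrative.

```python
import math

BLOCK = 2000
TUPLE_BYTES = 4 + 50 + 30 + 6  # ID + Name + Dept + Salary
ENTRY_BYTES = 4 + 6            # ID key + pointer
NUM_TUPLES = 1_000_000

# a. Data blocks: only 90% of each block is filled.
tuples_per_block = (BLOCK * 90 // 100) // TUPLE_BYTES
data_blocks = math.ceil(NUM_TUPLES / tuples_per_block)

# b. Index blocks: only 75% of each block is filled.
entries_per_block = (BLOCK * 75 // 100) // ENTRY_BYTES


def level_sizes(num_entries: int, fanout: int) -> list:
    """Blocks per level, from the leaves up to the single root block."""
    sizes = []
    n = num_entries
    while True:
        n = math.ceil(n / fanout)
        sizes.append(n)
        if n == 1:
            return sizes


levels = level_sizes(NUM_TUPLES, entries_per_block)
print(tuples_per_block, data_blocks)  # 20 50000
print(levels, sum(levels))            # [6667, 45, 1] 6713
```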

37 File Organization A file is a sequence of records The file is a fixed record file if all records stored in the file have the same size (in bytes) A file may also store variable length records for several cases: Fields with variable size Multivalued fields Optional fields Files with heterogeneous records: the file contains records of different types, and thus of different sizes (for example, when tuples from different relations are clustered) 37

38 Files with fixed length records Record deletion: Insertions are more frequent than deletions The records are not moved; the space occupied by the deleted record is left unoccupied and used for next insertion It is necessary to allocate some auxiliary structures (file header) to efficiently determine where to insert a new record (the file header contains the pointer to the first deleted record; all deleted records are linked) 38

39 Files with variable length records Example Employee record type suppose that The attribute job is multivalued The bonus attribute is optional and thus may be missing Different representations are possible 39

40 Files with variable length records Byte string representation A special end-of-record symbol is added to denote the end of each record Each record is stored as a contiguous sequence of bytes Special separator characters, that do not appear as value in any field, are used to denote the end of the variable length fields 40

41 Files with variable length records Example of byte string representation 7369 Red engineer manager Dec Andrew technician Feb Martin secretary interpreter Sept Black engineer Jun

42 Files with variable length records byte string main disadvantages: It is not easy to re-use the space occupied by a deleted record; there are techniques to manage insertions and deletions, but they tend to generate fragmentation If the length of a record increases, the record must be moved This can be expensive when the record is referenced by another record 42

43 Files with variable length records Fixed-length reserved-space representation A file with variable length records is represented through a file with fixed length records For each record, the maximum length is allocated problem: poor use of space => slow accesses (the records will tend to spread over several blocks) 43

44 Files with variable length records Fixed-length reserved-space representation 7369 Red engineer manager Dec Andrew technician # Feb # Martin secretary interpreter Sept # Black engineer # Jun

45 Files with variable length records Fixed-length representation A variable length record is represented by a list of fixed length records, linked through pointers In order to avoid repeating the fields that have a single value, two types of block are used in a file: anchor block, storing the first record of each list overflow block, storing the subsequent records of each list 45

46 Files with variable length records Fixed-length representation 7369 Red engineer Dec Andrew technician Feb # Martin secretary Sept # Black engineer Jun manager interpreter 46

47 Organization of records in blocks A file can be seen as a collection of records. Data are transferred between secondary storage and main memory (MM) in blocks, so it is important to assign the records to blocks in such a way that the same block contains related records. If records that are often used together are stored in the same block, the number of disk accesses is reduced. Example: in the fixed-length organization for variable-length records, it can be useful to store all the fixed-length records that represent the same variable-length record in the same block. If many updates are executed, however, a block ends up storing records from different lists.

48 Organization of records in blocks It is not always possible to organize the records in blocks so that each block is completely occupied by records; very often each block has some storage that is not used. It is possible to store one part of a record in one block and the other part in another block. Unspanned records: a record is found in one and only one block; records do not span across block boundaries. Used with fixed-length records having B ≥ R. Spanned records: records are allowed to span across block boundaries. Used with variable-length records having R > B.

50 Organization of records in blocks Techniques for the allocation of file blocks on disk. Contiguous allocation: the file blocks are allocated on contiguous disk blocks; reading the entire file is very efficient; expanding the file is difficult. Linked allocation: each file block contains a pointer to the next file block; expanding the file is easy; reading the entire file is not very efficient. Use of buckets (that is, of groups of blocks) for groups of related records (for example, all employees of the same department).

51 Organization of records in blocks It is possible to have buckets occupying several blocks: the blocks of the same bucket are linked (the block header stores the pointer to the next block). If the size of a bucket increases, new blocks are allocated. The free blocks are linked so that they can be used when new insertions are executed in the same bucket. It is better not to reuse the free blocks of a bucket to store records from another bucket. The blocks of the same bucket are stored in the same cylinder.

52 Mapping of relations onto files For DBMS of small size, a solution often adopted is to store each relation in a separate file For large-scale DBMS such basic strategy must be extended (the DBMS must allocate the records in the blocks so to minimize the I/O operations) A frequent strategy is to allocate for the DBMS a large file storing all relations Such file is managed by the DBMS 52

53 Clustering Consider the following query: SELECT Emp#, Name, Office FROM Employees, Departments WHERE Employees.Dept# = Departments.Dept#. An efficient storage strategy is based on clustering (that is, grouping) the tuples of the two tables that have the same value for the join attribute. Clustering may make the execution of other queries inefficient, for example: SELECT * FROM Departments.

55 Clustering 10 Building Construction 1100 D Neri Engineer 01/06/ Dare Engineer 17/11/ Mill Engineer 23/01/ GreenManager 10/12/ ? Research 2200 D Red Engineer 17/12/ Pink Manager 02/04/ ? Scott Secretary 09/11/ ? Adams Engineer23/09/ Ford Secretary 03/12/ ? Road Maintenance 5100 D Andrews Technician 20/02/ ? White Technician 20/02/ Black Manager 01/05/ ? Gianni Engineer 03/12/ ? 30 55

56 Buffer management The main goal of the storage strategies is to minimize the number of disk accesses Another strategy is to keep the largest number of blocks in MM A buffer can be used to keep in MM several file blocks A buffer is organized in pages, that have the same size of the blocks Performing operations on pages that are already in the buffer greatly improves performance 56

57 Accessing Data Through RAM Buffer RAM Block transfer Application Buffer Record transfer Page frames block 57

58 Buffer management The buffer manager of a DBMS uses buffer management policies that are more sophisticated than those used in OS: LRU policies are not always the most suitable for DBMS For reasons related to recovery management, in some cases a buffer page cannot be transferred on disk (in such case the block is said to be pinned); in other cases, the block has to be forced to disk a DBMS is able to predict better than the OS future references to blocks 58

59 Buffer management: Example Consider the join operation Employees x Departments (assume that the two relations are in different files). Employees relation: once a tuple of this relation has been used, it is no longer needed. As soon as all the tuples of a block have been processed, the block is no longer needed: toss-immediate strategy.

60 Buffer management: Example Departments relation: the most recently accessed block will be accessed again only after all the other blocks have been processed. The best strategy for the Departments file is to remove the last used block: most recently used (MRU) strategy. It is however necessary to pin the block currently being processed until all tuples in the block have been analyzed; once all tuples have been analyzed, the block can be unpinned.
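The advantage of MRU over LRU for a file that is scanned repeatedly can be seen in a small simulation. This is a sketch under simplifying assumptions: it ignores pinning, counts only buffer misses (disk reads), and uses a made-up workload of 4 Departments blocks scanned 3 times with 3 buffer frames. The function name is hypothetical.

```python
def count_misses(reference_string, frames, policy):
    """Count buffer misses for the "LRU" or "MRU" replacement policy."""
    buffer = []  # ordered from least recently used to most recently used
    misses = 0
    for block in reference_string:
        if block in buffer:
            buffer.remove(block)  # hit: re-appended below as most recent
        else:
            misses += 1
            if len(buffer) == frames:
                # LRU evicts the oldest entry, MRU the newest one.
                buffer.pop(0 if policy == "LRU" else -1)
        buffer.append(block)
    return misses


# Assumed workload: Departments has 4 blocks and is scanned 3 times.
scans = [0, 1, 2, 3] * 3
print(count_misses(scans, frames=3, policy="LRU"))  # 12
print(count_misses(scans, frames=3, policy="MRU"))  # 6
```

With LRU every access misses, because each block is evicted just before it is needed again; MRU keeps most of the file resident across scans.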

61 Auxiliary access structures Very often queries only access a small subset of the data To efficiently process queries, it can be useful to allocate some additional data structures (called auxiliary access structures) able to directly find the records that verify a given query (without requiring a scan of all the data) 61

62 Auxiliary access structures Each technique must be evaluated with respect to: Access time Insertion time Deletion time Storage size Very often it is preferable to increase storage occupancy if this improves performance We use the term search key to denote an attribute or attribute set used for the search in the data structure (different from the notion of primary key of the relational data model) 62

63 Auxiliary access structures A search can be performed based on: primary key: the value of the key identifies a unique record Ex. The taxpayer with SSN secondary key: the value of the key can identify multiple records Ex. The taxpayers living in West Bank value range (both for the primary key and secondary key) Ex. The taxpayers with an income between 60K and 90K combination of the previous conditions Ex. The taxpayers living in West Bank with an income between 60K and 90K 63

64 Auxiliary access structures To perform efficient searches, one could keep the file ordered according to the values of a search key; searches based on other search keys are then inefficient.

65 Auxiliary access structures Primary organizations: they impose an allocation criterion for the data (for example, hash functions). Secondary organizations: use of indexes (separate from the data file), usually organized as a tree. In general, several access paths to the data are present.

66 Auxiliary access structures (Figure: an index organized as a tree on key Dept#, that is, a secondary organization based on the Dept# attribute; and a data file with hash structure on key Emp#, that is, a primary organization based on the Emp# attribute.)

67 Indexes Data structures to organize records via trees or hashing. Like sorted files, they speed up searches for a subset of records, based on values in certain ("search key") fields. Updates are much faster than in sorted files. Basic idea: to associate with a file a table containing entries of the form (k_i, r_i), where k_i is a value of the search key on which the index is built, and r_i is a reference to the record (or records) having k_i as the value of the search key. The reference can be the address (logical or physical) of a record or of a block.

68 Indexes: example
Data file, keys in storage order: c5, c2, c11, c7, c4
Index (key k_i, address r_i):
c2    8
c4    48
c5    0
c7    32
c

69 Indexes as Access Paths A single-level index is an auxiliary file that makes it more efficient to search for a record in the data file. The index is usually specified on one field of the file (although it could be specified on several fields). One form of an index is a file of entries <field value, pointer to record>, which is ordered by field value. The index is called an access path on the field. The index file usually occupies considerably fewer disk blocks than the data file because its entries are much smaller. A binary search on the index yields a pointer to the file record.
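A minimal Python sketch of an ordered index of <field value, pointer> entries searched by binary search follows. The keys and addresses are illustrative values, the in-memory list stands in for the index file, and the function name is hypothetical.

```python
from bisect import bisect_left

# Index entries (key, record address), kept sorted by key.
index = [(2, 8), (4, 48), (5, 0), (7, 32)]


def lookup(index, key):
    """Binary search on the index; returns the record address or None."""
    keys = [k for k, _ in index]
    i = bisect_left(keys, key)
    if i < len(index) and index[i][0] == key:
        return index[i][1]
    return None


print(lookup(index, 7))  # 32
print(lookup(index, 6))  # None
```

In a real DBMS the search touches index blocks on disk rather than a list in memory, which is why the cost is measured in block accesses.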

70 Measuring goodness On what basis do we compare different indices? 1. Access type: what type of queries can be answered: selection queries (ssn = 123)? range queries (100 <= ssn <= 200)? 2. Access time: what is the cost of evaluating queries, measured in number of block accesses. 3. Maintenance overhead: cost of insertion / deletion (also in block accesses). 4. Space overhead: number of blocks needed to store the index.

71 Indexing Primary (or clustering) index on SSN STUDENT Ssn Name Address 123 smith main str 234 jones forbes ave 345 smith forbes ave

72 forbes ave forbes ave forbes ave main str main str Address-index Indexing Secondary (or non-clustering) index: duplicates may exist Can have many secondary indices but only one primary index STUDENT Ssn Name Address 123 smith main str 234 jones forbes ave 345 tomson main str 456 stevens forbes ave 567 smith forbes ave

73 Indexing secondary index: typically, with postings lists forbes ave main str Postings lists STUDENT Ssn Name Address 123 smith main str 234 jones forbes ave 345 tomson main str 456 stevens forbes ave 567 smith forbes ave

74 Indexing Primary/sparse index on ssn (primary key) >=456 >=123 STUDENT Ssn Name Address 123 smith main str 234 jones forbes ave 345 tomson main str 456 stevens forbes ave 567 smith forbes ave

75 Indexing Secondary / dense index Secondary on a candidate key: No duplicates, no need for posting lists Ssn Name Address 345 tomson main str 234 jones forbes ave 123 smith main str 567 smith forbes ave 456 stevens forbes ave

76 Summary All combinations are possible:
            Dense    Sparse
Primary     rare     usual
Secondary   usual    -
At most one sparse index; as many dense indices as desired. Usually: one primary index (probably sparse) and a few secondary indices (dense).

77 Indexes Indexes can be characterized as dense or sparse. A dense index has an index entry for every search key value (and hence every record) in the data file. A sparse (or nondense) index, on the other hand, has index entries for only some of the search values. There are different types of indexes: single-level indexes (primary index, clustering index, secondary index) and multi-level indexes.

78 Dense Index Sequential File

79 Sparse Index Sequential File

80 Sparse Vs. Dense Indices Id Name Dept Sparse primary index sorted on Id Ordered file sorted on Id Dense secondary index sorted on Name 80

81 Sparse Vs. Dense Indices Ashby, 25, 3000 Ashby Cass Basu, 33, 4003 Bristow, 30, 2007 Cass, 50, Smith Daniels, 22, 6003 Jones, 40, Sparse primary index on Name Smith, 44, 3000 Tracy, 44, Ordered file on Name Dense secondary index on Age 81

82 Sparse Indices Sparse index contains index records for only some search-key values. Some keys in the data file will not have an entry in the index file Applicable when records are sequentially ordered on search-key (ordered files) Normally keeps only one key per data block Less space (can keep more of index in memory) Less maintenance overhead for insertions and deletions. 82
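A sparse index lookup can be sketched in Python as follows: find the last index entry whose key is not greater than the search key, then scan only that data block. The block contents are made-up values, the nested list stands in for an ordered file on disk, and the names are illustrative.

```python
from bisect import bisect_right

# An ordered file: each inner list is one data block, sorted on the key.
blocks = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]

# Sparse index: one entry (first key in block, block number) per block.
sparse_index = [(block[0], i) for i, block in enumerate(blocks)]


def sparse_lookup(key):
    """Return the block number holding key, or None if key is absent."""
    anchors = [k for k, _ in sparse_index]
    pos = bisect_right(anchors, key) - 1  # last anchor <= key
    if pos < 0:
        return None                       # key precedes the first block
    block_no = sparse_index[pos][1]
    return block_no if key in blocks[block_no] else None


print(sparse_lookup(50))  # 1
print(sparse_lookup(55))  # None
```

Note that the index alone cannot tell whether key 55 exists; the data block has to be read, which is the tradeoff against a dense index.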

83 Sparse vs. Dense Tradeoff Sparse: Less index space per record can keep more of index in memory Dense: Can tell if any record exists without accessing file (Later: sparse better for insertions dense needed for secondary indexes) 83

84 Next: Duplicate keys Deletion/Insertion Secondary indexes 84

85 Duplicate keys

86 Duplicate keys Dense index, one way to implement?

87 Duplicate keys Dense index, better way?

88 careful if looking for 20 or 30! Duplicate keys Sparse index, one way?

89 Duplicate keys Sparse index, another way? place first new key from block should this be 40?

90 Summary Duplicate values, primary index Index may point to first instance of each value only Index a a a.. b File 90

91 Deletion from sparse index

92 Deletion from sparse index delete record

93 Deletion from sparse index delete record

94 Deletion from sparse index delete records 30 &

95 Deletion from dense index

96 Deletion from dense index delete record

97 Insertion, sparse index case

98 Insertion, sparse index case insert record our lucky day! we have free space where we need it!

99 Insertion, sparse index case insert record Illustrated: Immediate reorganization Variation: insert new block (chained file) update index

100 Insertion, sparse index case insert record overflow blocks (reorganize later...) 100

101 Insertion, dense index case Similar Often more expensive

102 Secondary indexes Sequence field

103 Secondary indexes Sparse index Sequence field does not make sense!

104 Secondary indexes Dense index Sequence field sparse high level

105 With secondary indexes: Lowest level is dense Other levels are sparse Also: Pointers are record pointers (not block pointers; not computed) 105

106 Duplicate values & secondary indexes

107 Duplicate values & secondary indexes one option... Problem: excess overhead! disk space search time

108 Duplicate values & secondary indexes another option Problem: variable size records in index!

109 Duplicate values & secondary indexes Another idea (suggested in class): Chain records with same key? Problems: Need to add fields to records Need to follow chain to know records

110 Duplicate values & secondary indexes buckets 110

111 Example Given the following data file: EMPLOYEE(NAME, SSN, ADDRESS, JOB, SAL, ...). Suppose that: record size R = 150 bytes, block size B = 512 bytes, r = 30000 records.

112 Indexes as Access Paths (contd.) Example: Given the following data file: EMPLOYEE(NAME, SSN, ADDRESS, JOB, SAL, ...). Suppose that: record size R = 150 bytes, block size B = 512 bytes, r = 30000 records. Then we get: blocking factor bfr = B div R = 512 div 150 = 3 records/block; number of file blocks b = ⌈r/bfr⌉ = ⌈30000/3⌉ = 10000 blocks. For an index on the SSN field, assume the field size V_SSN = 9 bytes and the record pointer size P_R = 7 bytes. Then: index entry size R_I = (V_SSN + P_R) = (9 + 7) = 16 bytes; index blocking factor bfr_I = B div R_I = 512 div 16 = 32 entries/block; number of index blocks b_I = ⌈r/bfr_I⌉ = ⌈30000/32⌉ = 938 blocks; a binary search on the index needs ⌈log2 b_I⌉ = ⌈log2 938⌉ = 10 block accesses. This is compared to an average linear search cost on the data file of b/2 = 10000/2 = 5000 block accesses. If the file records are ordered, the binary search cost on the data file would be ⌈log2 b⌉ = ⌈log2 10000⌉ = 14 block accesses.
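The arithmetic of this example can be recomputed with a short Python sketch. The linear and binary search costs on the data file are computed from the number of file blocks b = 10000; the variable names are illustrative.

```python
import math

B, R, r = 512, 150, 30000  # block size, record size, number of records
V_SSN, P_R = 9, 7          # key field size and record pointer size

bfr = B // R                      # records per data block
b = math.ceil(r / bfr)            # data file blocks

R_I = V_SSN + P_R                 # index entry size
bfr_I = B // R_I                  # index entries per block
b_I = math.ceil(r / bfr_I)        # index blocks (one entry per record)

index_search = math.ceil(math.log2(b_I))  # binary search on the index
linear_search = b // 2                    # average linear scan of the file
file_binary = math.ceil(math.log2(b))     # binary search on an ordered file

print(bfr, b, R_I, bfr_I, b_I)                   # 3 10000 16 32 938
print(index_search, linear_search, file_binary)  # 10 5000 14
```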

113 Types of Single-Level Indexes primary index: is specified on the ordering key field of an ordered file, where every record has a unique value for that field. The index has the same ordering as the one of the file. clustering index: is specified on the ordering field of an ordered file. The index has the same ordering as the one of the file. An ordered file can have at most one primary index or one clustering index, but not both. secondary index: is specified on any nonordering field of the file. The index has different ordering than the one of the file. A file can have several secondary indices in addition to its primary/clustering index. 113

114 Primary Index Defined on an ordered data file The data file is ordered on a key field Includes one index entry for each block in the data file; the index entry has the key field value for the first record in the block, which is called the block anchor A similar scheme can use the last record in a block. A primary index is a nondense (sparse) index, since it includes an entry for each disk block of the data file and the keys of its anchor record rather than for every search value. 114

115 Example: Primary Index 115

116 Clustering Index Defined on an ordered data file. The data file is ordered on a non-key field, unlike a primary index, which requires that the ordering field of the data file have a distinct value for each record. Includes one index entry for each distinct value of the field; the index entry points to the first data block that contains records with that field value. It is another example of a nondense (sparse) index. Insertion and deletion are relatively straightforward with a clustering index. A file can have at most one primary index or one clustering index, but not both.

117 Example: Clustering Index 117

118 Example: Clustering Index with disk block for each group 118

119 Secondary Index
A secondary index provides a secondary means of accessing a file for which some primary access already exists.
The secondary index may be on a field which is a candidate key and has a unique value in every record, or on a nonkey field with duplicate values.
The index is an ordered file with two fields. The first field is of the same data type as some nonordering field of the data file that is an indexing field. The second field is either a block pointer or a record pointer.
There can be many secondary indexes (and hence, indexing fields) for the same file.
Includes one entry for each record in the data file; hence, it is a dense index.

120 Secondary Indices
A secondary index usually needs more storage space and longer search time than a primary index, because it has a larger number of entries.
A sequential scan using a primary index is efficient, but a sequential scan using a secondary index is expensive: each record access may fetch a new block from disk.
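The cost of scanning in secondary-index order can be illustrated with a small sketch. It assumes blocks are lists of dictionaries and counts how often consecutive index entries point to a different block; `build_secondary_index`, `block_switches` and the sample records are hypothetical.

```python
def build_secondary_index(blocks, field):
    """Dense index: one (value, block_no, slot) entry per record, sorted by value."""
    entries = []
    for block_no, block in enumerate(blocks):
        for slot, record in enumerate(block):
            entries.append((record[field], block_no, slot))
    entries.sort()
    return entries

def block_switches(entries):
    """Count block fetches when following the entries in index order."""
    switches, previous = 0, None
    for _value, block_no, _slot in entries:
        if block_no != previous:
            switches += 1
        previous = block_no
    return switches

# File ordered on id; names are in no particular order.
emp_blocks = [
    [{"id": 1, "name": "Zed"}, {"id": 2, "name": "Amy"}],
    [{"id": 3, "name": "Yan"}, {"id": 4, "name": "Bob"}],
]
name_index = build_secondary_index(emp_blocks, "name")
print(name_index)
print(block_switches(name_index))
```

Scanning in id order touches each of the two blocks once, while scanning in name order switches blocks three times, and on a large file nearly every record can cost a separate block access.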

121 Example: A dense Secondary Index A dense secondary index (with block pointers) on a nonordering KEY field. 121

122 Example: A Secondary Index using block of Pointers 122

124 Multi-Level Indexes
Because a single-level index is an ordered file, we can create a primary index to the index itself; in this case, the original index file is called the first-level index and the index to the index is called the second-level index.
We can repeat the process, creating a third, fourth, ..., top level until all entries of the top level fit in one disk block.
A multi-level index can be created for any type of first-level index (primary, secondary, clustering) as long as the first-level index consists of more than one disk block.
The idea behind multilevel indexes is to reduce the part of the index file that we continue to search by a factor of bfr_I at each level. This factor is the fan-out: fo = (block size in bytes / index entry size in bytes).
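The number of levels follows directly from the fan-out. The sketch below reuses the figures of the earlier SSN example (30000 first-level entries, fan-out 32); the function name `index_levels` is an illustrative choice.

```python
import math

def index_levels(first_level_entries, fan_out):
    """Blocks per level, from the first level up to the single top block."""
    if fan_out < 2:
        raise ValueError("fan-out must be at least 2")
    levels = []
    entries = first_level_entries
    while True:
        blocks = math.ceil(entries / fan_out)
        levels.append(blocks)
        if blocks == 1:
            break
        entries = blocks   # the next level has one entry per block of this level
    return levels

levels = index_levels(first_level_entries=30000, fan_out=32)
print(levels)        # blocks at level 1, 2, 3
print(len(levels))   # index block accesses per search
```

A search reads one block per level plus the data block, so four block accesses here, compared with ten for a binary search on the single-level index.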

125 Multilevel indexes
[Figure: an external index points to the internal index blocks (index block 0, index block 1, ...), which point to the data blocks (data block 0, data block 1, ...).]
If the external index is kept in main memory, only a single index block must be accessed.

126 Example: A two-level Primary Index 126

127 Multi-Level Indexes
Such a multi-level index is a form of search tree; however, insertion and deletion of index entries is a severe problem because every level of the index is an ordered file.
A typical node in a search tree is shown below.

128 A search tree of order p =

129 Multilevel Indexes Using B-Trees and B+-Trees
Because of the insertion and deletion problem, most multi-level indexes use B-tree or B+-tree data structures, which leave space in each tree node (disk block) to allow for new index entries.
These data structures are variations of search trees that allow efficient insertion and deletion of search values.
In B-tree and B+-tree data structures, each node corresponds to a disk block.
Each node is kept between half full and completely full.

130 B-tree and B*-Tree
Important characteristics of B-trees are:
B-trees automatically maintain as many levels of index as is appropriate for the size of the file being indexed.
B-trees manage the space on the blocks they use so that every block is between half used and completely full. No overflow blocks are ever needed for the index.
Each block has space for n search-key values and n+1 pointers.
B-tree blocks are similar to conventional index blocks, except that the B-tree block has an extra pointer.

131 Example
Suppose the block size is 4096 bytes, keys are integers of 4 bytes, and pointers are 8 bytes.
If no header information is kept on the blocks, we want to find the largest integer value n such that 4n + 8(n+1) <= 4096.
This gives n = 340, which means that a block can hold 340 key values and 341 pointers.
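The inequality can be solved mechanically. In this sketch, `max_keys_per_block` is a hypothetical helper, and the `header_size` parameter is an added assumption that defaults to zero to match the slide.

```python
def max_keys_per_block(block_size, key_size, pointer_size, header_size=0):
    """Largest n with n*key_size + (n+1)*pointer_size <= usable block space."""
    usable = block_size - header_size
    return (usable - pointer_size) // (key_size + pointer_size)

n = max_keys_per_block(block_size=4096, key_size=4, pointer_size=8)
print(n, "keys and", n + 1, "pointers use", 4 * n + 8 * (n + 1), "bytes")
```

One pointer is set aside first because a node always holds one more pointer than keys; the remaining space is divided among (key, pointer) pairs.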

132 B-Trees and B+-Trees
An insertion into a node that is not full is quite efficient; if a node is full, the insertion causes a split into two nodes.
Splitting may propagate to other tree levels.
A deletion is quite efficient if a node does not become less than half full.
If a deletion causes a node to become less than half full, it must be merged with neighboring nodes.

133 Difference between B-tree and B+-tree
In a B-tree, pointers to data records exist at all levels of the tree.
In a B+-tree, all pointers to data records exist at the leaf-level nodes.
A B+-tree can have fewer levels (or a higher capacity of search values) than the corresponding B-tree.

134 B+Tree Example n=3 Root 134

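135 Sample non-leaf node
[Figure: a non-leaf node whose pointers each lead to the subtree holding the keys in one key range, delimited by the key values k stored in the node.]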

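136 Sample leaf node
[Figure: a leaf node reached from a non-leaf node; each used pointer leads to the record with the corresponding key (for example, key 120), one slot is unused, and the last pointer leads to the next leaf in sequence.]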

137 Don't want nodes to be too empty
Number of pointers in use:
at internal nodes, at least ⌈(n+1)/2⌉ (to child nodes)
at leaves, at least ⌊(n+1)/2⌋ (to data records/blocks)
where ⌈x⌉ = min{m ∈ Z | m ≥ x} and ⌊x⌋ = max{m ∈ Z | m ≤ x}
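The two bounds differ only when n + 1 is odd. A small sketch, with the helper name `min_pointers` chosen for illustration:

```python
import math

def min_pointers(n):
    """Minimum pointers in use for a node with room for n keys."""
    non_leaf = math.ceil((n + 1) / 2)   # pointers to child nodes
    leaf = math.floor((n + 1) / 2)      # pointers to data records
    return non_leaf, leaf

for n in (3, 4):
    non_leaf, leaf = min_pointers(n)
    # A non-leaf node with p pointers holds p - 1 keys.
    print(f"n={n}: non-leaf >= {non_leaf} pointers ({non_leaf - 1} keys), "
          f"leaf >= {leaf} pointers")
```

For n = 4 this gives a minimum of 2 keys in a leaf and 3 - 1 = 2 keys in a non-leaf node, the values used in the deletion examples.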

138 Full and minimum nodes for n=3
[Figure: a full node and a minimally filled node, shown for a non-leaf node and for a leaf node.]

139 B+tree rules
(1) All leaves are at the same lowest level (balanced tree).
(2) Pointers in leaves point to records, except for the sequence pointer.

140 Insert into B+tree
First look up the proper leaf, then one of the following cases applies:
(a) simple case, leaf not full: just insert (key, pointer-to-record)
(b) leaf overflow
(c) non-leaf overflow
(d) new root
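Cases (a) and (b) can be sketched for a single leaf as follows. This is only an illustration under stated assumptions: leaves are plain sorted lists of keys, record pointers are omitted, and on overflow the first ⌈(n+1)/2⌉ keys stay in the left leaf while the smallest key of the new right leaf is copied up to the parent. The function name and key values are hypothetical, and cases (c) and (d) are not covered.

```python
import math
from bisect import insort

def insert_into_leaf(leaf_keys, key, n):
    """Insert key into a leaf with capacity n.

    Returns (left, right, separator); right and separator are None
    when the leaf does not overflow.
    """
    keys = list(leaf_keys)
    insort(keys, key)                       # keep the leaf sorted
    if len(keys) <= n:
        return keys, None, None             # case (a): leaf not full
    split_at = math.ceil((n + 1) / 2)       # case (b): leaf overflow
    left, right = keys[:split_at], keys[split_at:]
    return left, right, right[0]            # separator is copied to the parent

print(insert_into_leaf([30, 31], 32, n=3))
print(insert_into_leaf([3, 5, 11], 7, n=3))
```

If the parent has no room for the new separator, the same split is applied one level up, which is how cases (c) and (d) arise.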

141 (a) Insert key = 32 n=3 141

142 (b) Insert key = 7 n=3 142

143 (c) Insert key = 160 n=3 143

144 (d) New root, insert 45 n=3 new root Height grows at root => balance maintained 144

145 Deletion from B+tree
Again, first look up the proper leaf, then:
(a) Simple case: no underflow. Otherwise...
(b) Borrow keys from an adjacent sibling (if it doesn't become too empty). Else...
(c) Coalesce with a sibling node.
(d) Cases (a), (b) or (c) at a non-leaf node.
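The choice between cases (a), (b) and (c) at a leaf can be sketched as below. The sketch assumes leaves are sorted lists of keys, that the adjacent sibling is the left one, and that the minimum number of keys in a leaf is ⌊(n+1)/2⌋; updating the parent's separator keys is left out. All names and key values are illustrative.

```python
def delete_from_leaf(leaf, left_sibling, key, n):
    """Return (case, new_left_sibling, new_leaf) after deleting key from leaf."""
    min_keys = (n + 1) // 2
    remaining = [k for k in leaf if k != key]
    if len(remaining) >= min_keys:
        return "no underflow", left_sibling, remaining        # case (a)
    if len(left_sibling) > min_keys:
        # case (b): move the sibling's largest key into the underfull leaf
        return "borrow", left_sibling[:-1], [left_sibling[-1]] + remaining
    # case (c): the sibling cannot spare a key, so merge the two leaves
    return "coalesce", left_sibling + remaining, []

print(delete_from_leaf([40, 50, 60], [10, 20], 50, n=4))
print(delete_from_leaf([40, 50], [10, 20, 30], 50, n=4))
print(delete_from_leaf([40, 50], [10, 20], 50, n=4))
```

After a coalesce the parent loses one pointer, so the same three cases have to be checked again at the non-leaf level, which is case (d).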

146 (b) Borrow keys. Delete 50, n=4 => min # of keys in a leaf = ⌊5/2⌋ = 2

147 (c) Coalesce with a sibling Delete 50 n=4 147

148 (d) Non-leaf coalesce. Delete 37, n=4 => min # of keys in a non-leaf = ⌈(n+1)/2⌉ - 1 = 3 - 1 = 2. The coalesced node becomes the new root.

149 B+tree deletions in practice
Often, coalescing is not implemented: it is too hard and not worth it, since later insertions may return the node to its required minimum size.
Compromise: try redistributing keys with a sibling; if that is not possible, leave the node as it is.
If all accesses to the records go through the B-tree, a "tombstone" for the deleted record can be placed at the leaf.

150 Why Are B-trees Good?
A B-tree adapts well to insertions and deletions, maintaining balance.
The DBA does not need to care about reorganizing.
Split/merge operations are rather rare (how often would nodes with 200 keys be split?).
Access times are dominated by key lookup (i.e., traversal from the root to a leaf).

151 Examples of Clustered Indexes 151

156 Creating a Hash Cluster
A hash cluster is created through use of the HASHKEYS clause.
CREATE CLUSTER HashOrderCluster (
    OrderId NUMBER(3) )
    PCTUSED 60 PCTFREE 40 SIZE 1200 TABLESPACE Users
    HASH IS OrderID HASHKEYS 150;
The primary key (hash key) can be one or more columns; the above example has a single column named OrderID.
The HASHKEYS clause value specifies and limits how many unique hash values can be generated; Oracle rounds the actual value up to the nearest prime number.
The HASH IS clause is optional.

157
CREATE TABLE HashTestOrders (
    OrderId NUMBER(3) CONSTRAINT HashTestOrders_PK Primary Key,
    OrderDate DATE CONSTRAINT HashOrderDate_NN NOT NULL,
    Order_Amount NUMBER(10,2) )
    CLUSTER HashOrderCluster (OrderId) ;
CREATE TABLE HashTestOrderDetails (
    ProductId NUMBER(5),
    OrderId NUMBER(3),
    Quantity_Ordered NUMBER(3) CONSTRAINT HashQuantity_Ordered_CK CHECK (Quantity_Ordered >= 0),
    ItemPrice NUMBER(10,2),
    CONSTRAINT HashTestOrderDetails_FK FOREIGN KEY (OrderId) REFERENCES TestOrders ON DELETE CASCADE,
    CONSTRAINT HashTestOrderDetails_PK PRIMARY KEY (ProductId, OrderId) )
    CLUSTER HashOrderCluster (OrderId) ;

158 The HASH IS clause can be used to specify the cluster key. This is only necessary if the intent is NOT to hash on the primary key, or if the primary key is of data type NUMBER and is already unique, so that the internal hash function will be sure to distribute the rows uniformly among the disk space allocated to the cluster.
Specify the HASH IS parameter only if the cluster key is a single column of the NUMBER datatype.
This parameter can also be used to specify a user-defined hash function to use instead of the function provided by default by Oracle.

159 Example
In this example a cluster is created that has a CHAR datatype primary key; no HASH IS clause is specified, so the cluster will be hashed on the Prescription_Drug_Code column value.
CREATE CLUSTER Prescription_Drug_Cluster (Prescription_Drug_Code CHAR(5)) SIZE 512 SINGLE TABLE HASHKEYS 500;
Hash clusters cannot be indexed.
Hash clusters should have a hash key that reflects how SELECT statement WHERE clauses will query the database, but most often the cluster is hashed on the primary key.

160 summary 160

161 Example
Consider a relation emp with schema (ID, Name, Dept, Salary) and the following characteristics: |emp| = 1,000,000 tuples; the sizes of the attribute values are ID = 4 bytes, Name = 50 bytes, Dept = 30 bytes, Salary = 6 bytes. The ID values are equally distributed between 1 and 10,000,000, and no two tuples have the same ID. A pointer occupies 6 bytes, and we assume a block size of 2,000 bytes. Further, we assume seek time = sec, latency = sec, and transfer time = 0.001 sec.
a. Determine the number of blocks needed to store the relation emp if 10% of each data block is reserved for future insertions.
Answer:
0.9 × 2,000 = 1,800 bytes used in every data block
1,800/(4 + 50 + 30 + 6) = 20 tuples/block
1,000,000/20 = 50,000 data blocks needed for emp
b. Assume a B+-tree index on the attribute ID. Determine the number of blocks (= number of nodes) used for the B+-tree if index blocks are filled up to 75%.
Answer:
0.75 × 2,000 = 1,500 bytes used in every index block
1,500/(4 + 6) = 150 index entries/block
level 3 (leaf nodes): ⌈1,000,000/150⌉ = 6,667 blocks
level 2: ⌈6,667/150⌉ = 45 blocks
level 1: ⌈45/150⌉ = 1 block
6,667 + 45 + 1 = 6,713 index blocks are needed
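Parts a and b can be reproduced with a short sketch. The variable and function names are illustrative; fill factors are expressed as integer percentages to avoid floating-point rounding.

```python
import math

TUPLES = 1_000_000
TUPLE_SIZE = 4 + 50 + 30 + 6       # ID + Name + Dept + Salary, in bytes
BLOCK_SIZE = 2000
ENTRY_SIZE = 4 + 6                 # ID key + pointer, in bytes

def data_blocks(fill_percent=90):
    usable = BLOCK_SIZE * fill_percent // 100        # 1,800 bytes
    tuples_per_block = usable // TUPLE_SIZE          # 20 tuples
    return math.ceil(TUPLES / tuples_per_block)

def index_blocks_per_level(fill_percent=75):
    usable = BLOCK_SIZE * fill_percent // 100        # 1,500 bytes
    entries_per_block = usable // ENTRY_SIZE         # 150 entries
    levels, entries = [], TUPLES
    while True:
        blocks = math.ceil(entries / entries_per_block)
        levels.append(blocks)                        # leaf level comes first
        if blocks == 1:
            return levels
        entries = blocks

print(data_blocks())
level_sizes = index_blocks_per_level()
print(level_sizes, sum(level_sizes))
```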

162 c. Consider the following two queries:
Q1: σ_{ID = 1Mio}(r)
Q2: σ_{ID > 7.5Mio}(r)
Determine the number of block I/Os (distinguishing between index blocks, data blocks, and total number of blocks) for the following two cases: (i) the B+-tree is a primary index and (ii) the B+-tree is a secondary index.
Answer:
Q1 (i) primary index: 3 index blocks (one at each level) + 1 data block = 4 blocks are transferred in total
Q1 (ii) secondary index: the same as for the primary index
Q2 (i) primary index:
3 index blocks (one at each level) to locate the first tuple
250,000/20 = 12,500 data blocks to retrieve (on average 250,000 tuples match, and all matching tuples are in contiguous blocks)
3 + 12,500 = 12,503 blocks are transferred in total
Q2 (ii) secondary index:
on average 250,000 data tuples match
3 index blocks (one at each level) to locate the first tuple
⌈250,000/150⌉ = 1,667 index (leaf) nodes/blocks to locate the other tuples
250,000 data blocks to retrieve (in the worst case each matching tuple might be in a different block)
3 + 1,667 + 250,000 = 251,670 blocks are transferred in total (a full scan of the data relation without using the index would be more efficient)
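The block counts for both queries can be recomputed as follows. The sketch follows the counting used in the answer above (including the worst case of one data block per matching tuple for the secondary index); the constant and function names are illustrative.

```python
import math

INDEX_HEIGHT = 3           # one index block per B+-tree level
TUPLES_PER_BLOCK = 20      # from part a
ENTRIES_PER_LEAF = 150     # from part b
DATA_BLOCKS = 50_000       # size of emp, from part a

def point_query_io():
    """Q1: the same for primary and secondary index."""
    return INDEX_HEIGHT + 1

def range_query_io(matching, primary):
    """Q2: returns (index blocks, data blocks, total)."""
    if primary:
        index_blocks = INDEX_HEIGHT
        data_blocks = math.ceil(matching / TUPLES_PER_BLOCK)   # contiguous blocks
    else:
        index_blocks = INDEX_HEIGHT + math.ceil(matching / ENTRIES_PER_LEAF)
        data_blocks = matching                                  # worst case
    return index_blocks, data_blocks, index_blocks + data_blocks

# IDs are uniform in 1..10,000,000, so ID > 7.5 million matches a quarter of emp.
matching = 1_000_000 * (10_000_000 - 7_500_000) // 10_000_000

print(point_query_io())
print(range_query_io(matching, primary=True))
print(range_query_io(matching, primary=False))
```

The secondary-index total exceeds the 50,000 blocks of a full scan, which is why the full scan is the better plan for Q2 in case (ii).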

163 d. Determine the execution time for the queries in c.) Answer: d. Execution time: Q1 (i) primary index: = sec Q1 (ii) secondary index: the same as for primary index Q2 (i) primary index: 12, = sec Q2 (ii) secondary index: 251, = 8, sec 163
