Unit 3: Disk Scheduling, Records, Files, Metadata
Based on:
- Ramakrishnan & Gehrke (text): Sections 9.3-9.3.2 and 9.5-9.7.2 (pages 316-318 and 324-333); Sections 8.2-8.2.2 (pages 274-278); Section 12.1 (pages 394-397)
- Silberschatz et al. (Operating System Concepts)
Original slides by Ed Knorr; updates and changes by George Tsiknis

Learning Goals
- Explain why page requests from disk may not be serviced immediately. List some of the points of contention.
- Explain the relationship among disk geometry, buffer pool management, and disk scheduling in providing good performance for large data requests from a user of a DBMS. List the bottlenecks that may contribute to poor I/O performance in this disk chain.
- Compute the service order for a queue of track/cylinder/page requests using each of these disk scheduling algorithms: FCFS (First Come, First Served), SSTF (Shortest Seek Time First), and Elevator (Scan with and without Look).
- Compare and contrast the record layouts for fixed-length and variable-length records in a DBMS. Provide an advantage for each.
- Compare and contrast the page layouts for fixed-length and variable-length records in a DBMS. Provide an advantage for each.

Learning Goals (cont.)
- Justify the use of free space within a page, and intermittent free pages within a file, for an RDBMS table.
- Given probabilities of average string lengths, determine whether it makes more sense to use a fixed-length field rather than a variable-length field.
- Give at least ten examples of the kinds of metadata stored for a DBMS.
- Justify the use of metadata from the perspective of both a DBMS and a DBA.
- [During assignments] Query an RDBMS catalog for metadata that is of interest to a DBA.
- Provide arguments for storing RDBMS metadata as a table rather than as a flat file or some other data structure.

Link to Previous Units
- Recall from previous units the relationship between disk pages and buffer pool pages.
- A page is chosen for replacement using a page replacement algorithm (PRA), of which there are many: FIFO, LRU (including several variants), MRU, Clock, Extended Clock, etc.
- The optimal PRA is unachievable for most workloads. Why?
- Another topic that involves disks and RAM is disk scheduling. We'll briefly touch upon it in the next few slides.

Disk Service Time (to Fetch Pages)
- DBMSs read and write disk pages frequently.
- Requesting disk page p does not necessarily mean that you'll get service quickly, even after accounting for seek, rotation, and transmission latencies. Why not?
- Contention from:
  - the same transaction (waiting for previous requests to complete)
  - other transactions
  - other non-DB processes
- The disk scheduling algorithm may delay processing the request.

Disk Scheduling Algorithms
There are a number of different disk scheduling algorithms. In this course, we will only briefly touch upon the following:
- First Come, First Served (FCFS): requests are serviced in the order they arrive.
- Shortest Seek Time First (SSTF): the request closest to the current head position is serviced next.
- Elevator Algorithm (Scan with the Look option): the disk arm keeps moving in one direction, servicing each request as it reaches the requested cylinder. When there are no more requests in the current direction, it switches direction.
- In another version of the Elevator algorithm, Scan without the Look option, the disk arm continues moving in one direction until the end cylinder is reached, even if there are no more requests in that direction.
- Look means "only go as far as needed" (elevator analogy: don't go to the extreme floors unless a user request has been made).

Disk Scheduling Algorithms (cont.)
- Suppose user(s) make the following near-simultaneous requests for pages from these cylinders, and the head is currently on cyl. 165, having just come from cyl. 164: 1400, 2500, 170, 160, 161, 3500, 162
- What is the service order for the following disk scheduling algorithms?
  - FCFS:
  - SSTF:
- Which is more efficient, and why?
- Is there any unfairness? Explain.
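
One way to check an answer to the exercise above is to simulate the two policies. The following is a minimal sketch, not the course's reference solution: the function names `fcfs`, `sstf`, and `total_seek` are made up for illustration, and "efficiency" is measured only as the number of cylinders the arm crosses, ignoring rotation and transfer time.

```python
def fcfs(head, requests):
    """First Come, First Served: service requests in arrival order."""
    return list(requests)


def sstf(head, requests):
    """Shortest Seek Time First: repeatedly pick the pending cylinder
    closest to the current head position (ties go to the earlier request)."""
    pending = list(requests)
    order = []
    while pending:
        nearest = min(pending, key=lambda cyl: abs(cyl - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest
    return order


def total_seek(head, order):
    """Total number of cylinders crossed while servicing `order`."""
    distance = 0
    for cyl in order:
        distance += abs(cyl - head)
        head = cyl
    return distance


requests = [1400, 2500, 170, 160, 161, 3500, 162]
for policy in (fcfs, sstf):
    order = policy(165, requests)
    print(policy.__name__, order, total_seek(165, order))
# fcfs [1400, 2500, 170, 160, 161, 3500, 162] 11353
# sstf [162, 161, 160, 170, 1400, 2500, 3500] 3345
```

Under this measure SSTF moves the arm far less than FCFS on this queue. The unfairness question is not settled by the numbers: a steady stream of nearby requests could keep SSTF away from a distant cylinder indefinitely.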

Disk Scheduling Algorithms (cont.)
- Assume we're on cyl. 165, having just come from cyl. 164, and now we get the following requests: 1400, 2500, 170, 160, 161, 3500, 162
- What is the service order for the popular Elevator Algorithm?
- What is the service order if, right after servicing cyl. 1400, we get these new requests: 1250, 1400, and 1500?
- More examples will be online in the practice questions and answers.
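
The Elevator algorithm with the Look option can be sketched in the same style. This is an illustrative sketch under stated assumptions: `elevator_look` and `total_seek` are hypothetical names, the arm is taken to be moving toward higher cylinders because it just came from 164, and a request for the cylinder the head is already on is assumed to be serviced immediately.

```python
def elevator_look(head, requests, moving_up=True):
    """Elevator (Scan with Look): sweep in the current direction, then
    reverse as soon as no requests remain in that direction."""
    at_head = [c for c in requests if c == head]
    above = sorted(c for c in requests if c > head)
    below = sorted((c for c in requests if c < head), reverse=True)
    if moving_up:
        return at_head + above + below
    return at_head + below + above


def total_seek(head, order):
    """Total number of cylinders crossed while servicing `order`."""
    distance = 0
    for cyl in order:
        distance += abs(cyl - head)
        head = cyl
    return distance


requests = [1400, 2500, 170, 160, 161, 3500, 162]
order = elevator_look(165, requests)
print(order, total_seek(165, order))
# [170, 1400, 2500, 3500, 162, 161, 160] 6675

# Second question: the head is on 1400, still moving up, and 1250, 1400,
# and 1500 join the requests that have not been serviced yet.
# Assumption: the arm is still on cyl. 1400 when the repeat request arrives.
pending = [2500, 3500, 162, 161, 160, 1250, 1400, 1500]
print(elevator_look(1400, pending))
# [1400, 1500, 2500, 3500, 1250, 162, 161, 160]
```

Scan without Look would service the same cylinders in the same order here, but the arm would travel past 3500 to the last cylinder before reversing, so its total movement would be larger.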

Files
Now we'll see how data can be organized into files.
- A file is a collection of pages.
  - The organization of pages within a file depends on the file type (see later).
- A page is a collection of records.
  - The page format depends on the record type: fixed or variable length.
- A record is a collection of fields.
  - The record format depends on the record type.

Record Formats: Fixed Length
- The number of fields is fixed; the length of each field is fixed.
- The information about field types is the same for all records in the table.
  - Schema info (column names, data types, lengths, order) is stored in the DBMS's catalog.
- Finding the i-th field does not require scanning the whole record. Why not?
[Figure: a record with fields F1, F2, F3, F4 of lengths L1, L2, L3, L4, starting at base address B. The address of F3 is B + L1 + L2.]
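
The address arithmetic in the figure generalizes to any field: add the lengths of all preceding fields to the base address. A small sketch, where the field lengths and the base address are invented sample values and `field_offsets` and `field_address` are hypothetical helper names:

```python
from itertools import accumulate


def field_offsets(lengths):
    """Offset of each field from the record's base address."""
    return [0] + list(accumulate(lengths))[:-1]


def field_address(base, lengths, i):
    """Address of the i-th field (1-based, as on the slide)."""
    return base + sum(lengths[: i - 1])


lengths = [4, 20, 8, 2]   # L1..L4, taken from the catalog (sample values)
base = 1000               # B (sample value)

print(field_offsets(lengths))            # [0, 4, 24, 32]
print(field_address(base, lengths, 3))   # 1024, i.e. B + L1 + L2
```

Because the lengths are the same for every record in the table, the offsets can be computed once from the catalog rather than once per record.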

Page Formats for Fixed Length Records
- A page is a collection of slots; one slot per record.
- Two ways to deal with record insertions/deletions:
  - PACKED: Slot 1, Slot 2, ..., Slot N are kept contiguous, followed by the free space; the page stores N, the number of records.
  - UNPACKED, BITMAP: Slot 1, Slot 2, ..., Slot M, some of which may be empty; the page stores M, the number of slots, and a bitmap with one bit per slot (M ... 3 2 1) marking which slots are in use.
- Record id = <page id, slot #>.
- In the first alternative, moving records for free space management changes the rid, which may not be acceptable.
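
The unpacked, bitmap alternative can be sketched as follows. This is a toy in-memory model, not an on-disk layout: `BitmapPage` is a hypothetical class, the bitmap is a list of booleans, and slots are numbered from 0 rather than from 1 as on the slide.

```python
class BitmapPage:
    """Fixed-length record page: M slots plus one in-use bit per slot."""

    def __init__(self, page_id, num_slots):
        self.page_id = page_id
        self.slots = [None] * num_slots
        self.in_use = [False] * num_slots   # the bitmap

    def insert(self, record):
        """Store the record in the first free slot and return its rid."""
        for slot, used in enumerate(self.in_use):
            if not used:
                self.slots[slot] = record
                self.in_use[slot] = True
                return (self.page_id, slot)
        return None   # page is full

    def delete(self, rid):
        """Clear the bit; no other record moves, so other rids stay valid."""
        _, slot = rid
        self.in_use[slot] = False
        self.slots[slot] = None


page = BitmapPage(page_id=7, num_slots=3)
rid_a = page.insert("record A")   # (7, 0)
rid_b = page.insert("record B")   # (7, 1)
page.delete(rid_a)
rid_c = page.insert("record C")   # reuses slot 0
print(rid_b, rid_c)               # (7, 1) (7, 0)
```

In the packed alternative, deleting record A would instead move another record into its place, which changes that record's slot number and therefore its rid.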

Record Formats: Variable Length
- Fixed number of fields; some fields have variable length.
- Two alternative formats:
  (1) Fields delimited by special symbols: F1 $ F2 $ F3 $ F4 $
  (2) Array of field offsets (and an offset to the end), followed by F1 F2 F3 F4
- The second method offers direct access to the i-th field and efficient storage of nulls, at the cost of a small directory overhead.
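
A sketch of the second format, an array of field offsets stored in front of the field data. The byte layout is an assumption chosen for illustration (2-byte little-endian offsets measured from the start of the record), and `encode_record` and `get_field` are hypothetical names. In this sketch a null is encoded as two equal offsets, so it cannot be told apart from an empty string; a real system would need an extra convention for that.

```python
import struct


def encode_record(fields):
    """fields: list of bytes objects, or None for a null.
    Layout: n + 1 two-byte offsets, then the concatenated field data."""
    n = len(fields)
    pos = 2 * (n + 1)          # field data starts right after the offset array
    offsets = []
    body = b""
    for field in fields:
        offsets.append(pos)
        if field is not None:
            body += field
            pos += len(field)
    offsets.append(pos)        # offset to the end of the record
    return struct.pack(f"<{n + 1}H", *offsets) + body


def get_field(record, i):
    """Return the i-th field (0-based) without scanning earlier fields."""
    start, end = struct.unpack_from("<2H", record, 2 * i)
    return None if start == end else record[start:end]


rec = encode_record([b"Lee", None, b"Vancouver", b"42"])
print(len(rec))            # 24: 10 bytes of offsets + 14 bytes of data
print(get_field(rec, 2))   # b'Vancouver'
print(get_field(rec, 1))   # None
```

With the delimiter format, reaching field 3 would mean scanning past every earlier `$`, and the delimiter could not appear inside a field value.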

Page Formats for Variable Length Records
[Figure: page i holding records with Rid = (i,1), Rid = (i,2), ..., Rid = (i,N). The SLOT DIRECTORY at the end of the page holds one entry per slot (N ... 2 1; example entries 20, 16, 24), the number of slots N, and a pointer to the start of free space.]
- Each page has a directory of slots (of variable length).
- Records can be moved on the page without changing their rids.
- The space of a deleted record is returned to free space; its slot is set to -1.
- Attractive for fixed-length records too.
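
The slot directory idea can be sketched as below. This is a simplified in-memory model under stated assumptions: `SlottedPage` is a hypothetical class, the directory is a Python list of (offset, length) pairs rather than bytes at the end of the page, and the space the directory itself would occupy is ignored.

```python
class SlottedPage:
    """Variable-length records addressed through a slot directory."""

    def __init__(self, page_id, size=4096):
        self.page_id = page_id
        self.data = bytearray(size)
        self.free_start = 0     # pointer to the start of free space
        self.slots = []         # slot -> (offset, length); offset -1 = empty

    def insert(self, record):
        if self.free_start + len(record) > len(self.data):
            return None         # no room (a real page might compact first)
        offset = self.free_start
        self.data[offset:offset + len(record)] = record
        self.free_start += len(record)
        for slot, (off, _) in enumerate(self.slots):
            if off == -1:       # reuse the slot of a deleted record
                self.slots[slot] = (offset, len(record))
                return (self.page_id, slot)
        self.slots.append((offset, len(record)))
        return (self.page_id, len(self.slots) - 1)

    def delete(self, rid):
        self.slots[rid[1]] = (-1, 0)

    def read(self, rid):
        offset, length = self.slots[rid[1]]
        return None if offset == -1 else bytes(self.data[offset:offset + length])

    def compact(self):
        """Slide live records together; only directory offsets change."""
        packed = bytearray(len(self.data))
        pos = 0
        for slot, (offset, length) in enumerate(self.slots):
            if offset == -1:
                continue
            packed[pos:pos + length] = self.data[offset:offset + length]
            self.slots[slot] = (pos, length)
            pos += length
        self.data = packed
        self.free_start = pos


page = SlottedPage(page_id=1, size=64)
rid1 = page.insert(b"aaaa")
rid2 = page.insert(b"bbbbbb")
page.delete(rid1)
page.compact()
print(rid2, page.read(rid2), page.slots[rid2[1]])
# (1, 1) b'bbbbbb' (0, 6)
```

After compaction the second record sits at offset 0 instead of offset 4, yet its rid is still (1, 1): the rid names a directory slot, not a byte position.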

Modifying Variable-Length Rows
- Can be done in place if no variable-length field changes its length.
- What if the row increases in size due to an update?
- Solutions (in preferred order):
  1. Try to update the row in place (there may be room).
  2. Relocate the row on the page, perhaps after compaction (defragmentation).
  3. Relocate the row to a near page (e.g., in DB2: within 16 pages of the current page).
  4. Relocate the row to a far page.

Planning for Updates, Insertions, and Deletions in DB2
- For tables that contain fixed-length or variable-length rows:
  - At CREATE time, DB2 allows a DBA to specify both the initial amount of free space on each page (e.g., PCTFREE 15) and the number of pages between (currently) blank pages (e.g., FREEPAGE 10).
  - In the catalog table SYSIBM.SYSTABLEPART (see the next few slides on System Catalogs), you'll find:
    - PCTFREE
    - FREEPAGE
    - NEARINDREF
    - FARINDREF
    - NEARINDREF + FARINDREF = number of relocated rows
- Over time, a REORG may be necessary.

char( ) vs. varchar( ) Fields in DB2
- Compare a char(20) field to a varchar(20) field in DB2, assuming a char is one byte:
  - char(20): 20 bytes
  - varchar(20): 0 to 20 bytes for characters + 2 bytes for length
- Which takes up more space?
- Suppose 80% of the time, we use 15 characters, and the other 20% of the time, we need 20 characters. Is varchar better?
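
The space comparison above comes down to an expected value. A short worked sketch, assuming one byte per character and the 2-byte length prefix given on the slide; `expected_varchar_bytes` is a hypothetical helper, and percentages are kept as integers so the arithmetic is exact.

```python
def expected_varchar_bytes(percent_by_length, length_overhead=2):
    """Average stored size of a varchar field.
    percent_by_length maps a string length to the percentage of rows
    that have that length (percentages should sum to 100)."""
    weighted = sum(pct * (chars + length_overhead)
                   for chars, pct in percent_by_length.items())
    return weighted / 100


varchar_avg = expected_varchar_bytes({15: 80, 20: 20})
char_size = 20
print(varchar_avg)   # 18.0, i.e. 0.8 * (15 + 2) + 0.2 * (20 + 2)
print(char_size)     # 20
```

On space alone, varchar(20) averages 18 bytes against 20 for char(20) in this scenario. That is only part of the answer: a variable-length field can also make a row grow on update, which may force the row to be relocated.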

Basic File Organizations
- Heap (random order) files: records are in any order.
  - Suitable for retrieving a record given its rid.
  - Suitable when retrieving all records in the order they are stored (file scan).
- Sorted files: records are sorted on some attribute.
  - Best if records must be retrieved in some order, or only a "range" of records is needed.
- Indexes: data structures that organize records in a way that optimizes retrieval on a search key condition.
  - Like sorted files, they speed up searches for a subset of records, based on values in certain ("search key") fields.
  - Updates are much faster than in sorted files.
- Indexed file: a file in which an index for the data and the data records are built together in the same structure.

Unordered (Heap) Files
- Simplest file structure: contains records in no particular order.
- As the file grows and shrinks, disk pages are allocated and de-allocated.
- To support record-level operations, we must:
  - keep track of the pages in a file
  - keep track of free space on pages
  - keep track of the records on a page
- There are many alternatives for keeping track of these.

Heap File Implemented as a List
[Figure: a header page that heads two linked lists of data pages, one of full pages and one of pages with free space.]
- The header page id and heap file name must be stored somewhere.
- Each page contains two "pointers" plus data.
- To insert a record of variable length, we may need to retrieve several pages.

Heap File Using a Page Directory
[Figure: a DIRECTORY, starting at the header page, whose entries point to Data Page 1, Data Page 2, ..., Data Page N.]
- The directory is a collection of pages; a linked list implementation is just one alternative.
  - Much smaller than a linked list of all data pages!
- The entry for a page can include the number of free bytes on the page.
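
Keeping a free-byte count in each directory entry lets an insert pick a page without reading any data pages. A minimal sketch, assuming a first-fit policy and an in-memory dictionary standing in for the directory pages; `PageDirectory` and its methods are hypothetical names.

```python
class PageDirectory:
    """Heap file directory: one entry per data page with its free bytes."""

    def __init__(self):
        self.free_bytes = {}   # page id -> free bytes on that page

    def add_page(self, page_id, free):
        self.free_bytes[page_id] = free

    def find_page(self, record_size):
        """First page with enough room, found from directory entries alone."""
        for page_id, free in self.free_bytes.items():
            if free >= record_size:
                return page_id
        return None            # caller would allocate a new data page

    def note_insert(self, page_id, record_size):
        self.free_bytes[page_id] -= record_size


directory = PageDirectory()
directory.add_page(1, free=30)
directory.add_page(2, free=500)
directory.add_page(3, free=120)

target = directory.find_page(100)
directory.note_insert(target, 100)
print(target, directory.free_bytes[target])   # 2 400
```

With the linked-list organization, the same insert might have to fetch several pages from the free-space list before finding one with 100 free bytes.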

Indexes
- An index on one or more search key fields helps us retrieve records by specifying the values in these field(s). For example:
  - Find the name of the student with student ID 86753091.
  - Find all 1st year students with the last name Lee.
- Any subset of the fields of a table can be the search key for an index on that table.
  - A search key is not the same as a key of a table.
- For each search key value k, the index contains a data entry k*, which has information to locate the records with key value k.
- Given a search key value k, we can efficiently search the index to find the entry k* and use it to locate the records for k.

Indexes (cont.)
- For any key k of an index, the data entry k* can be one of the following:
  1. a data record with key value k
  2. the rid of a data record with search key value k
  3. a list of rids of data records with search key k
- The first choice affects the structure of the actual file. In this case, the file is an indexed file.
  - At most one index on a given collection of data records can use alternative 1.
  - Index and data are in the same file.
- In the other alternatives, the structure of the index is independent of the structure of the actual file (with the actual records).
  - Typically, an index contains auxiliary information that directs searches to the desired data entries.
  - The index is stored in a different file than the file with the data.
- Examples of indexing techniques: B+ trees, hash-based structures.
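
Alternative 3 (a list of rids per key value) can be sketched with a dictionary. This is only an illustration: a Python dict stands in for the hash-based structure, `build_index` is a hypothetical helper, and the student rows other than the ID quoted on the earlier slide are invented sample data.

```python
from collections import defaultdict


def build_index(records_by_rid, search_key):
    """Data entries of the form <k, list of rids with search key value k>."""
    index = defaultdict(list)
    for rid, record in records_by_rid.items():
        index[record[search_key]].append(rid)
    return dict(index)


# A tiny heap file: rid = (page id, slot number) -> record
students = {
    (1, 0): {"sid": 86753091, "name": "Lee", "year": 1},
    (1, 1): {"sid": 12345678, "name": "Chan", "year": 2},
    (2, 0): {"sid": 23456789, "name": "Lee", "year": 1},
}

name_index = build_index(students, "name")
lee_rids = name_index.get("Lee", [])
print(lee_rids)                                # [(1, 0), (2, 0)]
print([students[rid]["sid"] for rid in lee_rids])
```

Note that the search key here, name, is not a key of the table: two records share the value Lee, which is why the data entry holds a list of rids. The heap file itself is untouched by the index, so several such indexes could coexist on different fields.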

Index Classification
- Primary vs. secondary: if the index search key contains the primary key, it is called a primary index; otherwise, it is a secondary index.
  - Unique index: the search key contains a candidate key.
- Clustered vs. unclustered: if the order of the data records is the same as the order of the index data entries, the index is a clustered index; otherwise, it is unclustered.
  - If a file has a clustered index, then the index and data entries are in the SAME file: the data entries of the index are the actual data records (alternative 1).
  - A file can have at most one clustered index.

Clustered vs. Unclustered Index
[Figure: two panels, CLUSTERED and UNCLUSTERED. In each, index entries direct the search for data entries (index file), and the data entries point to the data records (data file).]
- CLUSTERED: the data records of this file are sorted on the attributes that are the index key.
- UNCLUSTERED: the data records of this file are either not sorted or are sorted on attributes other than the index key.
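
The practical difference shows up in how many data pages a range query touches. A rough sketch with invented rids, assuming 4 records per page and a range that matches 8 records; `pages_touched` is a hypothetical helper that counts distinct pages, which is a lower bound on page fetches when the buffer pool is small.

```python
def pages_touched(rids):
    """Number of distinct data pages holding the records named by rids."""
    return len({page_id for page_id, _slot in rids})


# rids listed in index (search key) order for the matching range
clustered_rids = [(10, 0), (10, 1), (10, 2), (10, 3),
                  (11, 0), (11, 1), (11, 2), (11, 3)]
unclustered_rids = [(3, 1), (47, 0), (12, 2), (30, 3),
                    (8, 0), (21, 1), (40, 2), (5, 3)]

print(pages_touched(clustered_rids))     # 2
print(pages_touched(unclustered_rids))   # 8
```

With a clustered index, neighbouring key values sit on the same or adjacent pages; with an unclustered index, each matching record may cost its own page read.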

System Catalog
- Catalog tables contain metadata (i.e., "data about data").
- For each table they store:
  - name, file name, file structure (e.g., heap file)
  - attribute name and type, for each attribute
  - index name, for each index
  - integrity constraints
- For each index:
  - name, structure (e.g., B+ tree), and search key fields
  - B+ tree stats (e.g., # of leaf pages, avg. dist. between leaf pages, height)
- For each view:
  - view name and definition
- Plus all kinds of other statistics like authorization levels, buffer pool sizes, etc.

System Catalog (cont.)
- A statistics collection utility is run to update the metadata.
  - It can run regularly or periodically on specific tables/indexes.
- The catalog itself is stored as a set of relational tables.
  - DBAs can access it via SQL.
  - Users are usually not allowed to access it. Why not?
- Catalog information is extremely useful to DBAs and, of course, to the DBMS!
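
The idea that the catalog is itself queryable with SQL can be tried out without a DB2 installation. The sketch below uses SQLite through Python's standard `sqlite3` module as a stand-in, so the catalog names differ from DB2's SYSIBM tables: `sqlite_master` is SQLite's own schema table, and `PRAGMA table_info` lists a table's columns. The `student` table and its index are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT, year INTEGER)"
)
conn.execute("CREATE INDEX student_name_idx ON student (name)")

# The catalog is an ordinary relation, so ordinary SQL works on it.
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY name"
).fetchall()
print(catalog)
# [('table', 'student', 'student'), ('index', 'student_name_idx', 'student')]

# Column metadata: each row is (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]
print(columns)
# [('sid', 'INTEGER'), ('name', 'TEXT'), ('year', 'INTEGER')]
```

Storing metadata this way means the DBMS needs no separate access path or query language for it; the same SELECT machinery serves both the data and the data about the data.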

Summary
- There are many disk scheduling algorithms that deal with moving pages to/from disk.
- Contention and disk scheduling are additional factors to consider when determining disk service times.
- Fixed-length and variable-length records are supported in DBMSs.
- There are different ways of managing the space on a data page.
- Rows can be relocated. There are good reasons to do so.

Summary (cont.)
- Different DBMSs have different ways of managing data on pages (e.g., overhead bytes for various purposes, which we don't consider in detail in this course).
- Many alternative file organizations exist, each appropriate in some situation: heap files, sorted files, indexes.
- Indexes support efficient retrieval of records based on the values in some fields.
- Indexes can be clustered or unclustered. The differences have important consequences for performance.
- Metadata is "data about data".
  - Catalog relations store information about relations, indexes, views, buffer pools, columns, backups (image copies), etc.
  - Such information is extremely useful to DBAs and, of course, to the DBMS!