B-Trees CS321 Spring 2014 Steve Cutchin
Topics for Today
- HW #2 once-over
- B-Trees
- Questions
- PA #3
- Expression Trees
- Balance Factor
- AVL Heights
- Data Structure Animations
- Graphs
B-Tree Motivation
When data is too large to fit in main memory, the number of disk accesses becomes the critical cost. A disk access is extremely expensive compared to a typical CPU instruction because of mechanical limitations (seek time and rotational delay). One disk access takes about as long as 200,000 instructions, so the number of disk accesses dominates the running time.
Motivation (cont.)
Secondary memory (disk) is divided into equal-sized blocks (typical sizes are 512, 2048, 4096, or 8192 bytes). The basic I/O operation transfers the contents of one disk block to or from main memory. Our goal is to devise a multiway search tree that minimizes disk accesses. The tree does this by making each node fill one disk block, so that a single block read fetches a whole node.
m-ary Trees
[Figure: a node with keys K1 K2 K3 K4 and subtrees T1, T2, T3, ...; keys in T1 satisfy K < K1, keys in T2 satisfy K1 < K < K2, and so on.]
A node contains multiple keys, and its subtrees are ordered by the parent node's keys. Suppose the tree is balanced, each node has m children, and there are n keys. Then a search visits about $\log_m n$ nodes.
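To make the $\log_m n$ claim concrete, here is a small sketch that counts how many levels a completely full m-ary search tree needs to hold n keys. In such a tree every node holds m-1 keys, so h levels hold $m^h - 1$ keys. The function name `min_levels` is hypothetical, and m >= 2 is assumed.

```c
#include <stdint.h>

/* Smallest number of levels h such that a completely full m-ary search tree
   (every node holding m-1 keys) can store n keys, i.e. m^h - 1 >= n.
   Assumes m >= 2 and that m^h does not overflow 64 bits. */
int min_levels(uint64_t m, uint64_t n) {
    int h = 0;
    uint64_t capacity = 1;          /* m^h; h full levels hold capacity - 1 keys */
    while (capacity - 1 < n) {
        capacity *= m;              /* add one more full level */
        h++;
    }
    return h;
}
```

For n = 1,000,000 keys, `min_levels(2, 1000000)` gives 20 levels for a binary tree. `min_levels(100, 1000000)` gives 4 levels when m = 100.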
B-Tree Definition
A B-Tree is a search tree with a root node. Each node in a B-Tree can hold multiple keys and have multiple children. The number of children depends on the number of keys: an internal node with k keys has exactly k+1 children, and a leaf has none.
Layout of a B-Tree
Each node has at most 3 keys and 4 children. Each internal node has at least 2 children. This is a 2-3-4 B-Tree: every internal node has 2, 3, or 4 children.
Important Metrics
The minimum degree of a B-Tree is t, with t >= 2.
- Every internal node except the root has at least t children.
- Every node except the root has at least t-1 keys.
- Every node has at most 2t-1 keys (and so at most 2t children).
The order of a B-Tree is m: no node may have more than m children. Therefore, order = 2 * degree (m = 2t).
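The bounds above all follow from t. The one-line helpers below compute them directly from the rules on this slide. The function names are hypothetical, and this is a sketch rather than course code.

```c
/* Bounds implied by a minimum degree t (t >= 2), per the rules above. */
int min_keys(int t)     { return t - 1; }     /* every non-root node */
int max_keys(int t)     { return 2 * t - 1; } /* every node, root included */
int min_children(int t) { return t; }         /* every non-root internal node */
int max_children(int t) { return 2 * t; }     /* equals the order m */
```

For example, with t = 2 a node holds 1 to 3 keys and has at most `max_children(2)` = 4 children.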
Layout of a B-Tree
What is the degree of this B-Tree? What is the order of this B-Tree?
Size of B-Trees
All leaves in a B-Tree have the same depth, and that depth equals the height of the tree. Therefore every B-Tree is balanced by definition.
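Every leaf must sit at the same depth, so balance can be checked directly. The sketch below assumes a hypothetical fixed-capacity node for a 2-3-4 tree (t = 2) with 0-based arrays. It also assumes a well-formed tree in which every internal node with n keys has n+1 non-NULL children.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4  /* assumption: 2-3-4 tree, t = 2 */

typedef struct Node {
    int n;                              /* number of keys */
    int keys[MAX_CHILDREN - 1];
    bool leaf;
    struct Node *child[MAX_CHILDREN];   /* child[0..n] used when !leaf */
} Node;

/* Returns the common depth of all leaves below x (x itself is at 'depth'),
   or -1 if two leaves are found at different depths. */
static int leaf_depth(const Node *x, int depth) {
    if (x->leaf) return depth;
    int d = -1;
    for (int i = 0; i <= x->n; i++) {
        int di = leaf_depth(x->child[i], depth + 1);
        if (di < 0 || (d >= 0 && di != d)) return -1;  /* mismatch found */
        d = di;
    }
    return d;
}

/* True when every leaf has the same depth (an empty tree trivially does). */
bool leaves_same_depth(const Node *root) {
    return root == NULL || leaf_depth(root, 0) >= 0;
}
```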
Size of B-Trees
For a B-Tree with n >= 1 keys and minimum degree t, the height satisfies $h \le \log_t \frac{n+1}{2}$.
Equivalently, a B-Tree of height h and minimum degree t has $n \ge 2t^h - 1$ keys.
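The height bound comes from counting the fewest keys a B-Tree of height h can hold. Below is a sketch of the standard counting argument, using the minimum-degree rules from the Important Metrics slide. The root has at least 1 key and so at least 2 children. Every other node has at least t-1 keys, and every non-root internal node has at least t children.

```latex
\begin{aligned}
&\text{depth } 0:\ 1 \text{ node (the root, with} \ge 1 \text{ key)} \\
&\text{depth } i \ge 1:\ \ge 2t^{\,i-1} \text{ nodes, each with} \ge t-1 \text{ keys} \\[4pt]
n &\ge 1 + (t-1)\sum_{i=1}^{h} 2t^{\,i-1}
   = 1 + 2(t-1)\,\frac{t^h - 1}{t - 1}
   = 2t^h - 1 \\
\Rightarrow\ t^h &\le \frac{n+1}{2}
\quad\Rightarrow\quad h \le \log_t \frac{n+1}{2}
\end{aligned}
```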
B-Tree and Block Size
A B-Tree node is usually the size of a disk page. So if a disk page is 4096 bytes, we want our node to be that size. Say:
- 84 bytes of overhead for the node
- 4 bytes for each key
- 4 bytes for each child pointer
- 4 bytes for the number of keys and 4 bytes for the number of children
B-Tree and Block Size
With K keys and C = K + 1 children per node:
$4096 = 4K + 4C + 4 + 4 + 84$
$4096 = 4K + 4(K+1) + 4 + 4 + 84 = 8K + 12 + 84$
$8K = 4096 - 12 - 84 = 4000$, so $K = 500$ keys per node (one block) and $C = 501$ children per node.
A full tree of height 2 therefore has 1 + 501 + 501^2 = 251,503 nodes (disk blocks). Those blocks hold 251,503 * 500 = 125,751,500 keys.
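The block-size arithmetic and the height-2 capacity can be checked with a short C sketch. The function names are hypothetical. The byte sizes are the slide's assumptions: 4-byte keys, child pointers, and counters, plus 84 bytes of overhead.

```c
#include <stdint.h>

/* Solves block = 4K + 4(K+1) + 4 + 4 + overhead for K, rounding down.
   The constant 12 is the extra child pointer plus the two 4-byte counters. */
int keys_per_node(int block_bytes, int overhead_bytes) {
    return (block_bytes - overhead_bytes - 12) / 8;
}

/* Nodes (disk blocks) in a completely full tree of height h whose internal
   nodes all have c children: 1 + c + c^2 + ... + c^h. */
uint64_t full_tree_nodes(uint64_t c, int h) {
    uint64_t total = 0, level = 1;   /* level = nodes at the current depth */
    for (int i = 0; i <= h; i++) {
        total += level;
        level *= c;
    }
    return total;
}

/* Keys in that full tree: every node holds keys_per_node keys. */
uint64_t full_tree_keys(uint64_t keys_per_node, int h) {
    return keys_per_node * full_tree_nodes(keys_per_node + 1, h);
}
```

With the slide's numbers, `keys_per_node(4096, 84)` returns 500 and `full_tree_nodes(501, 2)` returns 251503. `full_tree_keys(500, 2)` returns 125751500.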
Definition of a B-Tree
Def: A B-Tree of minimum degree t is a tree with the following properties:
- The root has at least 2 children, unless it is a leaf.
- Every non-root node has at least t-1 keys.
- Every non-root internal node has at least t children.
- If the tree is non-empty, the root has at least one key.
- Every node has at most 2t-1 keys.
- An internal node has at most 2t children.
- A full tree is one in which every node has 2t-1 keys.
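The key-count rules of this definition can be checked recursively. The sketch below assumes a hypothetical compile-time minimum degree `T` and 0-based arrays. It checks key counts and in-node ordering only. It does not check that child keys fall between the parent's keys or that leaves share a depth.

```c
#include <stdbool.h>
#include <stddef.h>

#define T 2  /* assumed minimum degree for this sketch (a 2-3-4 tree) */

typedef struct Node {
    int n;                      /* number of keys */
    int keys[2 * T - 1];        /* at most 2t-1 keys */
    bool leaf;
    struct Node *child[2 * T];  /* child[0..n] used when !leaf */
} Node;

/* Root of a non-empty tree: 1..2t-1 keys; other nodes: t-1..2t-1 keys.
   Keys inside each node must be non-decreasing. An internal node with n keys
   is treated as having n+1 children, each of which must be non-NULL. */
bool counts_ok(const Node *x, bool is_root) {
    int lo = is_root ? 1 : T - 1;
    if (x->n < lo || x->n > 2 * T - 1) return false;
    for (int i = 1; i < x->n; i++)
        if (x->keys[i - 1] > x->keys[i]) return false;
    if (!x->leaf)
        for (int i = 0; i <= x->n; i++)
            if (x->child[i] == NULL || !counts_ok(x->child[i], false))
                return false;
    return true;
}
```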
Components of B-Tree Nodes
Every node x has the following attributes:
- x.n = the number of keys in x
- x.keys[1..n] = the keys themselves
- x.leaf = is x a leaf? (Can the root be a leaf?)
- x.child[1..n+1] = pointers to the children (internal nodes only)
Rule: x.keys[1] <= x.keys[2] <= ... <= x.keys[n].
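One way to write these attributes as a C struct is shown below. It assumes a compile-time minimum degree `T` and uses 0-based arrays, so the slide's keys[1..n] become keys[0..n-1]. The names `BTreeNode` and `btree_node_new` are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

#define T 2   /* minimum degree; a hypothetical compile-time choice */

/* Mirrors the slide's attributes with 0-based C arrays:
   keys[0..n-1] correspond to keys[1..n], child[0..n] to child[1..n+1]. */
typedef struct BTreeNode {
    int n;                           /* x.n: number of keys stored */
    int keys[2 * T - 1];             /* x.keys: at most 2t-1 keys, sorted */
    bool leaf;                       /* x.leaf: true if x has no children */
    struct BTreeNode *child[2 * T];  /* x.child: at most 2t children */
} BTreeNode;

/* Allocates an empty node with n = 0 and all fields zero-filled;
   returns NULL if allocation fails. */
BTreeNode *btree_node_new(bool leaf) {
    BTreeNode *x = calloc(1, sizeof *x);
    if (x != NULL) x->leaf = leaf;
    return x;
}
```

A tree that has just been created consists of one such node allocated with `leaf = true`.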
Definition of a B-Tree
Def: A B-Tree of order m is a tree with the following properties:
- The root has at least 2 children, unless it is a leaf.
- No node in the tree has more than m children.
- Every node except the root and the leaves has at least $\lceil m/2 \rceil$ children.
- All leaves appear at the same level.
- An internal node with k children contains exactly k-1 keys.
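In the order-m definition, the only rounding detail is $\lceil m/2 \rceil$. The hypothetical helpers below compute the bounds with integer arithmetic, using (m + 1) / 2 as the ceiling for positive m.

```c
/* Bounds for a B-Tree of order m (m >= 2), per the definition above. */
int min_children_order(int m) { return (m + 1) / 2; }            /* ceil(m/2), non-root internal */
int max_keys_order(int m)     { return m - 1; }                  /* k children -> k-1 keys */
int min_keys_order(int m)     { return min_children_order(m) - 1; }
```

For an even order m = 2t, `min_children_order(2 * t)` equals t, which matches the minimum-degree definition.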
B-Trees & Efficiency
- Used for file-system structures in Mac OS, NTFS, and OS/2.
- Allow insertion and deletion in a tree structure at a cost governed by $\log_m n$, where m is the order of the tree.
- The idea is to leave some key slots open in each node. In most cases a new key is then inserted into available space without restructuring the tree.
- Less dynamic than our typical binary tree.
- Efficient for disk-based operations.
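The open-slot idea shows up in the simplest insert case: putting a key into a node that is not yet full. The helper below is hypothetical and works within one node's sorted key array only. It does not split nodes, which a complete B-Tree insert must do when no slot is free.

```c
#include <stdbool.h>

/* Inserts key into the sorted array keys[0..*n-1] of a node that still has a
   free slot (capacity = maximum keys, e.g. 2t-1), shifting larger keys right.
   Returns false and leaves the node unchanged if it is already full, which is
   the case where a real B-Tree insert would have to split the node. */
bool insert_into_node(int keys[], int *n, int capacity, int key) {
    if (*n >= capacity) return false;
    int i = *n - 1;
    while (i >= 0 && keys[i] > key) {
        keys[i + 1] = keys[i];   /* open a slot for the new key */
        i--;
    }
    keys[i + 1] = key;
    (*n)++;
    return true;
}
```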
2-3 Trees
[Figure: example 2-3 tree containing the keys A, C, D, E, G, H, I, J, K, M, N, O]
B-Tree Operations (ADT)
- Search(key)
- Insert(key)
- Delete(key)
Searching m-ary Trees
A generalized symmetric-order (in-order) traversal (SOT) visits all keys in ascending order. For a node holding keys k_1 ... k_{m-1}:
    for (i = 1; i <= m-1; i++) {
        visit subtree to left of k_i
        visit k_i
    }
    visit subtree to right of k_{m-1}
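Here is the traversal in C, under the assumption of a hypothetical fixed-capacity node for a 2-3-4 tree with 0-based arrays. Each node holds n keys rather than exactly m-1, so the loop runs over n. Keys are appended to an output array so the ascending order can be checked.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_KEYS 3   /* assumption: 2-3-4 tree (t = 2) */

typedef struct Node {
    int n;                            /* number of keys */
    int keys[MAX_KEYS];
    bool leaf;
    struct Node *child[MAX_KEYS + 1]; /* child[0..n] used when !leaf */
} Node;

/* Generalized in-order traversal: for each key, visit the subtree to its left,
   then the key itself; finally visit the subtree to the right of the last key.
   Appends keys to out[] starting at index count and returns the new count. */
int inorder(const Node *x, int out[], int count) {
    if (x == NULL) return count;
    for (int i = 0; i < x->n; i++) {
        if (!x->leaf) count = inorder(x->child[i], out, count);
        out[count++] = x->keys[i];
    }
    if (!x->leaf) count = inorder(x->child[x->n], out, count);
    return count;
}
```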
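Basic Recursive Search
Ordered recursive search, with arrays indexed from 1:
    Search(T, k)
        for (i = 1; i <= T.n; i++) {
            if (k == T.keys[i]) return T        // found k in this node
            if (k < T.keys[i]) {
                if (T.leaf) return NIL          // k is not in the tree
                return Search(T.child[i], k)
            }
        }
        if (T.leaf) return NIL
        return Search(T.child[T.n+1], k)
Notice the for loop! O(?)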
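The same search can be sketched in C with 0-based arrays, assuming a hypothetical fixed-capacity node for a 2-3-4 tree. It returns the node that contains k and reports the key's position through an out-parameter.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_KEYS 3   /* assumption: 2-3-4 tree (t = 2) */

typedef struct Node {
    int n;                            /* number of keys */
    int keys[MAX_KEYS];               /* sorted ascending */
    bool leaf;
    struct Node *child[MAX_KEYS + 1]; /* child[0..n] used when !leaf */
} Node;

/* Recursive search with a linear scan of each node. Returns the node holding
   k and stores its index in *pos, or returns NULL if k is not in the tree. */
const Node *btree_search(const Node *x, int k, int *pos) {
    if (x == NULL) return NULL;
    int i = 0;
    while (i < x->n && k > x->keys[i]) i++;   /* first key >= k */
    if (i < x->n && k == x->keys[i]) {
        *pos = i;
        return x;
    }
    if (x->leaf) return NULL;                 /* no subtree left to search */
    return btree_search(x->child[i], k, pos); /* subtree between keys[i-1] and keys[i] */
}
```

The `while` loop is the linear scan that the slide's for loop performs. The keys in a node are sorted, so a binary search over `keys[]` is a common alternative.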