Representing Dynamic Binary Trees Succinctly

Representing Dynamic Binary Trees Succinctly

J. Ian Munro*    Venkatesh Raman†    Adam J. Storm*

Abstract

We introduce a new updatable representation of binary trees. The structure requires the information theoretic minimum 2n + o(n) bits and supports basic navigational operations in constant time and subtree size in O(lg n) time. In contrast to the linear update costs of previously proposed succinct representations, our representation supports updates in O(lg² n) amortized time.

1 Introduction

Trees, particularly binary trees, are elementary structures in many aspects of computing. The standard representation of a tree, with a pointer or two per parent-child relationship, is easy to navigate and update. Furthermore, the structure can easily be augmented so that operations such as determining subtree size can also be supported in constant time. Unfortunately, this representation can be very costly, even prohibitive, in terms of space. This is particularly true in applications such as text indexing, where a node of a binary tree corresponds to an index point in a text file. Taking the reasonable point of view that a pointer used in representing an n node tree takes lg n bits, the usual representation of a binary tree requires 2n lg n bits (even without parent pointers). On the other hand, a binary tree can be represented in fewer than 2n bits, as there are only (2n choose n)/(n + 1), or about 2^{2n}/n^{3/2}, binary trees on n nodes. Indeed Jacobson[5], Munro and Raman[7], and others have proposed 2n + o(n) bit representations that permit fast navigation of a tree. These approaches, and the work presented here, all lead to a mapping from the n (internal) nodes of a tree onto the integers [1, n] and from the external nodes onto the integers [1, n + 1]. This leads to a way of associating auxiliary data with internal and external nodes. Most of these approaches, however, are inherently static. The focus of this paper is a succinct, quickly navigable and updatable representation of binary trees.
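The counting argument above is easy to check directly. A small Python sketch (illustrative only, not part of the paper's construction) computes the number of binary trees on n nodes and the resulting information-theoretic bound:

```python
from math import comb, log2

def num_binary_trees(n):
    # Catalan number: there are C(2n, n) / (n + 1) binary trees on n nodes
    return comb(2 * n, n) // (n + 1)

def min_bits(n):
    # Information-theoretic lower bound: lg of the number of trees,
    # which is 2n - Theta(lg n), i.e. just under 2 bits per node
    return log2(num_binary_trees(n))
```

For n = 1000 this gives roughly 1984 bits, comfortably below the roughly 2n lg n = 20000 bits of the explicit pointer representation.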
*Department of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada, {imunro, ajstorm}@uwaterloo.ca
†Institute of Mathematical Sciences, Chennai, India, vraman@imsc.ernet.in

Our representations deal with arbitrary binary trees on n (internal) nodes. Logically attaching an external node to each position in the tree without a child, we have n + 1 external nodes. Data may be associated with all internal and/or all external nodes. This data is taken to be either of constant size or a lg n bit reference¹. One choice may be made for the size of internal node data and another for external node data. Proofs, however, are given only for the case where data is associated with external nodes. The updates permitted are the natural insertion or deletion of a single node. We allow insertions to the tree along an edge or by inserting a new leaf. Conversely, a node with one child or a leaf may be deleted. We adopt a natural model of a random access machine under which a lg n bit word can be manipulated with the usual operations in unit time, i.e. the size of the tree roughly matches the word size. It was under this model that Jacobson[5] showed how to represent a tree using 2n + o(n) bits and be able to determine the parent or child of a node in lg n bit inspections. Munro and Raman[7] improved this to inspecting a constant number of lg n bit words, and added a number of operations including subtree size. Clark and Munro[3] gave a representation aimed at large trees to be kept on secondary storage. They broke the tree into pieces so that each piece could be stored on a page of memory using a 3n + o(n) bit representation. An update could be made by totally recomputing the page in question and modifying any other pages along the path from the root. In a disk based model, of course, this implies rewriting all pages along such a path. Nevertheless, their approach was effective in the practice of maintaining suffix trees.
Although the approach taken here is very different, their work is, in a very loose sense, the starting point for the work presented here. Our main results can be stated as follows:

Theorem 1.1 There exists a 2n + o(n) bit binary tree representation that can be created in linear time and facilitates navigation and subtree size queries in constant time. The structure also supports finding any extra (fixed size) data associated with nodes. Given the location at which an insertion or deletion is to be performed, updates to the tree can be made in poly-log time depending on the data associated with internal and/or external nodes. In particular:

- If no data is associated with nodes, update time is O(lg² n) worst case and O(lg lg n) amortized.
- If data of fixed constant size is associated with internal nodes and/or external nodes, update time is O(lg³ n) worst case and O(lg n) amortized.
- If data of O(lg n) bits (such as references to an arbitrary record) is associated with internal and/or external nodes, update time is O(lg⁴ n) worst case and O(lg² n) amortized.

In the next section we give a high level description of our structure. Subsequently, we provide a more detailed description of the structure and how it facilitates insertions and deletions.

¹We use lg n to denote ⌈lg₂ n + 1⌉.

2 Overview of the Structure

We first describe our data structure, giving the invariants that are later used by the search procedures and maintained by the update algorithms. The basic notion is to divide the tree into subtrees of size O(lg² n) (the root's subtree may be smaller). We call these O(lg² n) sized subtrees small trees and we store them in blocks. Each of these small trees is then subdivided into tiny trees of O(lg n) nodes. These are stored in sub-blocks. The limited size of these tiny trees enables us to maintain a table of the representations of all possible binary trees of size at most c lg n (for any constant c < 1/2). Such a table permits the representation of a tiny tree of size O(lg n) using an O(lg n) sized pointer to its representation. Moreover, since there are only O(lg n) tiny trees for each small tree, we can use explicit O(lg lg n) sized pointers between tiny trees. We now give a more detailed description of the blocking structure and how it responds to additions (the structure's response to deletions is analogous).

2.1 Blocks

The tree is divided into subtrees of between lg² n and 3 lg² n nodes. These small trees are stored in blocks.
(Note that the root's block is the only block that may be less than lg² n nodes in size.) This division can be done using a greedy algorithm which performs a postorder traversal of the tree in the following manner: At each node we determine the size of the "incomplete blocks" presently containing each of its children. Each external node is viewed as an incomplete block of size 0 and passed to its parent. At each internal node a new block is formed by taking the incomplete block from each child together with the node itself. If this new block contains at least lg² n nodes it is a "complete block" and a new, empty incomplete block is passed to its parent. Otherwise, the combined block remains incomplete and is passed to the parent. Finally, the block containing the root is viewed as complete regardless of its size. Clearly we could restrict the size of a (complete) block to being between lg² n and 2 lg² n nodes; however, relaxing the upper bound to 3 lg² n will be helpful in performing updates.

The matter of references between small trees is, however, of some concern. As each small tree will have one parent node in another small tree, only O(n/lg² n) pointers (= O(n/lg n) = o(n) bits) are required for references between parent and child small trees. However, an individual small tree could have O(lg² n) child small trees. Hence these inter-block child pointers are not stored in the blocks themselves, but in an auxiliary structure. As a consequence the size of a block will depend only on the number of nodes in the small tree it represents.

Block Organization

In allocating storage during updates, it is convenient to group together subtrees of roughly the same size. Hence we say that blocks with between lg² n + (i − 1) lg n and lg² n + i lg n nodes are in group i. Each block in the grouping is allocated the same amount of space: adequate for the largest but wasteful only by a factor of (1 + 1/lg n) for the smallest.
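The greedy decomposition just described can be sketched as follows (an illustrative Python sketch; the tree encoding and block bookkeeping are simplified assumptions, not the paper's actual bit-level layout):

```python
def greedy_blocks(children, root, min_size):
    """Partition a binary tree into 'small trees' of at least min_size nodes
    via a postorder traversal; only the root's block may end up smaller.
    children maps a node to its (left, right) pair, None marking no child."""
    blocks = []

    def visit(v):
        if v is None:
            return []                  # external node: incomplete block of size 0
        left, right = children[v]
        pending = visit(left) + visit(right) + [v]
        if len(pending) >= min_size:
            blocks.append(pending)     # complete block
            return []                  # pass a fresh incomplete block upward
        return pending                 # still incomplete: pass it to the parent

    leftover = visit(root)
    if leftover:
        blocks.append(leftover)        # root's block, complete regardless of size
    return blocks
```

On a right chain of 10 nodes with min_size = 4 this produces blocks of sizes 4, 4 and 2, the undersized block containing the root.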
Within a block grouping, blocks are stored contiguously so that no space is maintained between blocks. Block groupings are stored in an array, ordered by block size. The grouping with the smallest blocks is first in the array, and the grouping of the largest blocks is at the end. We use the optimally resizable arrays of Brodnik et al.[2], which permit accesses, extensions, and contractions of the array to be performed in constant time. The space overhead is proportional to the square root of the number of "words" in the array. Between each pair of block groupings is some empty space to facilitate growth and contraction. This will be at most 3 lg n words of lg n bits each (i.e. at most 3 lg² n bits). To maintain a traversable structure, blocks are connected to their children (and children to parents) using explicit pointers of size lg n. This gives a total of 2 pointers per block (one parent, one child) since each block can be referenced only once as a child. As a result, while one block may have O(lg² n) pointers, there will be no more than O(n/lg² n) pointers in total.

Inter-Block Pointers

All inter-block pointers for a given block are stored contiguously in a separate pointer block. The main difference between the storage technique used for blocks and that used for pointer blocks is that between pointer block groupings there is no unused space. This is because, as we shall see later, pointer blocks only need modification upon block splitting or merging. As a result, when pointer block sizes change, they do so dramatically, and so the mechanism employed in block rearranging is invalid. The technique used to maintain pointer blocks is somewhat involved. It can, however, be found in [8]. The inter-block pointers are arranged in a B-tree within the pointer block so that an external node can find its pointer in constant time. We discuss the details of this B-tree below.

2.2 Sub-Blocks

There are two main components of each sub-block: a pointer to the table representation, and leaf numbering information. We first discuss the crucial aspect of the pointer component, since it is by far the more important aspect, and then we go on to explain the external node numbering information, why it is necessary, and how it is used.

Pointer To Tree Representation

The blocks of size O(lg² n) are divided into sub-blocks of between ⅛ lg n and ¼ lg n nodes. As mentioned before, these sub-blocks are pointers into a table containing a representation of every possible binary tree with at most ¼ lg n nodes. Since there are roughly 2^{2r} binary trees on r (or up to r) nodes, there will be roughly √n entries in the table. As a consequence there is no problem with the space requirements of representing a copy of each possible subtree; therefore, we will ignore the actual table of √n trees for the present. The table will actually maintain additional information useful when performing insertions and deletions, so we shall return to the issue later. The references to subtrees in this table could be given by the parenthesis encoding of Jacobson[5] or of Munro and Raman[7].
We observe, however, that as there are about 4^r/(√π r^{3/2}) binary trees on r nodes, 2r − (3/2) lg r − O(1) bits suffice. Adding a lg r + 2 lg lg r bit prefix to indicate the value of r gives us a 2r − (1/2) lg r + o(lg r) bit designation for a subtree. Hence each sub-block of size r can use fewer than 2r bits to encode itself. Ultimately the space taken by our encoding is dominated by these virtually optimal encodings of the "tiny" trees. All other space used is to facilitate navigation, updates and interpretation. Within this extra space we require references to parent and child sub-blocks and to external data fields.

Lemma 2.1 (Table Size) Representing the Θ(lg n) tables can be achieved with O(n^c lg n) bits.

It is interesting that the tables do not contribute to the dominant space term of our structure.

External Node Numberings

In addition to a pointer to the explicit tree representation, each sub-block stores some information used to determine external node numbers within a given block. As described below, there are three types of external nodes in a sub-block: inter-sub-block pointers, inter-block pointers, and genuine external nodes of the tree (these may be implications of real data). Due to the way pointers are stored in our structure, we must be able to determine, in constant time, how many external nodes precede a given node, within the block, in a preorder traversal. To achieve this we store an array (called the external node numbering array) in each block, which has an entry for each sub-block. A given sub-block's entry in the array stores the number of external nodes that precede the first external node of the sub-block. Additionally, in the table of tree representations, for each node of a "tiny" tree we store the number of external nodes preceding the node within the sub-block.
With this information, and the number of external nodes before the root of the sub-block within the block, we can determine external node numberings for each node of the tree in constant time.

Sub-Block Pointers

We know that inter-sub-block pointers need only be of size lg lg n. Moreover, the number of inter-sub-block pointers in a given sub-block is one less than the number of sub-blocks in the block. To store the pointers we number the external nodes in postorder and, as previously mentioned, maintain a count of the number of external nodes before a sub-block. We store the list of a block's inter-sub-block pointers in a B-tree that is in fact a simplified version of a fusion tree[4] with the following properties:

1. Each node of the B-tree is of size ½ lg n bits.
2. The keys of the tree are of size lg lg n bits and represent the external node's number in the block.
3. The ½ lg n bit nodes each store from lg n/(4 lg lg n) to lg n/(2 lg lg n) of these lg lg n sized keys.
4. At all times the tree is of height 2.

We know that a B-tree with these properties will remain of height 2 with up to lg² n/(4 lg lg n)² > lg n keys.

Since we have O(lg n) external nodes that are inter-sub-block pointers in each block, we say that an external node is an inter-sub-block pointer if it is represented by a key in the tree. Assuming that the external node is found in the tree, we must then find the pointer associated with the desired position. To allow this to be done in constant time we store an additional B-tree node for each of the nodes on the second level of the B-tree which, instead of storing keys, stores the inter-sub-block pointers (we call this node the pointer node). This can be done since our keys and pointers are of the same size. When we find that a key is in a given second level node of the B-tree at a given position, we go to the same position in the pointer node, where we find the pointer associated with that external node. With nodes containing up to lg n/(2 lg lg n) keys, searching for a desired node could take O(lg lg n) time with binary search. Instead, we maintain a table that allows us to achieve a constant time bound. We store a table outside of the block structure which is indexed by a ½ lg n sized key. Since there exists an entry of the table for each of the ½ lg n sized keys, there are √n entries in total. Each entry stores, in sequential order, lg n/lg lg n records of size lg lg n. When we wish to know which branch to take on the top level of our B-tree, or whether a key exists on the second level, we index into the table based on the B-tree node's bit representation (note we choose the table's index to be the same size as each node). Having found the correct entry in the table, we use the lg lg n sized external node number to determine how many records into the entry we must skip (i.e. external node i in the block accesses the ith record in the list). The record to which we skip encodes the branch of the tree we must traverse or, if on the second level, where the corresponding key should reside.
This table allows us to index into the tree in constant time, and hence we are able to determine whether an external node represents an inter-sub-block pointer in constant time.

Lemma 2.2 (Inter-Sub-Block Pointers) O(n/lg lg n) bits suffice to represent all inter-sub-block pointers.

Proof. In each block we have O(lg n) inter-sub-block pointers, each of size lg lg n. The structure that holds the inter-sub-block pointers consists of one root node and lg n/lg lg n external nodes, each of size ½ lg n. Additionally, the actual lg lg n sized pointers are stored in a second external node level containing lg n/lg lg n leaf nodes, each of size ½ lg n. Therefore in total we have lg² n/lg lg n + ½ lg n bits used in storing a block's inter-sub-block pointers. Additionally, outside the block structure, there is a table of √n entries each of size O(lg n), giving us O(√n lg n) extra space. Since there are O(n/lg² n) blocks, there will be n/lg lg n + n/(2 lg n) + O(√n lg n), or more simply O(n/lg lg n), bits used in storing the structure's inter-sub-block pointers. □

Internal Sub-block Organization

The internal organization of blocks is crucial to achieving quick accesses and updates. The sub-blocks within a block are arranged similarly to how we store blocks. Sub-blocks are stored in sub-block groupings such that all sub-blocks in a sub-block grouping are of the same size, to within lg lg n bits. The sub-block groupings are arranged such that the grouping containing the sub-blocks of smallest size is first and the grouping of largest sub-block size is last. Between sub-block groupings we maintain a gap of at most lg n bits so that when sub-blocks grow upon insertions they can be rearranged easily. Since there are O(lg n/lg lg n) sub-block groupings and there are at most lg n bits between each pair of groupings, the total space used in inter-sub-block grouping gaps is lg² n/lg lg n.
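The table-driven B-tree search described above can be illustrated with toy parameters (a hypothetical Python sketch; the real node and key widths would be ½ lg n and lg lg n bits, and the table would be shared by all blocks):

```python
def build_branch_table(node_bits, key_bits):
    """Precompute, for every possible packed B-tree node and every query key,
    the rank of the key among the node's stored keys (i.e. the branch to take).
    A search then becomes a single table lookup instead of a binary search."""
    keys_per_node = node_bits // key_bits
    mask = (1 << key_bits) - 1
    table = {}
    for packed in range(1 << node_bits):       # every possible node bit pattern
        keys = [(packed >> (i * key_bits)) & mask for i in range(keys_per_node)]
        for q in range(1 << key_bits):         # every possible query key
            table[(packed, q)] = sum(1 for k in keys if k < q)
    return table
```

With node_bits = 4 and key_bits = 2, the packed node 0b0110 holds keys (2, 1); a query of 3 exceeds both keys, so the lookup returns branch 2 in constant time.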
2.3 Data Representation

The data representation technique used in our model is key to achieving the desired space constraints. Since pointers to data dominate the overall number of pointers, explicit pointers from the external nodes of the tree to the data set would require O(n lg n) bits of additional space, which would push us beyond the desired 2n space bound. As a result we implement a storage technique for the data that mirrors the structural storage technique. Here we consider the case in which data records are of constant size (the case where data records are lg n bit references is analogous). We store these fixed size records in blocks of size i lg n (i is at most 3 lg n). Like our structural blocks, the data blocks are stored in memory according to their size, with all blocks of the same size stored contiguously. Within these blocks we contiguously store our fixed size data records. Additionally, as with the previously defined structural blocks, we maintain gaps between groupings of same sized blocks so that shuffling of blocks can be done efficiently. In each structural block is a pointer to the data block that stores its data. Additionally, we know that if an external node of the subtree is not an inter-sub-block or inter-block pointer, then it is external to the entire tree and as such is associated with some data. After verifying that the leaf is external to the tree, we can find its data in the block's corresponding data block. To simplify searching for the data within the data block, we use the previously described external node numberings. To determine which record to access in the data block, we must know precisely the number of external pointers that precede us within the block and sub-block. These values can be found in the respective B-trees and, when subtracted from the external node numbering, give the data record that is to be accessed. It should be noted that if it is desired that the tree's internal nodes have associated data, then an analogous method can be used to represent the internal data.

3 Modifying the Structure

When an insertion is made into a block its size increases; however, provided it does not increase beyond the lg n block increment, the block remains valid in its current location. If the block size has increased beyond the lg n block increment, it must be relocated.

3.1 Relocating Blocks

When a block grows, we know that relocation will result in the block being placed in the next largest block grouping (if one is not available, the block must be split). To do this a copy of the block is made and then the last block of the block grouping is moved into the moving block's location (we know the last block will fit since all blocks in a block grouping are of the same size). This move increases the gap between the two block groupings by the old block size. After the gap grows, its size will be large enough that the moving block can be placed within it. As a result, the moving block becomes the first block of the following block grouping. Figure 1 shows how blocks are relocated.

Lemma 3.1 (Block Relocation) A block can be relocated in O(lg n) time.

3.2 Shrinking Gap Sizes

We can see that in the process of relocating blocks upon the addition of a new node, gap sizes decrease. From this we can deduce that there will be a situation where a given gap will close (i.e. be reduced to 0 bits in size) and the block relocation algorithm described above will fail.
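The relocation step just described, and the way it shrinks the gap between two groupings, can be modelled with a small sketch (Python; block sizes are modelled as plain integers and groupings as lists — a simplification of the actual bit-level layout):

```python
def relocate_block(groupings, gaps, g, i, growth):
    """Block i of grouping g has grown by `growth` words and no longer fits.
    The grouping's last block (same size) fills the vacated slot, and the
    grown block becomes the first block of the next grouping; the net effect
    is that the gap between the two groupings shrinks by `growth`."""
    grp = groupings[g]
    block = grp[i] + growth
    last = grp.pop()              # last block of the grouping closes the hole
    if i < len(grp):
        grp[i] = last
    gaps[g] -= growth             # gap grows by the old size, shrinks by the new
    groupings[g + 1].insert(0, block)
    return block
```

Repeating this on every insertion is what eventually drives a gap toward zero, motivating the monitoring described next.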
To avoid gap closure we monitor the gaps, and when a gap is reduced to lg n bits we perform gap resizing.

Gap Resizing

To resize the gap between block groupings a and b (where block size(a) < block size(b)), we make a copy of the first block in block grouping b (b₁) and delete b₁ so that its previous location is now a gap. Then we make a copy of the first block of block grouping b+1 ((b+1)₁), and place b₁ at the end of block grouping b. After this has been done, the gap between a and b is of size sizeof(b₁) (at least lg² n bits). We continue copying and moving blocks until the first block of the last block grouping has been moved to the end of the array, and the array is extended if necessary (see figure 2). This guarantees that gap sizes always remain between lg n and 3 lg² n bits. Finally we must update block pointers to maintain the tree's structure. A similar mechanism is implemented to prevent gap sizes from becoming too large upon deletions. The details of this complementary mechanism are similar to the gap resizing algorithm above and as such are left to the reader.

3.3 Splitting Blocks

Another implication of having block size constraints in the presence of insertions is that block sizes could exceed the maximum allowable size. In this case we must split the block so that the two resulting blocks are both of legal size. Splitting a block is thus performed in three steps: finding a node at which the block can be validly split, splitting the block, and placing the resulting two blocks in their proper places. We will assume for now that we can perform a block split in constant time by dereferencing one pointer to disconnect the two trees. Later we will show that in fact O(lg² n) work may be necessary if a sub-block is split as well. Since we have shown that block relocation takes O(lg n) time, to show that block splitting requires O(lg² n) time we must show that we can always find a valid splitting node in O(lg² n) time.
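One way to find such a splitting node is to walk from the root toward the heavier child until the subtree below holds at most 2/3 of the nodes; the standard argument shows the subtree reached then also holds at least about 1/3 of them. A Python sketch (assuming subtree sizes are directly available, whereas the paper reads them from its encoded blocks):

```python
def find_split_node(size, children, root):
    """Return a node whose subtree holds between roughly n/3 and 2n/3 of the
    n nodes; removing the edge to its parent splits the tree into two validly
    sized parts. size[v] is v's subtree size; children[v] lists v's children."""
    n = size[root]
    v = root
    while size[v] > 2 * n / 3:
        # The heavier child's subtree holds more than half of size[v] - 1,
        # so it stays above about n/3 while eventually dropping to <= 2n/3.
        v = max(children[v], key=lambda c: size[c])
    return v
```

On a chain of 9 nodes this walk stops at the node whose subtree has 6 of the 9 nodes, leaving parts of sizes 6 and 3.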
We can see that if we can split the block such that both parts have between 1/3 and 2/3 of the nodes, we will be left with two blocks, both of which are validly sized. It is known from [6] that it is possible to split a binary tree of n nodes through the removal of an edge so that each subtree has no more than 2n/3 vertices. This leads to the following claim:

Claim 3.1 A block split can be performed in O(lg² n) time.

We can see that splitting a block requires O(lg² n) time if and only if, at the sub-block level, all splitting operations can be completed in O(lg² n) time or less. In the next section we describe the sub-block structure and eventually show that the above claim is true. In the process, we prove Theorem 1.1: that insertions and deletions can be done in O(lg² n) time.

3.4 Inserting into Sub-blocks

When a new node is inserted into the structure, the actual insertion takes place at the sub-block level. After the correct block and sub-block are found, the sub-block is traversed to find the location where the new node will be placed. Once the location where the new node is to

be placed is found, the insertion can take place. An insertion requires us to change the sub-block pointer so that it points to the new representation of the tree. We know that this new representation will be in the table representing trees of size r + 1. Accordingly, after generating the new representation of the tree by modifying the encoding to account for the insertion, we simply perform a binary search of the size r + 1 table to find the offset of the correct encoding. Once we have found the correct encoding, we set that to be the new offset and we set r + 1 to be the new size. This takes O(lg n) time as the tables are of size O(√n). When we add a node to a sub-block, its size increases by two bits. As a result, we may have to move the sub-block to the next sub-block grouping (since sub-block sizes are in increments of lg lg n). This sub-block reorganization is similar to the block reorganization previously described. Additionally, since insertions may occur at the leaf level, the external node's data must be added to the tree structure. This is done by inserting the leaf's record into the external data structure defined above. Finally, the increase in leaves forces us to increment external node numberings (we omit these details).

Lemma 3.2 Modifying the tree structure upon insertion of a node requires O(lg n) time.

Figure 1: (Relocating Blocks) In (a) block 2.2 is the destination of a new node. With the new node, block 2.2 becomes too large and must move. In (b) the block is copied to a new location and the last block of the block grouping takes its place. Finally in (c), the now larger 2.2 is placed at the top of the next block grouping and the gap between the two block groupings decreases by lg n.

Splitting Sub-blocks

When an insertion is made into a sub-block of the largest allowable size, the sub-block must be split. This can be done using the three step method outlined for block splitting.
The actual splitting of a sub-block is a bit more complicated, since we do not have explicit pointers to simply reassign. To perform the split we first determine the node at which the subtree will be split. After the node has been found, we split the subtree and generate the encodings for the two new subtrees. To find the table representations of the two new trees we first determine their sizes and then search the corresponding tables for the representations' offsets. Following the search of the appropriate tables, we determine the external node numberings by splitting the original external node numberings. Since we now have a new sub-block, we must add one inter-sub-block pointer to the B-tree where the pointers are stored. This is the final stage in the splitting process.

Lemma 3.3 A sub-block can be split in O(lg n) worst case time and O(1) amortized time.

Lemma 3.4 A block split can be performed in O(lg² n) worst case time and O(1) amortized time.
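Both sub-block insertion and sub-block splitting re-derive a sub-block's table reference by searching the table for the new tree size. A sketch (Python; `tables[r]` is an assumed layout, a sorted list of the canonical encodings of all binary trees on r nodes):

```python
from bisect import bisect_left

def reencode(tables, r_new, encoding):
    """Look up a tiny tree's canonical encoding in the sorted table of all
    trees on r_new nodes; the sub-block then stores the pair (r_new, offset).
    Binary search over the O(sqrt(n)) entries takes O(lg n) time."""
    table = tables[r_new]
    offset = bisect_left(table, encoding)
    assert table[offset] == encoding   # every possible tiny tree is in the table
    return r_new, offset
```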

Figure 2: (Gap Resizing) In (a) block 1.2 grows and is relocated. This leaves the gap between block groupings 1 and 2 at lg n bits, and so the gap must be resized. Block 1.2 moves to the bottom of the second block grouping and block 3.0 moves to the bottom of the third block grouping. This continues until (c), where block (b−1).0 is moved and b.0 is moved to the bottom of the array of blocks. Notice that in (d) the array of blocks has grown by the size of the last block.

3.5 Modifying Data

When an insertion occurs, the data associated with the given node must be added to the data structure. To do this we first determine (by examining the external node numberings) the location within the data block where the data record belongs. Then we locate the data block through its pointer in the block and rewrite the block with the inserted record in place. When our records are of constant size, rewriting a data block takes O(lg n) amortized time. Conversely, when the data records are lg n bit references, we require O(lg² n) amortized time to rewrite blocks. Following the insertion, the block is larger and must be relocated. This relocation is performed as described previously for structural block relocation.

Lemma 3.5 Modifying data blocks takes O(lg n) amortized time for fixed size records and O(lg² n) amortized time for lg n sized references.

This is the dominant cost in modifying our structure and proves Theorem 1.1.

3.6 Changes in lg n

One difficulty with our model is its reliance on the value of lg n; this value has the potential to change over time. As a result, there are steps that must be taken when the value of lg n increases or decreases. We know that before lg n doubles or halves, n must be squared or have its square root taken. This means that we can amortize the cost of changing the structure over O(n²) operations.
3.7 Subtree Size

Since we divide our tree structure into small blocks, if we maintain at the root of each small block the block's subtree size, then in the worst case updating subtree size requires visiting each of these small blocks. Additionally, if we maintain within the small block the subtree size at the root of each tiny block, then when updating, these values must also be modified. While performing these updates would take o(n) time, we would like the ability to update subtree size in constant time. To achieve this we consider a certain class of accesses to the tree. When concerned with subtree size, we say that navigation through the tree begins at the root and may end at any point in the tree (although for the purposes of claims regarding worst case time for updating subtree sizes, we will assume navigation ends at the root). Each small block contains a count of its subtree

size beginning at its root and ending at its leaves. Additionally, each tiny block in the structure maintains a count of the number of nodes in its subtree starting at its root. Finally, in the table of tree representations, we store, for each node of a tree, the number of descendant nodes within the tiny block. When we wish to determine the subtree size at a given node of the tree, we take the value in the table and add to it the subtree sizes of each of the node's descendant tiny blocks. Then we add to that number the subtree sizes of each of its descendant small blocks. Since in the worst case there are O(lg² n) descendant small blocks and O(lg n) descendant tiny blocks, determining subtree size takes O(lg² n) time. When an insertion or deletion takes place, we must update the subtree size of each of the tiny blocks in the current small block. This update will potentially require visiting all the tiny blocks within the small block and accordingly will take O(lg n) time. After this operation the small block has correct subtree size information; however, all ancestor small blocks may have incorrect information. To correct this we must visit all the small blocks which we traversed to get to the node at which the insertion or deletion took place. Since we have already visited these small blocks, we can amortize the cost of updating the subtree sizes over the steps taken in traversing to the current node. This allows us to achieve amortized constant time updates to subtree size. It should be noted that we could compute the subtree size in constant time by maintaining prefix sums of the subtree sizes of descendant blocks (i.e. each block maintains the sum of all preceding blocks' subtree sizes). With these sums we could simply determine the first and last descendant blocks and, from these, determine the subtree size. The problem with this approach is that updating the sums would take O(lg² n) time.
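The query just described combines three counts. As a sketch (Python, with the per-level counts taken as given rather than read out of the actual encoded blocks):

```python
def subtree_size(table_count, tiny_block_sizes, small_block_sizes):
    """Subtree size at a node: descendants inside its own tiny block (read
    from the table of tree representations), plus the cached subtree sizes
    of its descendant tiny blocks, plus those of its descendant small blocks."""
    return table_count + sum(tiny_block_sizes) + sum(small_block_sizes)
```

Since a node can have O(lg n) descendant tiny blocks and O(lg² n) descendant small blocks, the two sums dominate and give the O(lg² n) query bound.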
Corollary 3.1 The results of Theorem 1.1 apply to a forest of binary trees, with the added result that two trees in the forest can be joined in O(lg^2 n) time.

Corollary 3.2 Updates can be performed on a binary tree in amortized constant time with the use of O(n lg lg n) space.

Corollary 3.2 follows by storing the small trees in a conventional manner with explicit parent-child pointers and explicit pointers to the data, though these pointers require only lg lg n bits each. A few other minor modifications are required, but we omit these details.

Conclusions

We have presented a binary tree representation that is within a lower order term of the information theoretically optimal number of bits. Additionally, we have shown how our structure, unlike those previously proposed, facilitates insertions and deletions in O(lg^2 n) time. It would be interesting to consider the problem of improving the time for an update. While our model can represent arbitrary k-degree ordinal trees, it does so by performing a trivial mapping which requires O(k) time to determine the kth child of any of the tree's nodes. The problem of succinctly representing, and efficiently updating, trees of higher degree so that navigation can be performed efficiently [1] remains open. It may well be amenable to our techniques.

References

[1] D. Benoit, E. D. Demaine, J. I. Munro, and V. Raman, "Representing Trees of Higher Degree", in Proceedings of the 6th International Workshop on Algorithms and Data Structures (WADS), volume 1663 of LNCS, Springer-Verlag, 1999.
[2] A. Brodnik, S. Carlsson, E. D. Demaine, J. I. Munro, and R. Sedgewick, "Resizable Arrays in Optimal Time and Space", in Proceedings of the 6th International Workshop on Algorithms and Data Structures (WADS), volume 1663 of LNCS, Springer-Verlag, 1999.
[3] D. R. Clark and J. I. Munro, "Efficient Suffix Trees on Secondary Storage", in Proceedings of the 7th ACM-SIAM Symposium on Discrete Algorithms (SODA), 1996.
[4] M. L. Fredman and D. E. Willard, "Surpassing the Information Theoretic Bound with Fusion Trees", Journal of Computer and System Sciences, 43 (1993).
[5] G. Jacobson, "Space-efficient Static Trees and Graphs", in Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), 1989.
[6] R. J. Lipton and R. E. Tarjan, "A Separator Theorem for Planar Graphs", SIAM Journal on Applied Mathematics, 36(2) (1979).
[7] J. I. Munro and V. Raman, "Succinct Representations of Balanced Parentheses, Static Trees and Planar Graphs", in Proceedings of the 38th Annual Symposium on Foundations of Computer Science (FOCS), 1997.
[8] A. J. Storm, Representing Dynamic Binary Trees Succinctly, MMath thesis, University of Waterloo.


More information

B-Trees and External Memory

B-Trees and External Memory Presentation for use with the textbook, Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015 and External Memory 1 1 (2, 4) Trees: Generalization of BSTs Each internal node

More information

III Data Structures. Dynamic sets

III Data Structures. Dynamic sets III Data Structures Elementary Data Structures Hash Tables Binary Search Trees Red-Black Trees Dynamic sets Sets are fundamental to computer science Algorithms may require several different types of operations

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree

More information

11.9 Connectivity Connected Components. mcs 2015/5/18 1:43 page 419 #427

11.9 Connectivity Connected Components. mcs 2015/5/18 1:43 page 419 #427 mcs 2015/5/18 1:43 page 419 #427 11.9 Connectivity Definition 11.9.1. Two vertices are connected in a graph when there is a path that begins at one and ends at the other. By convention, every vertex is

More information

l So unlike the search trees, there are neither arbitrary find operations nor arbitrary delete operations possible.

l So unlike the search trees, there are neither arbitrary find operations nor arbitrary delete operations possible. DDS-Heaps 1 Heaps - basics l Heaps an abstract structure where each object has a key value (the priority), and the operations are: insert an object, find the object of minimum key (find min), and delete

More information

A SIMPLE APPROXIMATION ALGORITHM FOR NONOVERLAPPING LOCAL ALIGNMENTS (WEIGHTED INDEPENDENT SETS OF AXIS PARALLEL RECTANGLES)

A SIMPLE APPROXIMATION ALGORITHM FOR NONOVERLAPPING LOCAL ALIGNMENTS (WEIGHTED INDEPENDENT SETS OF AXIS PARALLEL RECTANGLES) Chapter 1 A SIMPLE APPROXIMATION ALGORITHM FOR NONOVERLAPPING LOCAL ALIGNMENTS (WEIGHTED INDEPENDENT SETS OF AXIS PARALLEL RECTANGLES) Piotr Berman Department of Computer Science & Engineering Pennsylvania

More information

Chapter 12: Indexing and Hashing (Cnt(

Chapter 12: Indexing and Hashing (Cnt( Chapter 12: Indexing and Hashing (Cnt( Cnt.) Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition

More information