I/O Efficient Dynamic Data Structures for Longest Prefix Queries

Size: px

Start display at page:

Download "I/O Efficient Dynamic Data Structures for Longest Prefix Queries"

Andrew Taylor
6 years ago
Views:

1 I/O Efficient Dynmic Dt Structures for Longest Prefix Queries Moshe Hershcovitch 1 nd Him Kpln 2 1 Fculty of Electricl Engineering, moshik1@gmil.com 2 School of Computer Science, himk@cs.tu.c.il, Tel Aviv University, Tel Aviv 69978, Isrel Astrct. We present n efficient dt structure for finding the longest prefix of query string p in dynmic dtse of strings. When the strings re IP-ddresses then this is the IP-lookup prolem. Our dt structure is I/O efficient. It supports query with string p using O(log B (n) + p ) I/O opertions, where B is the size of disk lock. B It lso supports n insertion nd deletion of string p with the sme numer of I/O s. The size of the dt structure is liner in the size of the dtse nd the running time of ech opertion is O(log(n) + p ). 1 Introduction We consider the longest prefix prolem which is defined s follows. The input consists of set of strings S = {p 1...p n } which we shll refer to s prefixes. We wnt to preprocess S into dt structure such tht given query string q we cn efficiently find the longest prefix in S which is prefix of q or report tht no prefix in S is prefix of q. We focus on the dynmic version of the prolem where we wnt to e le to insert nd delete prefixes to nd from S, respectively. The min ppliction of this prolem is for pcket forwrding in IP networks. In this ppliction router mintins set of prefixes of IP ddresses ccording to some routing protocol such s BGP (Border Gtewy Protocol). When pcket rrives, the router finds the longest prefix of the destintion ddress of the pcket nd sends the pcket on the outgoing link ssocited with this longest prefix. A longest prefix mtch query in this settings is often clled IP-lookup. The rpid growth of the Internet hs rought the need for routers to mintin lrge sets of prefixes, nd to perform longest prefix mtch queries t high speeds [7]. A min issue in the design of routers is the size of the expensive high speed memory used y the router for pcket forwrding. One cn reduce the size of this expensive memory y using externl memory components. Therefore the I/O efficiency of the lgorithm we use is very importnt. In softwre implementtions the entire dt structure my not fit into the cche nd we my flip prts of it ck nd forth from the min memory or the disk. In hrdwre implementtions it my e too expensive to fit the entire dt structure in the memory which is This work is prtilly supported y United sttes - Isrel Bintionl Science Foundtion, project numer

2 integrted in the chip tht implements the forwrding lgorithm itself. So prts of the dt structure my reside in externl memory devices (such s DRAM). In such designs the communiction etween the min chip nd the externl memory ecomes the performnce ottleneck of the system. In ddition to good I/O performnce n efficient dt structure for IP-lookup should e le to perform queries t the line rte (which is out 40 Gps nd more tody). The dt structure hs to e sclle since the numer of prefixes router hs to mintin is growing rpidly s well s the length of these prefixes. Finlly, we need to support fst updtes minly due to instilities in ckone routing protocols nd security issues. Our computtionl model. Our lgorithm works in the clssicl pointer mchine model[19] using only comprisons to mnipulte prefixes. This is in contrst with mny other lgorithms for IP-lookup tht use it mnipultions on IP ddresses. See Section 1. This restriction which we oey does not come t the cost of complicted dt structure. On the contrry, our dt structure is much simpler thn previously known dt structure with the sme gurntees. We develop our dt structures in two steps. First we reduce the longest prefix prolem to the prolem of finding the shortest segment contining query point. For this dt structure we ssume tht ech prefix fits into constnt numer of computer words so tht we cn compre two endpoints of segments in O(1) time. In the second step we relx this ssumption nd del with vrile length strings with no priori upper ound on their length. We ssume tht the strings re over n ordered lphet Σ nd tht we cn compre two chrcters of Σ in O(1) time. We do not require direct ccess to the chrcters of ech prefix. To nlyze I/O performnce we use the stndrd externl memory model where memory is prtitioned into locks of size B, nd we count the numer of locks tht we hve to trnsfer from slow to fst memory in order to perform the opertion. This quntity is the numer of I/O opertions performed y the opertion.[20]. Overview of our results. We first consider the prolem of mintining dynmic set of segments for point sting queries. Specificlly, we consider dynmic nested fmily of segments where ech pir of segments re either disjoint or one contins the other. We develop dt structure tht given point p cn efficiently find the shortest segment contining p. Our dt structure for this prolem which is sed on segment tree (with lrge fn-out) is prticulrly simple. In segment tree we mp segment s to every node v, such tht s contins 3 v, nd does not contin the prent of v. Typiclly one mintins t ech node v ll the segments tht mp to v in some secondry structure [18, 13, 12]. We mke the crucil oservtion tht for our point sting query it is sufficient to mintin for ech node v only the shortest segment which mps to v. Our dt structure performs query, nd n insertion or deletion of segment in O(log n) time nd O(log B (n)) I/O opertions. We mnipulte the segments only vi comprisons of their endpoints. 3 A segment contins node if it contins ll points in its sutree.

3 We use this result to solve the longest prefix prolem s follows. We ssocite segment [pl,pr] with ech string p, where L nd R re two new chrcters, smller nd lrger thn ll other chrcters, respectively. We then pply the previous dt structure to this set of segments. This gives the dt structure for the cse where we ssume tht two prefixes cn e compred in O(1) time. To hndle strings with no fixed ound on their lengths we comine this ide with the powerful string B-tree of Ferrgin nd Grossi [9]. This dt structure is B-tree crefully designed for storing strings. For efficient serches nd updtes it uses Ptrici trie [14] in ech node. We show how to mintin the informtion which we need for longest prefix queries using the string B-tree. Since our dt structure is sed on B-tree it is lso I/O efficient. If we pick the size of node so tht it fits in disk lock of size B, we otin tht query or updte with string q performs O(log B (n)+ q /B) I/O opertions. The dt structure requires O(n/B) disk locks in ddition to the locks required to store the strings themselves. The time for query or updte with string q is O(log(n) + q ). (This is s efficient s with tries implemented crefully [16], ut tries cnnot e implemented I/O efficiently [6]). Previous relted results. There hs een lot of work minly in the networking community on the IP-lookup prolem. The different dt structures cn e clssified into three fmilies: trie sed structures (See for exmple [7] nd the references there), hsh sed structures (See for exmple [11] nd the references there), nd tree sed structures. In the rest of this section we focus on dynmic tree sed solutions with worst cse gurntees tht re relted to our pproch. Shni nd Kim [15] descrie solution sed on collection of red-lck trees tht requires liner spce nd logrithmic time per opertion. Feldmnn nd Muthukrishnn [8] proposed Ft Inverted Segment tree (FIS). This dt structure supports queries in O(log log n + l) time, where l is the numer of levels in the segment tree. The spce requirement is O(n 1+1/l ), nd insert nd delete tke O(n 1/l log n) time, ut there is n upper ound on the totl numer of insertions nd deletions llowed. Suri et l. [18] proposed dt structure which is similr to ours in the sense tht it is oth segment tree nd B-tree. But they store in ech node ll the segments which re mpped to it nd therefore chieve logrithmic worst cse time ound per opertion nd liner spce only for IP-ddresses. Lu nd Shni [13] suggested n improvement of the segment tree of Suri tht stores ech prefix only in one plce. They mintin other it vectors in internl nodes nd their updte opertions re quite complicted. Our structure, which cn e extended to generl strings (nd generl segments) nd uses only comprisons, is simpler thn ll the solutions mentioned ove. In prticulr it is much clener thn the ltter two B-tree sed implementtions when pplied to IP ddresses. Kpln, Mold, nd Trjn [12] considered the prolem of point sting dynmic nested set of segments. In their setting, which is more generl thn ours, ech segment hs priority ssocited with it nd we wnt to find the segment of minimum priority contining query point. They present dt structure performing query nd updte in O(log n) time tht requires liner spce. It uses oth lnced serch tree nd dynmic tree [17] nd therey

4 more complicted thn ours (when pplied to the specil cse where the priority of n intervl is its length). In recent yers, externl memory dt structures hve een developed for wide rnge of pplictions [20]. A clssicl I/O efficient dt structure is the B-tree [2]. This is serch tree in which we choose the degree of node so tht it occupies single lock. The string B-tree of Ferrgin nd Grossi [9] is fundmentl extension of the B-tree for storing unounded strings. The min ide is to use Ptrici trie [14] in ech node to direct the serch. Unfortuntely this dt structure y itself does not solve the longest prefix prolem. Agrwl, Arge, nd Yi [1] improved more generl dt structure of Kpln, Mold, nd Trjn [12] for sting-min queries ginst generl segments (not necessrily nested). This dt structure is sed on B-tree nd cn e implemented so tht it is I/O efficient. Specificlly, dt structure for n intervls uses O(n/B) disk locks nd O(log B (n)) I/O opertions for query nd updte. This dt structure is quite complicted nd ssumes tht endpoints of intervls cn e compred in O(1) time. Therefore it is not directly pplicle for longest prefix queries in collection of unounded strings. Brodl nd Fgererg [5] otined cche olivious (see Section 4) dt structure for mnipulting strings. This dt structure, which is essentilly trie cn e used to otin n I/O efficient (though complicted) solution for the sttic version of the longest prefix prolem. The outline of the rest of the pper is s follows. In section 2 we present our sic ides using the ssumption tht strings re of constnt size. In section 3 we comine our ides with the string B-tree to otin generl I/O efficient solution. In section 4 we suggest future reserch. 2 B Tree for Longest Prefix Queries Our input is set of prefixes S = {p 1...p n } which re strings over the lphet Σ. We think of ech prefix p s segment I(p) = [pl,pr], where L nd R re two specil chrcters not in Σ, L is smller thn ll chrcters in Σ nd R is lrger thn ll chrcters in Σ. The longest prefix p of query q corresponds to the the shortest segment I(p) contining q. We descrie dynmic dt structure to mintin set of nested segments such tht we cn find the smllest segment contining query point. Although we cn pply our dt structure to ny set of nested segments we present our result using the string terminology nd the set of segments {I(p) p S}. Let P = {p i L, p i R p i S} e the set of endpoints of the prefixes in S. We store P ordered lexicogrphiclly t the leves of B + tree T. Ech internl node x of T hs n(x) children, where n(x) 2. If x is lef then it stores n(x) endpoints of P, where n(x) 2. (See Figure 1.) To simplify the presenttion we ssume tht lef x with n(x) endpoints hs n(x) + 1 dummy children. From now on when we sy lef of T, we refer to one of these dummy nodes, nd we refer to x s height-1 node. Ech endpoint in height-1 node plys the role of key seprting two consecutive

5 A 1 A 3 L A 6 R A 4 R A 2 R A 2 A 3 L A 5 L A 5 R A 6 R A 4 R A 7 R A 8 L A 8 R A 3 R A 2 R A 4 A 3 A 3 A 3 L A 4 L A 5 L A 5 R A 6 L A 6 R A 4 R A 7 L A 7 R A 8 L A 8 R A 3 R A 2 R A 3 A 4 A 5 A 6 A 7 A 8 A 2 A 5 A 6 A 7 A 8 A 4 A 3 A 2 A 1 Fig. 1. A B + tree with = 2 storing the prefixes A 1,..., A 8. Rectngles correspond to internl nodes nd squres correspond to dummy leves. In ech height-1 node we show the endpoints tht it stores. In ech internl node v of height > 1 we show the spns of its children which re lso used s the keys which direct the serch. (Note tht when we use the spns s keys, serch cn never rech the first dummy lef in ech height-1 node. Therefore we do not need to keep longest prefixes of these nodes nd we do not show them in the figure.) The spn of child u of v is the closed intervl from the point depicted to the left of the edge from v to u to the point depicted to the right of the edge from v to u. On ech edge (p(v), v) we show the longest prefix of v. dummy leves. We ssocite ech lef with the open intervl from the endpoint preceding the lef to the endpoint following the lef. We cll this intervl the spn of the lef. (Note tht the lst dummy lef in height-1 node nd the first dummy lef in the next height-1 node hve the sme spn.) We define the spn of n internl node v to e the smllest intervl contining the endpoints which re descendents of v. 4 We denote the spn of node v y spn(v). We think of T s segment tree nd mp ech segment [pl,pr] to every node v such tht pl is to the left of spn(v) nd pr is to the right of spn(v), nd either pl or pr re in spn(p(v)). We define the longest prefix (or the shorter segment) of v nd denote it LP(v) to e the shortest segment mpped to v. 5 If there isn t ny segment which is mpped to v we define LP(v) to e empty. (LP(v) is lso defined if v is dummy lef.) 4 This is the intervl tht strts t the leftmost endpoint in the sutree of v, nd ends t the rightmost endpoint in the sutree of v. 5 Note tht ll segments mpped to v form nested fmily of segments, so the shortest mong them is unique.

6 We store spn(v) nd LP(v) with the pointer to node v. Note tht when we re t node v we cn use the spn vlues of the children of v s the keys which direct the serch. We denote y B = O() the mximum size of node. We pick so tht B is the size of disk lock. 2.1 Finding the longest prefix Assume we wnt to find the longest prefix of query string q. We serch the B + tree with the string q 6 in stndrd wy nd trverse pth A to lef of T. We return LP(w), where w is the lst node on A, such tht LP(w) is not empty. The correctness of the query follows from the following oservtions. For ech prefix p of q, I(p) is mpped to some node u on A. Therefore p is LP(u) unless some longer prefix is mpped to u. Furthermore, since for every v, LP(p(v)) is prefix of LP(v), it follows tht the longest prefix of q must e LP(w), where w is the lst node on A for which LP(v) is not empty. 2.2 Inserting new prefix To insert new prefix p we hve to insert I(p) into T. We insert pl nd pr into the pproprite height-1 nodes w nd w, respectively, ccording to the lexicogrphic order of the endpoints. The endpoint pl is inserted into the spn of lef y nd the endpoint pr is inserted into the spn of lef z. Assume first tht z y. The spn of y is now split etween two new leves: y tht precedes pl nd y tht follows pl. We set the longest prefix of y to e the longest prefix of y nd the longest prefix of y to e p. The spn of z is now split etween two new leves: z tht precedes pr nd z tht follows pr. We set the longest prefixes of z to e p nd the longest prefix of z to e the longest prefix of z. If y = z then the spn of y is split etween three new leves y 1, y 2, nd y 3. We set the longest prefixes of y 1 nd y 3 to e the longest prefix of y, nd the longest prefix of y 2 to e p. There my e nodes v in T, such tht fter dding p, we hve to updte LP(v) to e p. Let y e the lef preceding pl nd let z e the lef following pr. Let u e the lowest common ncestor of y nd z. Let u e the child of u which is n ncestor of y nd let u e the child of u which is n ncestor of z. We my need to updte LP(v) if either: Cse 1: v is child of node w on the pth from u to y, which is right siling of the child w of w on this pth. Cse 2: v is child of node w on the pth from u to z, which is left siling of the child w of w on this pth. Cse 3: v is child of u which is right siling of u nd left siling of u. 6 In fct we pretend to serch with string (not in the dt structure) tht immeditely follows ql in the lexicogrphic order of the strings.

7 For ech such node v, we know tht spn(v) I(p) so we chnge LP(v) to e p, if p is longer thn the current LP(v). Since the depth of T is O(log B (n)), we updte O(B log B (n)) longest prefixes which re stored t O(log B (n)) nodes. After inserting pl nd pr, if the height-1 node contining pl nd the height-1 node contining pr hve no more thn 2 children, we finish the insert. Otherwise we hve to split t lest one of these nodes. We split node v into two nodes v 1 nd v 2. Node v 1 is the prent of the first (or +1) children of v nd node v 2 is the prent of the lst +1 children of v. Both v 1 nd v 2 replce v s consecutive children of p(v). We compute spn(v 1 ) from the spn of its first child nd the spn of its lst child, nd similrly for spn(v 2 ). A 1 v A 3 A 2 A 2 A 5 A 2 A 1 v 1 v 2 A 3 A 2 A 5 u 1 u 2 u 3 u 4 u 5 A 3 A 5 A 2 A 1 u 1 u 2 u 3 u 4 u 5 A 3 A 5 A 2 A 1 Fig. 2. A node v t the left which is split into nodes v 1 nd v 2 to the right. Since spn(v 1) I(A 2) the prefix A 2, which ws the longest prefix of u 2 efore the split, is the longest prefix of v 1 fter the split. The longest prefix of u 2 fter the split is empty. Clerly we hve to updte LP(v 1 ) nd LP(v 2 ). Furthermore, since segment tht ws mpped to child u of v my now e mpped to v 1 or v 2, we my lso hve to updte LP(u) for children u of v 1 nd v 2. Other longest prefixes do not chnge. The following simple oservtions specify how to updte the longest prefixes. In the following if u is child of v prior the split, then LP(u) refers to the longest prefix of u efore the split. Note tht since v exists only efore the split then LP(v) is the longest prefix of v efore the split. Similrly, LP(v 1 ) nd LP(v 2 ) re the longest prefix of v 1 nd v 2, respectively, fter the split. Lemm 1. Let u e child of v 1 fter the split. If spn(v 1 ) I(LP(u)) then fter split LP(u) should e empty. Proof. Let p = LP(u) since spn(v 1 ) I(LP(u)) then the segment I(p) is not mpped to u fter the split. Since I(p) ws the shortest segment tht ws mpped to u no other segment is mpped to u fter the split. Lemm 2. Let u 1 nd u 2 e children of v 1. If spn(v 1 ) I(LP(u 1 )) nd spn(v 1 ) I(LP(u 2 )) then LP(u 1 ) = LP(u 2 ). Proof. Since spn(v 1 ) I(LP(u 1 )) then spn(u 2 ) I(LP(u 1 )). So LP(u 1 ) cnnot e longer thn LP(u 2 ) since this would contrdict the fct tht I(LP(u 2 )) is the shortest segment contining spn(u 2 ). Symmetriclly, LP(u 2 ) cnnot e longer thn LP(u 1 ), so they must e equl.

8 Lemm 3. If there exist child u of v 1 such tht spn(v 1 ) I(LP(u)) then LP(v 1 ) is LP(u). Proof. Oviously LP(u) is mpped to v 1. Furthermore, LP(u) is the longest prefix with this property, since if there is longer prefix q with this property then q should hve een LP(u) efore the split. Lemm 4. If there isn t child u of v 1 such tht spn(v 1 ) I(LP(u)) then LP(v 1 ) is equl LP(v). Proof. We clim tht there exists child u of v 1 tht LP(u) is empty. From this clim the lemm follows since if there is prefix q longer thn LP(v) such tht I(q) is mpped to v 1, then q is mpped to u efore the split nd LP(u) couldn t hve een empty. We prove this clim s follows. Assume to the contrry tht LP(u) is not empty for every child u of v 1. Let w e child of v 1 such tht I(LP(w)) is not contined in I(LP(w )) for ny other child w of v (w exists since segments do not overlp). From our ssumption follows tht I(LP(w)) spn(v 1 ). Therefore t lest one of the endpoints of I(LP(w)), sy z is in the sutree of v 1. Let w e child of v 1 whose sutree contins z. It is esy to see now tht I(LP(w )) nd I(LP(w)) overlp which is contrdiction. A symmetric version of Lemms 1, 2, 3, nd 4 hold for v 2. These oservtions imply the following strightforwrd lgorithm to updte longest prefixes when we perform split. If there is child u of v 1 such tht spn(v 1 ) I(LP(u)) we set LP(v 1 ) to e LP(u), otherwise we set LP(v 1 ) to e LP(v). In ddition we set LP(u) to e empty for every child u of v 1 such tht spn(v 1 ) I(LP(u)). We updte the spn of v 2 nd its children nlogously. See Figure 2. After splitting v we recursively check if p(v 1 ) or p(v 2 ) hs more thn 2 children nd if so we continue to split them until we rech node tht hs no more thn 2 children. 2.3 Deleting prefix To delete prefix p we need to delete I(p) from T. We first find the longest prefix of p in S denoted y w. 7 Then we delete pl nd pr from the height-1 nodes contining them. We hve to chnge the longest prefix of every node v for which LP(v) = p to w. Nodes v for which LP(v) my e equl to p re of three kinds s specified in Cses (1), (2) nd (3) of Section 2.2 As result of deleting pl nd pr from the height-1 nodes contining them we my crete nodes with less thn children. To fix such node v we either orrow child from siling of v or merge v with one of its silings. We omit the detils of these relncing opertions nd their ffect on longest prefixes from this strct. The following theorem summrizes the results of this section. 7 We do tht y query with string (not in the dt structure) tht immeditely follows pr in the lexicogrphiclly order of strings.

9 Theorem 1. Assuming ech string occupies O(1) words, the B-tree dt structure which we descried supports longest prefix queries, insertions, nd deletions in O(log(n)) time. Furthermore, it performs O(log B (n)) I/Os per opertion, nd requires liner spce. 3 String B-tree For Longest Prefix Queries In B-tree, we ssume tht Θ() keys tht reside t single node fit into one disk lock of size B. However if the keys re strings of vrile sizes, which cn e ritrrily long, there my not e enough spce to store Θ() strings in single lock. Insted, we cn store Θ() pointers to strings in ech node, ut ccessing these strings during the serch requires more thn constnt numer of I/O opertions per node. To reduce the numer of I/Os, Ferrgin nd Grossi [9] developed n elegnt generliztion of B-tree clled the string B-tree or SB-tree for short. 0 P = cc 3 4 c mismtch c c c c c c c Correct Position 8 lef1 Common c prefix c Fig. 3. A Ptrici trie of node in string B-tree. The numer in node is its string depth. The chrcter on n edge is the rnching chrcter of the edge. An individul node v of n SB-tree is shown in Figure 3. Insted of storing the keys t node v we store Ptrici trie [14] of the keys, denoted y PT(v). Using this representtion we cn perform -wy rnching using only Θ() chrcters tht re stored in constnt numer of disk locks of size B. Ech internl node ξ of the Ptrici trie stores the length of the string corresponding the pth from the root to ξ. We cll this the string depth of ξ. We store with ech edge e the first chrcter of the string tht corresponds to e. This chrcter is clled the rnching chrcter of e. As n exmple Figure 3 shows Ptrici trie of node in string B-tree. The right child of the root hs string depth 4 nd it s outgoing edges hve the rnching chrcters nd, respectively. This mens tht the node s left sutrie consists of strings whose fifth chrcter is, nd its right sutrie consists of strings whose fifth chrcter is. The first four chrcters in ll

10 the strings in the right sutrie of the root re cc. Let ξ e node of the trie whose string depth is d(ξ). To mke rnching decision t ξ, we compre the d(ξ) + 1 chrcter of the string tht we serch, to the chrcters on the edges outgoing from ξ. For exmple, for the string cc, the serch in the trie in Figure 3 trverses the rightmost pth of the Ptrici trie, exmining the chrcters 1, 5, nd 7 of the string which we serch. Unfortuntely, the lef of the Ptrici trie tht we rech (in our exmple, the lef t the fr right, corresponding to cc ) is not in generl the correct rnching point, from the node of the SB-tree represented y this trie, since we did not compre ll chrcters of the string which we serch. We fix this y sequentilly compring the string which we serch with the key ssocited with the lef of the trie which we reched. If they differ, we find the position in which they first differ. In the exmple the first chrcter of the string cc tht is not equl to the corresponding chrcter of the key cc, is the fourth chrcter. Since the fourth chrcter of cc is smller we know tht the string which we serch is lexicogrphiclly smller thn ll keys in the right sutree of the root. It thus fits in etween the leves c nd cc. For more detils see [9]. Serching ech Ptrici trie requires constnt numer of I/O to lod it into memory, plus dditionl I/Os to do the sequentil scn of the key ssocited with the lef we reched. Therefore our structure s defined so fr does not gurntee tht the totl numer of I/Os is O(log B n + l/b), where l is the length of the string tht we serch. To further reduce the numer of I/Os Ferrgin nd Grossi [9] used the leftmost nd the rightmost strings in the sutree of node v s keys t p(v). Recll tht we in fct did the sme in our B-tree when we use the spns of the children of v s the keys t v. Hving the keys defined this wy, we cn use informtion from the serch in the trie PT(v) of node v to reduce the numer of I/Os in the followings serch of the trie PT(u) of child u of v. Specificlly, let s e the string which we serch, nd let l e the length of the longest common prefix of s nd the key t the lef of PT(v), where the serch ended. Then it is gurnteed tht the length of the longest common prefix of s nd the key t the lef of PT(u), where the serch of s ends is t lest l. Thus, we cn void the first l comprisons nd the I/Os ssocited with them. Ferrgin nd Grossi [9] lso showed how to insert nd delete string in O(log B n + l/b) time in the worst cse. We now descrie how to comine the SB-tree with our lgorithm for longest prefix queries so tht our input prefixes S = {p 1...p n } cn e ritrrily long. As Ferrgin nd Grossi [9], we use the endpoints of spn(v) s keys t p(v), nd represent the keys of ech node v in Ptrici trie PT(v). Ech lef of the Ptrici trie stores pointer to the first lock contining the key tht it corresponds to. We use the sme definition of the longest prefix of node v, denoted y LP(v), s in Section 2. Recll tht from these definitions follow tht if LP(v) is not empty then spn(v) I(LP(v)) nd therefore LP(v) is prefix of every key in the sutree of v. Let spn(v) = [KL(v),KR(v)]. Tht is KL(v)

11 e the leftmost string in the sutree of v nd KR(v) is the rightmost string in the sutree of v. Clerly LP(v) is prefix of KL(v) nd KR(v). The string KL(v) is key seprting v from its siling in p(v) nd therefore corresponds to lef in PT(v). So we represent LP(v) y storing its length, nd pointer to it, in the lef of PT(v), tht corresponds to KL(v). If LP(v) is empty we encode this y storing zero t the ssocited lef. Finding the longest prefix. We serch the SB-tree nd trverse pth A to lef of T. Let w e the lst node on A for which LP(w) is not empty. Together with the pointer to w in p(w), we find LP(w) nd pointer to LP(w). Inserting new prefix. Assume we wnt to insert new prefix p S to the dt structure. We insert pl nd pr into the SB-tree using the insertion lgorithm of the SB-tree. As in Section 2.2 there my e nodes v in T, such tht fter dding p, we need to updte LP(v) to e p. Nodes v for which LP(v) my e equl to p re of three kinds s specified in Cses (1), (2) nd (3) of Section 2.2. For ech such node v we chnge LP(v) to e p, if p is longer thn the current vlue LP(v). This is correct since for ech of these nodes v, we know tht spn(v) I(p). Note tht ll these chnges re locted t O(log B (n)) nodes of the SB-tree, nd therefore we cn perform them while doing O(log B (n)) I/O opertions. After inserting prefix p we my split node v into two nodes v 1 nd v 2. We split node in the SB-tree using the lgorithm of Ferrgin nd Grossi [9]. Splitting my chnge the longest prefixes. To perform these chnges we use the sme lgorithm s in Section 2.2. To implement this lgorithm we need to determine if there is child u of v 1 such tht spn(v 1 ) I(LP(u)). Let u e child of v 1. We decide if spn(v 1 ) I(LP(u)) s follows. Since LP(u) is prefix of KL(u) nd KR(u), nd KL(u) nd KR(u) re keys in PT(v 1 ) then there is pth in PT(v 1 ) tht corresponds to the string LP(u). It follows tht LP(u) is prefix of ll the keys in PT(v 1 ), nd in prticulr of KL(v 1 ) nd KR(v 2 ), if LP(u) is not lrger thn the string depth of the root of PT(v 1 ). We check if there is child u of v 2 tht spn(v 2 ) I(LP(u)) nlogously. Deletion of prefix is similr, we omit the detils from this strct. The following theorem summrizes the results of this section. Theorem 2. The dt structure which we descried in this section supports longest prefix queries, insertions, nd deletions in O(log(n) + q ) time where q is the string which we perform the opertion with. Furthermore, it performs O(log B (n) + q /B) I/Os per opertion, nd requires liner spce. 4 Future Reserch The cche olivious model [10] is generliztion of the I/O model. In this model we seek I/O efficient lgorithms which do not depend on the lock size. Among the stte of the rt in this model is cche-olivious B-tree [3], nd n lmost efficient cche-olivious string B-tree [4] whose query time is optiml

12 ut updtes re not. An ovious open question is to find cche olivious dt structure for longest prefix queries. References 1. P. K. Agrwl, L. Arge, nd K. Yi. An Optiml Dynmic Intervl Sting-Mx Dt Structure? In Proceedings of SODA, pges , R. Byer nd E. M. McCreight. Orgniztion nd Mintennce of Lrge Ordered Indexes. Act Informtic, 1(3): , M. A. Bender, E. Demine, nd M. Frch-Colton. Cche-Olivious B-Trees. SIAM Journl on Computing, 35(2): , M. A. Bender, M. Frch-Colton, nd B. C. Kusznul. Cche-Olivious String B-Trees. In Proceedings of PODS, pges , G. S. Brodl nd R. Fgererg. Cche-olivious string dictionries. In Proc. 17th Annul ACM-SIAM Symposium on Discrete Algorithms, pges , Erik D. Demine, John Icono, nd Stefn Lngermn. Worst-cse optiml tree lyout in memory hierrchy, W. Etherton, Z. Ditti, nd G. Vrghese. Tree Bitmp : Hrdwre/Softwre IP Lookups with Incrementl Updtes. ACM SIGCOMM Computer Communictions Review, 34(2):97 122, A. Feldmnn nd S. Muthukrishnn. Trdeoffs for Pcket Clssifiction. In Proceedings of INFOCOM, pges , P. Ferrgin nd R. Grossi. The String B-Tree: A New Dt Structure for String Serch in Externl Memory nd Its Applictions. Journl of the ACM, 46(2): , M. Frigo, C. E. Leiserson, H. Prokop, nd S. Rmchndrn. Cche-Olivious Algorithms. In Proceedings of FOCS, pges , J. Hsn, S. Cdmi, V. Jkkul, nd S. Chkrdhr. Chisel: A Storge-Efficient, Collision-Free Hsh- Bsed Network Processing Architecture. In Proceedings of ISCA, pges , My H. Kpln, E. Mold, nd R. E. Trjn. Dynmic Rectngulr Intersection with Priorities. In Proceedings of STOC, pges , H. Lu nd S. Shni. A B-Tree Dynmic Router-Tle Design. IEEE Trnsctions on Computers, 54(7): , D. R. Morrison. Ptrici: Prcticl Algorithm to Retrieve Informtion Coded in Alphnumeric. Journl of the ACM, 15(4): , S. Shni nd K. Kim. O(log n) Dynmic Pcket Routing. In Proceedings of ISCC, pges , D. Sletor nd R. E. Trjn. Self-djusting inry serch trees. Journl of the ACM, 32: , D. D. Sletor nd R. E. Trjn. A Dt Structure for Dynmic Trees. JCSS, 26(3): , S. Suri, G. Vrghese, nd P. Wrkhede. Multiwy Rnge Trees: Sclle IP Lookup with Fst Updtes. In Proceedings of GLOBECOM, pges , R. E. Trjn. A Clss of Algorithms which Require Nonliner Time to Mintin Disjoint Sets. Journl of Computing System Science, 18: , J. S. Vitter. Externl Memory Algorithms nd Dt Structures: Deling with Mssive Dt. ACM Computing Surveys, 33(2): , 2001.

What are suffix trees?

What are suffix trees? Suffix Trees 1 Wht re suffix trees? Allow lgorithm designers to store very lrge mount of informtion out strings while still keeping within liner spce Allow users to serch for new strings in the originl