Tries
Yufei Tao
KAIST
April 9, 2013
In this lecture, we will discuss the following exact matching problem on strings.

Problem. Let S be a set of strings, each of which has a unique integer id. Given a query string q, a query reports the id of q if q exists in S, and nothing otherwise.

Example. Suppose that S contains eight strings with ids 1, 2, ..., 8. Given a q equal to the third string, a query returns id 3, whereas given a q that is not in S, it returns nothing.
Think: How is this problem related to inverted indexes and search engines?
Notations and a Naive Solution

Let
A be the alphabet (i.e., every character of any string must come from A);
$|s|$ be the length of a string s, i.e., the number of characters in s;
$m = |S|$, i.e., the number of strings in S;
n be the total length of the strings in S, i.e., $n = \sum_{s \in S} |s|$.

When $|A|$ is small and all strings in S are short (e.g., $|s| \le 10$ for all $s \in S$), the exact matching problem on strings can be reduced to exact matching on integers. For example, consider that each string s represents an English word, and that every s has length at most 10. We can map s to an integer from 0 to $26^{10} - 1$.

Think: Why does the method no longer work if $|A|$ is large or strings can be arbitrarily long?
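The reduction to integers can be sketched as follows. This is a minimal illustration, not the lecture's own code: the function name `word_to_int` and the sample words are hypothetical. One wrinkle is that words shorter than 10 characters need a padding digit so that words of different lengths stay distinct. The sketch therefore encodes in base 27 rather than base 26, which is an assumption of this sketch and slightly enlarges the integer range.

```python
def word_to_int(word, max_len=10):
    """Encode a lowercase word of length <= max_len as one integer.

    Assumption of this sketch: digit 0 is reserved for padding, so the
    letters 'a'..'z' become digits 1..26 and the encoding uses base 27.
    """
    if len(word) > max_len or not all("a" <= ch <= "z" for ch in word):
        raise ValueError("expected at most max_len lowercase letters")
    value = 0
    for position in range(max_len):
        if position < len(word):
            digit = ord(word[position]) - ord("a") + 1
        else:
            digit = 0  # padding for words shorter than max_len
        value = value * 27 + digit
    return value


def build_integer_index(words_with_ids):
    """Map each encoded word to its id, so a query is an integer lookup."""
    return {word_to_int(word): string_id for word, string_id in words_with_ids}


def query(index, q):
    """Return the id of q, or None if q is not in the set or not encodable."""
    try:
        return index.get(word_to_int(q))
    except ValueError:
        return None
```

For instance, after `index = build_integer_index([("tree", 1), ("trie", 2)])`, the call `query(index, "trie")` looks up a single integer key. The table of keys only stays manageable because the alphabet is small and the words are short.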
Next, we will describe another solution based on a data structure called the trie.

First, let us define the concept of prefix. Let s be a string of length t. We can write its characters (from left to right) as s[1], s[2], ..., s[t], respectively. Then, for any $i \in [1, t]$, the string formed by the sequence s[1], ..., s[i] is called a prefix of s. As a special case, the empty string is also a prefix of s.

Example. A string s of length 5 has 6 prefixes: the empty string, and the strings formed by s[1], ..., s[i] for i = 1, ..., 5.

Let S be a set of strings. We say that a string s is a possible prefix of S if s is a prefix of at least one string in S.
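The two definitions can be checked with a few lines of code. The helper names `prefixes` and `is_possible_prefix`, and the sample strings, are hypothetical choices for illustration.

```python
def prefixes(s):
    """Return all prefixes of s, from the empty string up to s itself."""
    return [s[:i] for i in range(len(s) + 1)]


def is_possible_prefix(p, strings):
    """True if p is a prefix of at least one string in the collection."""
    return any(s.startswith(p) for s in strings)


example = "hello"  # hypothetical sample string of length 5
assert len(prefixes(example)) == len(example) + 1
```

A string of length t always has t + 1 prefixes, which matches the count of 6 for a string of length 5.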
A set S of strings is called prefix-free if no string in S is a prefix of any other string in S. Every set of strings can be made prefix-free by appending a special termination symbol, one that does not belong to A, to each string in S.

Example. Appending the termination symbol to each of the eight strings of S yields a new set of eight strings, which is prefix-free.

From now on, we will consider that S is prefix-free, and that every string in S ends with the termination symbol.
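A small sketch of this conversion follows. It assumes that the character `$` plays the role of the termination symbol and never occurs in the input strings. That choice, like the function names, is an assumption made for illustration.

```python
TERMINATOR = "$"  # assumption: a character outside the alphabet A


def is_prefix_free(strings):
    """True if no string in the set is a prefix of another one.

    After sorting, a string that is a prefix of some other string is
    also a prefix of its immediate successor, so checking neighbours
    is enough.
    """
    ordered = sorted(set(strings))
    return not any(b.startswith(a) for a, b in zip(ordered, ordered[1:]))


def make_prefix_free(strings):
    """Append the termination symbol to every string."""
    return [s + TERMINATOR for s in strings]


words = ["a", "ab", "abc", "b"]  # hypothetical sample set
assert not is_prefix_free(words)
assert is_prefix_free(make_prefix_free(words))
```

The conversion works because a terminated string can only be a prefix of a string that contains the terminator at the same position, which would make the two strings identical.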
Tries

The trie on S is a tree T defined as follows:
Each node u of T corresponds to a distinct possible prefix of S. Let P(u) be the prefix that u represents.
Let u be a node, and v a child node of u. Then P(u) is a prefix of P(v), and $|P(v)| = |P(u)| + 1$.
Each node u is labeled with a character c, which is the last character of P(u).
Example. [Figure: the trie on the example set S.]

Note that every node u labeled with the termination symbol corresponds to a distinct string $s \in S$. We therefore store the id of s at u.
Lemma. The trie on S has at most n nodes.
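The reasoning behind the lemma is that every non-root node represents a distinct non-empty possible prefix, and a string s has exactly $|s|$ non-empty prefixes. The sketch below checks the bound on a small hypothetical set. It assumes that `$` is the termination symbol, that n counts the terminators, and that the root (the empty prefix) is not counted. These are assumptions of the sketch.

```python
TERMINATOR = "$"  # assumption: stands in for the termination symbol


def build_trie(strings):
    """Build a trie as nested dicts: character -> child dictionary."""
    root = {}
    for s in strings:
        node = root
        for ch in s + TERMINATOR:
            node = node.setdefault(ch, {})
    return root


def count_non_root_nodes(node):
    """Count every node below the given one."""
    return sum(1 + count_non_root_nodes(child) for child in node.values())


words = ["a", "ab", "b"]  # hypothetical sample set
n = sum(len(w + TERMINATOR) for w in words)  # total length with terminators
trie = build_trie(words)
assert count_non_root_nodes(trie) <= n
```

Shared prefixes are what make the count smaller than n: here the prefix "a" is stored once although two strings start with it.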
How do we answer an exact matching query with a given q? How about a different q?
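One way to answer a query is to walk down from the root, following one child per character of q (including the termination symbol), and to report the id stored at the node reached. The sketch below assumes `$` as the termination symbol and a dictionary of children per node. The class and function names are hypothetical.

```python
TERMINATOR = "$"  # assumption: stands in for the termination symbol


class TrieNode:
    def __init__(self):
        self.children = {}     # character -> TrieNode
        self.string_id = None  # set only on terminator nodes


def build_trie(strings_with_ids):
    """Build a trie from (string, id) pairs; strings come without terminator."""
    root = TrieNode()
    for s, string_id in strings_with_ids:
        node = root
        for ch in s + TERMINATOR:
            node = node.children.setdefault(ch, TrieNode())
        node.string_id = string_id
    return root


def query(root, q):
    """Return the id of q if q is in the set, otherwise None."""
    node = root
    for ch in q + TERMINATOR:
        node = node.children.get(ch)
        if node is None:
            return None  # q leaves the trie: it is not in the set
    return node.string_id
```

The walk visits at most $|q| + 1$ nodes, so the query cost is governed by how quickly the child for a given character can be found.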
How do we delete a string from the trie? How about inserting one?
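A possible answer, sketched under the same assumptions as before (`$` as the termination symbol, a dictionary of children per node, hypothetical names): insertion creates the missing nodes along the path of the new string, and deletion removes the nodes that no other string needs.

```python
TERMINATOR = "$"  # assumption: stands in for the termination symbol


class TrieNode:
    def __init__(self):
        self.children = {}     # character -> TrieNode
        self.string_id = None  # set only on terminator nodes


def insert(root, s, string_id):
    """Add s to the trie, creating nodes for prefixes not yet present."""
    node = root
    for ch in s + TERMINATOR:
        node = node.children.setdefault(ch, TrieNode())
    node.string_id = string_id


def delete(root, s):
    """Remove s from the trie; return False if s was not stored."""
    path = []  # (parent, character) pairs from the root downwards
    node = root
    for ch in s + TERMINATOR:
        child = node.children.get(ch)
        if child is None:
            return False
        path.append((node, ch))
        node = child
    node.string_id = None
    # Walk back up, removing nodes that have become childless.
    for parent, ch in reversed(path):
        if parent.children[ch].children:
            break  # this node still serves another string
        del parent.children[ch]
    return True
```

Both operations touch one node per character of s, so their cost again depends on how the children of a node are organized.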
Notice that the efficiency of queries, insertions and deletions depends on how well we can solve the following problem: Given a node u and a character σ that is either in A or the termination symbol, how do we find the child v of u that corresponds to σ?

Different tradeoffs exist:
By organizing the child nodes of u in an array, we can find v in O(1) time, but the array occupies $O(|A|)$ space.
By organizing the child nodes of u in a binary search tree (BST), we can find v in $O(\log |A|)$ time, and the tree occupies $O(f)$ space, where f is the number of child nodes of u.
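The two options can be contrasted in code. The sketch below assumes an alphabet of lowercase letters plus `$` as the termination symbol. For brevity it replaces the BST with a sorted list searched by binary search, which gives the same logarithmic lookup and the same $O(f)$ space but does not give logarithmic-time updates. Both the alphabet and this substitution are assumptions of the sketch.

```python
import bisect

ALPHABET = "$abcdefghijklmnopqrstuvwxyz"  # assumption: terminator + lowercase
SLOT = {ch: i for i, ch in enumerate(ALPHABET)}


class ArrayNode:
    """One slot per alphabet character: constant-time lookup, O(|A|) space."""

    def __init__(self):
        self.slots = [None] * len(ALPHABET)

    def child(self, ch):
        return self.slots[SLOT[ch]]

    def add_child(self, ch, node):
        self.slots[SLOT[ch]] = node


class SortedNode:
    """Only existing children are stored, sorted by character: O(f) space."""

    def __init__(self):
        self.keys = []   # sorted characters of the existing children
        self.nodes = []  # child nodes, aligned with self.keys

    def child(self, ch):
        i = bisect.bisect_left(self.keys, ch)  # binary search
        if i < len(self.keys) and self.keys[i] == ch:
            return self.nodes[i]
        return None

    def add_child(self, ch, node):
        i = bisect.bisect_left(self.keys, ch)
        if i < len(self.keys) and self.keys[i] == ch:
            self.nodes[i] = node
        else:
            self.keys.insert(i, ch)
            self.nodes.insert(i, node)
```

Both classes expose the same `child` and `add_child` operations, so either could serve as the node type of a trie. The choice only trades lookup time against space per node.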
Theorem.
By using the array implementation, a trie occupies $O(|A| n)$ space, answers a query with string q in $O(|q|)$ time, and supports the insertion and deletion of a string s in $O(|A| |s|)$ time.
By using the BST implementation, a trie occupies $O(n)$ space, answers a query with string q in $O(|q| \log |A|)$ time, and supports the insertion and deletion of a string s in $O(|s| \log |A|)$ time.
Next, we will describe another trie variant, called the balanced trie, which occupies $O(n)$ space, and answers a query with string q in $O(\log m + |q|)$ time. This trie, however, is static; namely, it does not support insertions and deletions.
From now on, we consider that S is sorted alphabetically (placing the termination symbol before all characters of A).

In general, given a set S of x sorted strings, we refer to the one in S whose rank is x/2 as the median of S.

Example. In a set of eight sorted strings, the median is the fourth string.

Furthermore, given a prefix p, denote by S(p) the set of strings in S with prefix p.

Example. In the example set S of eight strings, S(p) for one particular prefix p consists of three strings.
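Both notions are easy to compute on a sorted list. The sketch below assumes `$` as the termination symbol (it sorts before all letters in Python's default string order) and rounds the rank x/2 up when x is odd. The rounding rule and the function names are assumptions of this sketch.

```python
import bisect


def median(sorted_strings):
    """Return the string of rank ceil(x / 2), counting ranks from 1.

    The list must be non-empty and sorted.
    """
    x = len(sorted_strings)
    return sorted_strings[(x - 1) // 2]


def strings_with_prefix(sorted_strings, p):
    """Return S(p): the strings that start with p, as a sorted list."""
    # The strings with prefix p form one contiguous run in sorted order,
    # starting at the first string that is not smaller than p.
    start = bisect.bisect_left(sorted_strings, p)
    end = start
    while end < len(sorted_strings) and sorted_strings[end].startswith(p):
        end += 1
    return sorted_strings[start:end]


# Hypothetical sample set, already terminated with "$".
S = sorted(w + "$" for w in ["a", "ab", "abc", "b", "ba"])
assert median(S) == "abc$"
assert strings_with_prefix(S, "ab") == ["ab$", "abc$"]
```

That S(p) is always a contiguous run of the sorted order is the property the balanced trie relies on when it splits a set of strings among child nodes.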
We also need to define what we mean by concatenation. The concatenation of two strings $s_1$ and $s_2$ forms a string by appending the characters of $s_2$ at the end of $s_1$. In particular, concatenating a string with the empty string, on either side, gives the string itself.
Let S be a set of strings. The balanced trie on S is a tree T defined as follows:

Every node u in T corresponds to a set S(u) of strings, and carries a label L(u) and a positional index I(u), which will be formally defined below.

L(u) is the i-th character of the median of S(u), where i = I(u).

Each u corresponds to a possible prefix P(u) of S, where P(u) is the concatenation of the labels of the nodes on the path from the root to u (only the nodes that the path leaves through their $u_=$ child, together with u itself, contribute a label, so that $|P(u)| = I(u)$).

If u is the root, S(u) = S, and I(u) = 1.

u is a leaf if $|S(u)| = 1$ and $I(u) = |s|$, where s is the (only) string in S(u).

An internal u has at most 3 child nodes $u_<$, $u_=$, and $u_>$ such that:
$S(u_<)$ is the set of strings in S(u) alphabetically less than P(u). $I(u_<) = I(u)$.
$S(u_=)$ is the set of strings in S(u) that have P(u) as their prefixes. $I(u_=) = I(u) + 1$.
$S(u_>)$ is the set of remaining strings in S(u). $I(u_>) = I(u)$.
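The three-way split at a single node can be sketched as follows. The sketch relies on the reading that all strings of S(u) agree on their first I(u) - 1 characters, so that comparing a string with P(u) reduces to comparing its I(u)-th character with L(u). It also assumes `$` as the termination symbol and takes the median at rank x/2 rounded up. The function name and the sample strings are hypothetical.

```python
def split_node(sorted_strings, i):
    """Split S(u) at positional index i (1-based).

    Precondition: the strings are sorted, non-empty in number, agree on
    their first i - 1 characters, and each has at least i characters.
    Returns (label, less, equal, greater).
    """
    med = sorted_strings[(len(sorted_strings) - 1) // 2]
    label = med[i - 1]  # L(u): the i-th character of the median
    less = [s for s in sorted_strings if s[i - 1] < label]      # S(u_<)
    equal = [s for s in sorted_strings if s[i - 1] == label]    # S(u_=)
    greater = [s for s in sorted_strings if s[i - 1] > label]   # S(u_>)
    return label, less, equal, greater


# Hypothetical sample set, sorted and terminated with "$".
S = ["a$", "ab$", "abc$", "b$", "ba$"]
label, less, equal, greater = split_node(S, 1)
assert (label, less, greater) == ("a", [], ["b$", "ba$"])
assert equal == ["a$", "ab$", "abc$"]
```

Because the median itself always lands in the middle set, the sets passed to $u_<$ and $u_>$ each hold at most half of S(u). This halving is where the $\log m$ term of the query bound comes from.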
Example. [Figure: the balanced trie on the example set S.] Each node u is denoted in the form (L(u), I(u)); the edges leading to the children $u_<$ and $u_>$ are marked with < and >, respectively.
[Figure: the same balanced trie.]

How do we answer an exact matching query with a given q? How about a different q?
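A query can be answered by comparing one character of q with the label of the current node and moving to the $<$, $=$ or $>$ child accordingly. The sketch below builds a balanced trie and queries it. It assumes `$` as the termination symbol and a median of rank x/2 rounded up. It also stores the id at any node whose label is the termination symbol, instead of creating a further child for it. These choices, the names, and the simple non-optimized construction are assumptions of this sketch, not the lecture's definition.

```python
TERMINATOR = "$"  # assumption: sorts before all letters in Python


class Node:
    def __init__(self, label, index):
        self.label = label      # L(u)
        self.index = index      # I(u), 1-based
        self.less = None        # u_<
        self.equal = None       # u_=
        self.greater = None     # u_>
        self.string_id = None   # set when the label is the terminator


def build(entries, i=1):
    """entries: sorted (string, id) pairs; strings end with TERMINATOR
    and agree on their first i - 1 characters."""
    if not entries:
        return None
    med = entries[(len(entries) - 1) // 2][0]
    node = Node(med[i - 1], i)
    same = [e for e in entries if e[0][i - 1] == node.label]
    node.less = build([e for e in entries if e[0][i - 1] < node.label], i)
    node.greater = build([e for e in entries if e[0][i - 1] > node.label], i)
    if node.label == TERMINATOR:
        node.string_id = same[0][1]  # only one string can end here
    else:
        node.equal = build(same, i + 1)
    return node


def query(root, q):
    """Return the id of q, or None if q is not in the set."""
    q = q + TERMINATOR
    node = root
    while node is not None:
        ch = q[node.index - 1]
        if ch < node.label:
            node = node.less
        elif ch > node.label:
            node = node.greater
        elif ch == TERMINATOR:
            return node.string_id  # all of q, terminator included, matched
        else:
            node = node.equal
    return None


words = ["a", "ab", "abc", "b", "ba"]  # hypothetical sample set
entries = sorted((w + TERMINATOR, k) for k, w in enumerate(words, start=1))
root = build(entries)
```

With this sample set, `query(root, "ba")` follows one `>` edge at the root, then `=`, `>`, `=` and stops at a terminator node. Every `=` step consumes one character of q, and every `<` or `>` step at least halves the set of candidate strings.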
Theorem. A balanced trie occupies $O(n)$ space, and answers a query with string q in $O(\log m + |q|)$ time.