An introduction to suffix trees and indexing
1 An introduction to suffix trees and indexing Tomáš Flouri Solon P. Pissis Heidelberg Institute for Theoretical Studies December 3, 2012
2 Outline: 1 Introduction; 2 Basic Definitions (Graph theory, Alphabet and strings); 3 Dictionaries (Trie, Patricia tree); 4 Suffix tree (Suffix trie, Suffix tree, Ukkonen's algorithm); 5 Example; 6 Overview
3 Contents 1 Introduction 2 Basic Definitions 3 Dictionaries 4 Suffix tree 5 Example 6 Overview
4-5 Introduction: Two main problem areas in text retrieval: 1 String matching; 2 Indexing and querying. Exact and approximate cases!
6 Introduction: Exact string matching. Many efficient algorithms exist: the Knuth-Morris-Pratt algorithm; Boyer-Moore, Boyer-Moore-Horspool, Turbo-Boyer-Moore, etc.; Aho-Corasick; ...
7 Introduction: Indexing - 1. Problem: Given a text T, we need to construct an efficient data structure D which will serve as an index of T, so that we can efficiently query the text T. What do we expect from an efficient indexing data structure?
8 Introduction: Indexing - 2. Given a query pattern P, we want to find all occurrences of P in the preprocessed text T using the indexing data structure D. The data structure D is efficient if: 1 it can be built in time linear in the size of T, i.e. O(|T|); 2 it occupies space linear in the size of T, i.e. O(|T|); 3 it can answer a query whether P occurs in T in time linear in the size of P, i.e. O(|P|); 4 it can report all occurrences of P in T in time O(|P| + occ), where occ is the number of occurrences.
9 Introduction: Indexing - 2. Some efficient indexing data structures include: suffix automata (DAWG) and variations such as the CDAWG; suffix trees; position heaps; suffix arrays. In this lecture we will concentrate only on suffix trees.
11-16 Graph theory: Graph, Cycle, Path. Graph: A graph is a pair G = (V, E) of sets such that E ⊆ V × V. Path: A path of length n in a graph G = (V, E) is a sequence v_0, v_1, ..., v_n ∈ V such that (v_0, v_1), (v_1, v_2), ..., (v_{n-1}, v_n) ∈ E. Cycle: A path v_0, v_1, ..., v_n, v_0, where n ≥ 2, is called a cycle.
17 Graph theory: Rooted tree, subtree, tree height, node height. Tree: A rooted tree is an acyclic graph T = (V, E) with a special vertex v ∈ V called the root. Nodes with degree 1 are called leaves.
18-22 Alphabet and strings. Definition (Alphabet): An alphabet Σ is a finite non-empty set whose elements are called letters. Definition (String): A string on an alphabet Σ is a finite, possibly empty, sequence of elements of Σ. The zero-letter sequence is called the empty string, and is denoted by ε. The set of all possible strings on the alphabet Σ is denoted by Σ*. Definition (Length of string): The length of a string x is defined as the length of the sequence associated with the string x, and is denoted by |x|.
23-24 Alphabet and strings. We denote by x[i], for all 1 ≤ i ≤ |x|, the letter at index i of x. We also call index i, for all 1 ≤ i ≤ |x|, a position in x when x ≠ ε. It follows that the ith letter of x is the letter at position i in x, and that x = x[1..|x|]. Definition (Factor of string): A string x is a factor (substring) of a string y if there exist two strings u and v such that y = uxv. We denote the factor (substring) of x starting at position i and ending at position j by x[i..j].
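The positional notation above is 1-based and inclusive, while Python slices are 0-based and half-open. The following is a minimal sketch of the correspondence; the helper names `factor` and `is_factor` are illustrative and do not come from the slides.

```python
def factor(x, i, j):
    """Return x[i..j] using the 1-based, inclusive positions of the slides."""
    return x[i - 1:j]


def is_factor(x, y):
    """Return True if y = u + x + v for some (possibly empty) strings u and v."""
    return any(y[k:k + len(x)] == x for k in range(len(y) - len(x) + 1))


print(factor("banana", 2, 4))      # the factor starting at position 2, ending at 4
print(is_factor("nan", "banana"))  # is "nan" a factor of "banana"?
```

The empty string ε is a factor of every string under this definition, which the sketch reproduces because the comparison at k = 0 succeeds on two empty slices.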
26-27 Trie (Retrieval): Construct a dictionary for the set of words {amy, andy, ann, rob, roger, ben, betty}. [Figure: trie for these words with nodes labelled A to T and one letter per edge; in the second version a $ marks the end of each stored word.]
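A trie like the one in the figure can be sketched with nested dictionaries. This is only an illustration under stated assumptions: the function names `build_trie` and `contains` are hypothetical, and the terminal symbol `$` is appended to every word, as in the second version of the figure.

```python
def build_trie(words, end="$"):
    """Build a trie as nested dicts; `end` marks the end of a stored word."""
    root = {}
    for word in words:
        node = root
        for letter in word + end:
            # Reuse the child for this letter if it exists, otherwise create it.
            node = node.setdefault(letter, {})
    return root


def contains(trie, word, end="$"):
    """Return True if `word` was stored in the trie (prefixes alone do not count)."""
    node = trie
    for letter in word + end:
        if letter not in node:
            return False
        node = node[letter]
    return True


names = ["amy", "andy", "ann", "rob", "roger", "ben", "betty"]
trie = build_trie(names)
print(contains(trie, "ann"), contains(trie, "an"))
```

A lookup inspects one edge per letter of the query, so its cost depends on the length of the word and not on the number of stored words.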
28-30 Patricia tree: 1 Construct a trie. 2 Remove nodes with out-degree 1 and concatenate the labels of the corresponding edges into one edge. [Figure: the trie for {amy, andy, ann, rob, roger, ben, betty} before and after compression; the compressed edges are labelled a, ro, be, my, n, dy, n, b, ger, n, tty.]
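The two steps can be sketched as follows. This is a minimal illustration, not an efficient implementation: the names `build_trie` and `compress` are hypothetical, edges carry explicit string labels, and every word ends with an assumed terminal symbol `$`, so the leaf labels differ from the figure by a trailing `$`.

```python
def build_trie(words, end="$"):
    """Step 1: an ordinary trie as nested dicts, one letter per edge."""
    root = {}
    for word in words:
        node = root
        for letter in word + end:
            node = node.setdefault(letter, {})
    return root


def compress(node):
    """Step 2: merge every chain of out-degree-1 nodes into a single edge."""
    compact = {}
    for label, child in node.items():
        # Follow the chain while the child has exactly one outgoing edge.
        while len(child) == 1:
            (next_label, next_child), = child.items()
            label += next_label
            child = next_child
        compact[label] = compress(child)
    return compact


names = ["amy", "andy", "ann", "rob", "roger", "ben", "betty"]
patricia = compress(build_trie(names))
print(sorted(patricia))        # edge labels leaving the root
print(sorted(patricia["ro"]))  # edge labels below the node for "ro"
```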
32 Suffix trie: Given some text, e.g. t = banana, construct the suffix trie: 1 Generate the set Suff(t). 2 Construct a trie from Suff(t). The resulting data structure is called a suffix trie. Example: Given t = banana$, the set Suff(t) is Suff(t) = {banana$, anana$, nana$, ana$, na$, a$}.
33 Suffix trie - Example: Given the text t = banana$, construct the suffix trie. [Figure: suffix trie of banana$ with one letter per edge and leaves numbered 1 to 6 by the starting position of the suffix.]
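The two construction steps can be sketched directly. This is an illustration under assumptions: the function names and the `"leaf"` key that stores the suffix number are hypothetical, and, following the example above, the suffix consisting of `$` alone is not inserted.

```python
def suffixes(text, end="$"):
    """Return Suff(text + end) as on the slide, longest suffix first."""
    return [text[i:] + end for i in range(len(text))]


def build_suffix_trie(text, end="$"):
    """Insert every suffix into a nested-dict trie, one letter per edge."""
    root = {}
    for number, suffix in enumerate(suffixes(text, end), start=1):
        node = root
        for letter in suffix:
            node = node.setdefault(letter, {})
        node["leaf"] = number  # hypothetical marker: 1-based start of this suffix
    return root


print(suffixes("banana"))
trie = build_suffix_trie("banana")
print(trie["a"]["n"]["a"]["$"]["leaf"])  # leaf reached by spelling ana$
```

Inserting all suffixes letter by letter takes time and space quadratic in the text length, which is why the later slides compress the trie and construct it differently.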
34 Suffix tree. Definition: A suffix tree is the Patricia tree of the suffix trie. Construction: 1 Construct the suffix trie of text x. 2 Eliminate all nodes with out-degree 1 and concatenate the labels of the corresponding edges into one edge.
35-37 Suffix tree - Example. [Figure: the suffix trie of banana$ is compressed into its suffix tree; the edges of the suffix tree are labelled a, na, $, banana$ and na$, and the leaves are numbered 1 to 6.]
38 Suffix tree: Size of suffix tree. Theorem: A suffix tree consists of at most 2n - 1 nodes (or 2n if the empty suffix $ is taken into account). A suffix tree has n leaves and every internal node has at least two children, so the number of internal nodes is largest when the tree is binary; it therefore suffices to count the nodes of a binary tree. Proof (by induction). Base case: For 2 leaves we have 1 internal node. Inductive step: Assume that any binary tree with m < N leaves consists of exactly m - 1 internal nodes. We must prove that a binary tree with N leaves has exactly N - 1 internal nodes. A binary tree with N leaves is made up of: a root node; a left binary tree with k leaves; a right binary tree with N - k leaves.
39 Suffix tree: Size of suffix tree. Proof (by induction), continued. According to the induction assumption, the left binary tree with k leaves consists of k - 1 internal nodes, and the right binary tree with N - k leaves consists of N - k - 1 internal nodes. Therefore, the total number of internal nodes in a binary tree with N leaves is (k - 1) + (N - k - 1) + 1 = N - 1, and thus the total number of nodes is 2N - 1.
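The bound can be checked empirically on small inputs. The sketch below builds the suffix tree naively (suffix trie, then compression), which is quadratic and only meant for illustration; the function names are hypothetical. As an assumption that differs from the banana example, the suffix `$` itself is inserted too, so a text of length n gives N = n + 1 leaves and the root always has at least two children.

```python
def compress(node):
    """Merge chains of out-degree-1 nodes into single edges with string labels."""
    compact = {}
    for label, child in node.items():
        while len(child) == 1:
            (next_label, next_child), = child.items()
            label += next_label
            child = next_child
        compact[label] = compress(child)
    return compact


def compressed_suffix_tree(text, end="$"):
    """Naive construction: suffix trie of text + end, then compression."""
    full = text + end
    trie = {}
    for start in range(len(full)):
        node = trie
        for letter in full[start:]:
            node = node.setdefault(letter, {})
    return compress(trie)


def count_leaves(node):
    return 1 if not node else sum(count_leaves(child) for child in node.values())


def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.values())


for text in ["banana", "abcabxabc", "aaaa"]:
    tree = compressed_suffix_tree(text)
    leaves, nodes = count_leaves(tree), count_nodes(tree)
    print(text, leaves, nodes, nodes <= 2 * leaves - 1)
```

For `aaaa` the tree is binary at every internal node, so the bound is met with equality; for `banana` some internal nodes are missing relative to a binary tree and the count stays below the bound.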
40 Suffix tree: Suffix tree construction algorithms. Weiner's algorithm (1973): introduced as the position tree; construction in linear time (for constant-size alphabets); characterized as algorithm of the year. McCreight's algorithm (1976): improved space requirements over Weiner's method; construction in linear time (for constant-size alphabets). Ukkonen's algorithm (1995): same time and space requirements as McCreight's; easier to understand; on-line. Farach's algorithm (1997): linear-time construction algorithm for any type of alphabet; hard to implement; the basis for new algorithms, e.g. the construction of position heaps and suffix arrays in linear time.
41-43 Ukkonen's algorithm: Implicit suffix tree. Definition: An implicit suffix tree for string x is a tree obtained from the suffix tree of x by: 1 removing $ from all edge labels; 2 removing any edge that has no label; 3 removing any node with only one child. [Figure: the suffix tree of banana$ is transformed step by step into the implicit suffix tree of banana, whose edges are labelled a, na, nana, anana and banana.]
44 Ukkonen's algorithm: Implicit suffix tree. The implicit suffix tree of a string is what results from applying Ukkonen's algorithm to the string without an added end marker $. All suffixes are included, but not necessarily as labels of complete paths leading to leaves. If a unique character (in our case the $) is appended at the end of the string, the implicit suffix tree is essentially the same as the (true) suffix tree (only without $).
45 Ukkonen's algorithm: String paths of implicit suffix trees. Given a string y[1..n], the implicit suffix tree I_i contains each suffix y[1..i], y[2..i], ..., y[i] of y[1..i] as the label of some path (possibly ending in the middle of an edge). That is, a string path is a string that can be matched along the edges, starting from the root, or equivalently a prefix of some node label.
46 Ukkonen's algorithm: 1 Start with T = I_1. 2 Consecutively update T to I_2, I_3, ..., I_{n+1} in n phases, where I_i represents the implicit suffix tree of prefix y[1..i]. Phase i + 1 updates T from I_i (with all suffixes of y[1..i]) to I_{i+1} (with all suffixes of y[1..i+1]). Each phase i + 1 consists of extensions j = 1, 2, ..., i + 1 (one for each suffix of y[1..i+1]). Extension j ensures that suffix y[j..i+1] is in I_{i+1}.
47 Ukkonen's algorithm: Suffix extension rules. Rule 1: y[j..i] ends at a leaf. Append y[i+1] to the end of the edge label. Rule 2: y[j..i] does not end at a leaf, and the following character is not y[i+1]. Connect the end of the path to a new leaf j by an edge labeled y[i+1]. If the path ended in the middle of an edge, split that edge and insert a new node as the parent of leaf j. Rule 3: The path y[j..i+1] is already in the tree. No update.
48 Ukkonen's algorithm: Complexity. The algorithmic approach presented so far runs in O(n^3) time. Proof: Consider a single phase i + 1. Each extension rule can be applied in O(1) time, so applying all i + 1 extensions takes time Θ(i). Locating the ends of the string paths y[1..i], ..., y[i] by traversing the edge labels takes time \(\sum_{k=1}^{i} k = \Theta(i^2)\). Therefore, the total time for all phases i = 1, 2, ..., n is \(\sum_{i=1}^{n} i^2 = \Theta(n^3)\), which is even worse than the naive algorithm, which runs in O(n^2). We will see how this approach, with the use of some simple tricks, can achieve linear run-time.
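The cubic behaviour of the phase and extension scheme can be observed with a small experiment. The sketch below is a simplification and not the algorithm of the slides: it works on an uncompressed trie (an assumption made to keep the code short), so Rules 1 and 2 both reduce to adding a child and Rule 3 to doing nothing. The function name and the `steps` counter are hypothetical.

```python
def naive_online_suffix_trie(y):
    """Run all phases and extensions naively; count the steps spent locating path ends."""
    root = {}
    steps = 0
    for i in range(len(y)):          # phase for character y[i] (0-based)
        for j in range(i + 1):       # extension for the suffix starting at j
            node = root
            for k in range(j, i):    # locate the end of the path y[j..i-1]
                node = node[y[k]]
                steps += 1
            # Rules 1 and 2 add the new letter; Rule 3 finds it already present.
            node.setdefault(y[i], {})
    return root, steps


for text in ["abc", "banana", "abcabxabc"]:
    n = len(text)
    _, steps = naive_online_suffix_trie(text)
    print(n, steps, (n - 1) * n * (n + 1) // 6)
```

The step count equals (n - 1) n (n + 1) / 6, which grows as n^3, in line with the sum in the proof above.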
49 Ukkonen's algorithm: Suffix links. The extensions of phase i + 1 need to locate the ends of all i + 1 suffixes of y[1..i], and apply Rules 1-3. How can this be done efficiently? For each internal node v of I_i labeled xα, where x ∈ Σ and α ∈ Σ*, define s(v) to be the node labeled by α. (Do these nodes actually exist?) A pointer from v to s(v) is then called the suffix link of v. Note: If node v is labeled by a single character, then α = ε and s(v) is the root node.
50 Ukkonen's algorithm: Example of suffix links. [Figure: suffix tree for x = xabxac with edge labels xa, a, bxac and c, showing the suffix link from the node labeled xa to the node labeled a.]
51 Ukkonen's algorithm: Why do we need suffix links? Extension j (of phase i + 1) finds the end of the path y[j..i] in the tree (and extends it with character y[i+1]). Extension j + 1 similarly finds the end of the path y[j+1..i]. Assume that v is an internal node whose string path y[j]α is (essentially) a prefix of y[j..i]. Then we can avoid traversing the path α when locating the end of path y[j+1..i], by starting from node s(v). Do suffix links always exist?
52 Ukkonen's algorithm: Suffix links existence. Observation: If an internal node v is created during extension j (of phase i + 1), then extension j + 1 will find the node s(v). Let v be labeled xα. Node v can only be created by extension Rule 2. That is, v is inserted at the end of path y[j..i], which continued with some character c ≠ y[i+1]. Therefore, the paths xαc and αc were entered before phase i + 1. In extension j + 1, node s(v) is either found or created at the end of path α = y[j+1..i].
53 Ukkonen's algorithm: Speeding up path traversals. Consider the extensions of phase i + 1. Extension 1 extends path y[1..i] with character y[i+1]. Extension 1 is easy, as path y[1..i] always ends at leaf 1 and is thus extended by Rule 1. We can perform extension 1 in constant time if we maintain a pointer to the edge at the end of y[1..i]. What about the subsequent extensions j + 1 (for j = 1, 2, ..., i)?
54 Ukkonen's algorithm: Locating subsequent paths. Extension j has located the end of path y[j..i]. Starting from the node last visited, walk up at most one node, either 1 to the root, or 2 to a node v that has a suffix link to s(v). In case (1), traverse path y[j+1..i] explicitly downwards from the root.
55 Ukkonen's algorithm: Locating subsequent paths. In case (2), let xα be the label of v, so that y[j..i] = xαβ for some β ∈ Σ*. Then follow the suffix link of v, and continue by matching β downwards from node s(v) (whose string path is α). Having found the end of path αβ = y[j+1..i], apply the extension rules to ensure that it extends with y[i+1]. Finally, if a new internal node w was created in extension j, set its suffix link to point to the end node of path y[j+1..i].
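56 Ukkonen's algorithm: Locating subsequent paths - Illustration. In case (2), let xα be the label of v, so that y[j..i] = xαβ for some β ∈ Σ* (in this case β = abcd). [Figure: node v with string path xα, its suffix link to node s(v) with string path α, and the path β = abcd continuing below both nodes.]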
57 Ukkonen's algorithm: Speeding up explicit traversals. Skip/Count trick: In phase i + 1, each path y[j..i], which is followed in extension j, is known to exist in the tree. The path can therefore be followed by choosing the correct edges, instead of examining every character. Let y[k] be the next character to be matched on path y[j..i]. Now an edge labeled by y[p..q] can be traversed simply by checking that y[p] = y[k], and skipping the next q - p characters of y[j..i]. The time to traverse a path is proportional to the number of nodes on the path (instead of its string length).
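The skip/count trick can be sketched on a small hand-built tree. Everything named here is an assumption made for illustration: the `Node` class, the `add_edge` and `skip_count` helpers, 0-based inclusive index pairs (p, q) as edge labels, and the choice of the suffix tree of xabxac from the suffix-link example as test data.

```python
TEXT = "xabxac"


class Node:
    def __init__(self):
        # first letter of the edge label -> (p, q, child); the label is TEXT[p..q]
        self.edges = {}


def add_edge(parent, p, q):
    child = Node()
    parent.edges[TEXT[p]] = (p, q, child)
    return child


# Hand-built suffix tree of xabxac (0-based, inclusive indices).
root = Node()
xa = add_edge(root, 0, 1)      # edge "xa"
add_edge(xa, 2, 5)             # edge "bxac"
add_edge(xa, 5, 5)             # edge "c"
a = add_edge(root, 1, 1)       # edge "a"
add_edge(a, 2, 5)
add_edge(a, 5, 5)
add_edge(root, 2, 5)           # edge "bxac"
add_edge(root, 5, 5)           # edge "c"


def skip_count(node, lo, hi):
    """Follow the path TEXT[lo..hi], which is known to exist, starting at `node`.

    Returns (last node reached, characters matched into the next edge, edges inspected).
    """
    hops = 0
    k = lo
    while k <= hi:
        p, q, child = node.edges[TEXT[k]]  # only the first letter is examined
        hops += 1
        length = q - p + 1
        if k + length - 1 > hi:            # the path ends inside this edge
            return node, hi - k + 1, hops
        k += length                        # skip the whole edge label
        node = child
    return node, 0, hops


node, inside, hops = skip_count(root, 0, 3)  # path "xabx"
print(node is xa, inside, hops)
```

The path xabx has four characters, but only two edges are inspected: the edge xa is skipped in one step, and the path ends two characters into the edge bxac below it.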
58 Ukkonen's algorithm: Speeding up explicit traversals. Lemma: For any node v with a suffix link to s(v), it holds that depth(s(v)) ≥ depth(v) - 1. Sketch of proof: The suffix links of the ancestors of v lead to distinct ancestors of s(v).
59 Ukkonen's algorithm: Linear bound for any single phase. Theorem: Using suffix links and the skip/count trick, a single phase takes time O(n). Proof: There are i + 1 ≤ n + 1 extensions in phase i + 1. In any extension, the work other than tree traversal (that is, applying the extension rules) takes O(1) time only. How can the work for traversing the tree be bounded? To find the end of the next path, an extension first moves at most one level up. Then a suffix link may be followed, which is followed by a down-traversal to match the rest of the path.
60 Ukkonen's algorithm: Linear bound for any single phase. The up-walk in any extension decreases the current node depth by at most one (since it moves up at most one node), and each suffix link traversal decreases the node depth by at most another one (previous Lemma). Thus the current node depth is decremented at most 2n times during the entire phase. On the other hand, the current node depth cannot exceed n, so it is incremented (by following downward edges) at most 3n times. The total run-time of a phase is thus O(n). Improvement: Since there are n phases, the total run-time is O(n^2).
61 Ukkonen's algorithm: Final improvements (1). Some extensions turn out to be unnecessary to compute explicitly. Observation 1 - Rule 3 terminates the current phase: If path y[j..i+1] is already in the tree, so are the paths y[j+1..i+1], ..., y[i+1]. Phase i + 1 can be finished at the first extension j that applies Rule 3.
62 Ukkonen's algorithm: Final improvements (2). Observation 2 - Once a leaf, always a leaf: A node created as a leaf remains a leaf thereafter, because no extension rule adds children to a leaf. If extension j created a leaf (numbered j), extension j of any later phase i + 1 applies Rule 1 (appending the next character y[i+1] to the label of the edge ending at leaf j). Explicit applications of Rule 1 can be eliminated as follows: use a compressed edge representation (i.e. indices p and q instead of the substring y[p..q]), and represent the end position of each terminal edge by a global value e, for the current end position (phase).
63 Ukkonen's algorithm: Eliminating extensions. Denote by j_i the last non-void extension of phase i (that is, an application of Rule 1 or 2). Obs 1 implies that extensions 1, ..., j_i of phase i are non-void, so leaves 1, ..., j_i have been created by the end of phase i. Obs 2 implies that extensions 1, ..., j_i of any subsequent phase all apply Rule 1, so j_{i+1} ≥ j_i. Hence execute only extensions j_i + 1, j_i + 2, ... explicitly in phase i + 1.
64 Ukkonen's algorithm: Single phase algorithm. Algorithm for phase i + 1 with unnecessary extensions eliminated: 1 Set e = i + 1 (implements extensions 1, ..., j_i implicitly). 2 Compute extensions j_i + 1, ..., j until j > i + 1 or Rule 3 was applied in extension j. 3 Set j_{i+1} = j - 1 (for the next phase). All these tricks together can be shown to lead to linear run-time.
65 Ukkonen's algorithm: Complexity of the tuned implementation (1). Theorem: Ukkonen's algorithm builds the suffix tree for y[1..n] in time O(n), when implemented using the mentioned tricks. Proof: The extensions computed explicitly in any two phases i and i + 1 are disjoint, except for extension j, which may be computed anew in phase i + 1. The second computation of extension j can be done in O(1) time by remembering the end of the path entered in the previous computation.
66 Ukkonen's algorithm: Complexity of the tuned implementation (2). Let j = 1, ..., n + 1 denote the index of the current extension. Over all phases 2, ..., n + 1, the index j never decreases, but it can remain the same at the start of phases 3, ..., n + 1, so at most 2n extensions are computed explicitly. Similarly to the previous proof (skip/count), the current node depth can be decremented at most 4n times, and thus the total length of all downward traversals is bounded by 5n.
67 Ukkonen's algorithm: Obtaining the true suffix tree. Finally, the implicit suffix tree I_{n+1} can be converted to the true suffix tree of y[1..n]$ in the following way: all occurrences of the current end position marker e on edge labels are replaced by n + 1 (with a simple tree traversal, in time O(n)).
68 Ukkonen's algorithm: Summary. Reads a string y of size n from left to right. The algorithm is on-line, i.e. at step i, for 1 ≤ i ≤ n, it constructs an implicit suffix tree of the prefix y[1..i], which can then be easily converted to the (true) suffix tree by appending a unique symbol $ that has not appeared before. Runs in O(n) time for constant-size alphabets or O(n log n) time for general alphabets. Requires O(n) space.
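The pieces described above (global end position for leaf edges, Rule 3 ending a phase, suffix links, skip/count) can be combined into a compact sketch. This follows a commonly used "active point" formulation rather than the exact bookkeeping of the slides; all names (`Node`, `build_suffix_tree`, `leaf_paths`, `remainder`) are assumptions, indices are 0-based with exclusive edge ends, and the input is expected to end with a unique symbol such as `$`.

```python
class Node:
    def __init__(self, start, end):
        self.start = start   # the edge into this node is labelled text[start:end]
        self.end = end       # None stands for the global end e (leaf edge)
        self.children = {}
        self.link = None     # suffix link


def build_suffix_tree(text):
    root = Node(0, 0)
    active_node, active_edge, active_len = root, 0, 0
    remainder = 0                        # suffixes still to be inserted explicitly
    for i, c in enumerate(text):         # one phase per character
        remainder += 1
        last_internal = None             # internal node waiting for its suffix link
        while remainder > 0:
            if active_len == 0:
                active_edge = i
            first = text[active_edge]
            child = active_node.children.get(first)
            if child is None:            # Rule 2 at a node: add a new leaf
                active_node.children[first] = Node(i, None)
                if last_internal is not None:
                    last_internal.link = active_node
                    last_internal = None
            else:
                end = i + 1 if child.end is None else child.end
                edge_len = end - child.start
                if active_len >= edge_len:           # skip/count: jump over the edge
                    active_node = child
                    active_edge += edge_len
                    active_len -= edge_len
                    continue
                if text[child.start + active_len] == c:   # Rule 3: the phase ends
                    active_len += 1
                    if last_internal is not None:
                        last_internal.link = active_node
                    break
                # Rule 2 in the middle of an edge: split it and add a new leaf
                split = Node(child.start, child.start + active_len)
                active_node.children[first] = split
                split.children[c] = Node(i, None)
                child.start += active_len
                split.children[text[child.start]] = child
                if last_internal is not None:
                    last_internal.link = split
                last_internal = split
            remainder -= 1
            if active_node is root and active_len > 0:
                active_len -= 1
                active_edge = i - remainder + 1
            elif active_node is not root:
                active_node = active_node.link or root   # follow the suffix link
    return root


def leaf_paths(node, text, prefix=""):
    """Spell out the string path of every leaf (replaces e by the text length)."""
    if not node.children:
        return [prefix]
    paths = []
    for child in node.children.values():
        end = len(text) if child.end is None else child.end
        paths += leaf_paths(child, text, prefix + text[child.start:end])
    return paths


y = "abcabxabc$"
tree = build_suffix_tree(y)
print(sorted(leaf_paths(tree, y)) == sorted(y[k:] for k in range(len(y))))
```

The final check confirms that the leaves of the resulting tree spell exactly the suffixes of the input, which is a necessary property of a suffix tree, though it does not measure the running time.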
70-126 Suffix tree - Example: step-by-step construction of the suffix tree of y = abcabxabc$ with Ukkonen's algorithm. [Figures omitted; edges are labelled by index pairs (p, q) into y, where e denotes the global end position.]
Phase 1: one explicit extension (Rule 2) creates the edge (1, e) to leaf 1.
Phase 2: one implicit extension, then one explicit extension that creates the edge (2, e) to leaf 2.
Phase 3: two implicit extensions, then one explicit extension that creates leaf 3.
Phase 4: three implicit extensions, then one explicit extension that applies Rule 3.
Phase 5: three implicit extensions, then one explicit extension that applies Rule 3.
Phase 6: all implicit extensions are skipped. Explicit extensions apply Rule 2: the edge (1, e) is split, leaving the edge (1, 2) above a new internal node with the new leaf 4; the edge (2, e) is split in the same way, leaving the edge (2, 2); a suffix link is created between the two new internal nodes; a last explicit extension applies Rule 2.
Phases 7, 8 and 9: all implicit extensions are skipped; the explicit extension applies Rule 3.
Phase 10: all implicit extensions are skipped. Explicit extensions apply Rule 2, each splitting an edge so that an edge (3, 3) leads to a new internal node with a new leaf edge (10, e); suffix links are followed between these extensions and created for the new internal nodes; a last explicit extension (Rule 2) adds the leaf edge (10, e) for the final suffix.
127-130 Application - finding all occurrences of a query: y = abcabxabc$. Query the string a. 1 Find the node to which the string path a leads. 2 Get the leaves of that node. 3 The leaves indicate the starting positions of a. [Figure: suffix tree of abcabxabc$ with edge labels ab, b, c, $, xabc$ and abxabc$.]
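The query procedure can be sketched as follows. To keep the example short, it uses an uncompressed suffix trie built naively in quadratic time instead of a linear-time suffix tree; this substitution, the function names, and the use of the key `None` to store leaf positions are all assumptions made for illustration.

```python
def build_suffix_trie(text):
    """Insert every suffix of `text`; the end of suffix k stores its 1-based start."""
    root = {}
    for start in range(len(text)):
        node = root
        for letter in text[start:]:
            node = node.setdefault(letter, {})
        node[None] = start + 1   # hypothetical leaf marker
    return root


def occurrences(trie, pattern):
    """Return the sorted 1-based starting positions of `pattern` in the indexed text."""
    node = trie
    for letter in pattern:       # step 1: follow the string path of the pattern
        if letter not in node:
            return []
        node = node[letter]
    found, stack = [], [node]
    while stack:                 # steps 2 and 3: collect the leaves below that node
        current = stack.pop()
        for key, child in current.items():
            if key is None:
                found.append(child)
            else:
                stack.append(child)
    return sorted(found)


index = build_suffix_trie("abcabxabc$")
print(occurrences(index, "a"))
print(occurrences(index, "abc"))
```

In a real suffix tree the descent costs O(|P|) and the subtree below the located node has size proportional to the number of leaves, which gives the O(|P| + occ) bound stated in the introduction; the uncompressed trie used here does not achieve that bound.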
132 Overview. We had a quick look at indexing: preprocessing a given text; efficient querying afterwards. We've seen what suffix trees are and some of their properties: Patricia suffix tries for a string x[1..n]; at most 2n - 1 nodes; exactly n leaves. We've seen Ukkonen's algorithm: fairly simple to understand; linear-time construction for constant-size alphabets.
133 Reminder - Next week Next week s lecture will take place at SR 148, Building 50.34
More informationApplied Databases. Sebastian Maneth. Lecture 14 Indexed String Search, Suffix Trees. University of Edinburgh - March 9th, 2017
Applied Databases Lecture 14 Indexed String Search, Suffix Trees Sebastian Maneth University of Edinburgh - March 9th, 2017 2 Recap: Morris-Pratt (1970) Given Pattern P, Text T, find all occurrences of
More informationEE 368. Weeks 5 (Notes)
EE 368 Weeks 5 (Notes) 1 Chapter 5: Trees Skip pages 273-281, Section 5.6 - If A is the root of a tree and B is the root of a subtree of that tree, then A is B s parent (or father or mother) and B is A
More information17 dicembre Luca Bortolussi SUFFIX TREES. From exact to approximate string matching.
17 dicembre 2003 Luca Bortolussi SUFFIX TREES From exact to approximate string matching. An introduction to string matching String matching is an important branch of algorithmica, and it has applications
More informationFigure 1. The Suffix Trie Representing "BANANAS".
The problem Fast String Searching With Suffix Trees: Tutorial by Mark Nelson http://marknelson.us/1996/08/01/suffix-trees/ Matching string sequences is a problem that computer programmers face on a regular
More informationmarc skodborg, simon fischer,
E F F I C I E N T I M P L E M E N TAT I O N S O F S U F - F I X T R E E S marc skodborg, 201206073 simon fischer, 201206049 master s thesis June 2017 Advisor: Christian Nørgaard Storm Pedersen AARHUS AU
More information11/5/13 Comp 555 Fall
11/5/13 Comp 555 Fall 2013 1 Example of repeats: ATGGTCTAGGTCCTAGTGGTC Motivation to find them: Phenotypes arise from copy-number variations Genomic rearrangements are often associated with repeats Trace
More informationRange Minimum Queries Part Two
Range Minimum Queries Part Two Recap from Last Time The RMQ Problem The Range Minimum Query (RMQ) problem is the following: Given a fixed array A and two indices i j, what is the smallest element out of
More informationNon-context-Free Languages. CS215, Lecture 5 c
Non-context-Free Languages CS215 Lecture 5 c 2007 1 The Pumping Lemma Theorem (Pumping Lemma) Let be context-free There exists a positive integer divided into five pieces Proof for for each and Let and
More informationChapter 7. Space and Time Tradeoffs. Copyright 2007 Pearson Addison-Wesley. All rights reserved.
Chapter 7 Space and Time Tradeoffs Copyright 2007 Pearson Addison-Wesley. All rights reserved. Space-for-time tradeoffs Two varieties of space-for-time algorithms: input enhancement preprocess the input
More informationCompressed Indexes for Dynamic Text Collections
Compressed Indexes for Dynamic Text Collections HO-LEUNG CHAN University of Hong Kong and WING-KAI HON National Tsing Hua University and TAK-WAH LAM University of Hong Kong and KUNIHIKO SADAKANE Kyushu
More information11/5/09 Comp 590/Comp Fall
11/5/09 Comp 590/Comp 790-90 Fall 2009 1 Example of repeats: ATGGTCTAGGTCCTAGTGGTC Motivation to find them: Genomic rearrangements are often associated with repeats Trace evolutionary secrets Many tumors
More informationData Structure Lecture#10: Binary Trees (Chapter 5) U Kang Seoul National University
Data Structure Lecture#10: Binary Trees (Chapter 5) U Kang Seoul National University U Kang (2016) 1 In This Lecture The concept of binary tree, its terms, and its operations Full binary tree theorem Idea
More informationPAPER Constructing the Suffix Tree of a Tree with a Large Alphabet
IEICE TRANS. FUNDAMENTALS, VOL.E8??, NO. JANUARY 999 PAPER Constructing the Suffix Tree of a Tree with a Large Alphabet Tetsuo SHIBUYA, SUMMARY The problem of constructing the suffix tree of a tree is
More informationSolution to Problem 1 of HW 2. Finding the L1 and L2 edges of the graph used in the UD problem, using a suffix array instead of a suffix tree.
Solution to Problem 1 of HW 2. Finding the L1 and L2 edges of the graph used in the UD problem, using a suffix array instead of a suffix tree. The basic approach is the same as when using a suffix tree,
More informationString Matching Algorithms
String Matching Algorithms Georgy Gimel farb (with basic contributions from M. J. Dinneen, Wikipedia, and web materials by Ch. Charras and Thierry Lecroq, Russ Cox, David Eppstein, etc.) COMPSCI 369 Computational
More informationAnalysis of Algorithms
Analysis of Algorithms Concept Exam Code: 16 All questions are weighted equally. Assume worst case behavior and sufficiently large input sizes unless otherwise specified. Strong induction Consider this
More informationCMSC th Lecture: Graph Theory: Trees.
CMSC 27100 26th Lecture: Graph Theory: Trees. Lecturer: Janos Simon December 2, 2018 1 Trees Definition 1. A tree is an acyclic connected graph. Trees have many nice properties. Theorem 2. The following
More informationMarch 20/2003 Jayakanth Srinivasan,
Definition : A simple graph G = (V, E) consists of V, a nonempty set of vertices, and E, a set of unordered pairs of distinct elements of V called edges. Definition : In a multigraph G = (V, E) two or
More informationAdvanced Algorithms: Project
Advanced Algorithms: Project (deadline: May 13th, 2016, 17:00) Alexandre Francisco and Luís Russo Last modified: February 26, 2016 This project considers two different problems described in part I and
More informationRange Minimum Queries Part Two
Range Minimum Queries Part Two Recap from Last Time The RMQ Problem The Range Minimum Query (RMQ) problem is the following: Given a fied array A and two indices i j, what is the smallest element out of
More informationFoundations of Computer Science Spring Mathematical Preliminaries
Foundations of Computer Science Spring 2017 Equivalence Relation, Recursive Definition, and Mathematical Induction Mathematical Preliminaries Mohammad Ashiqur Rahman Department of Computer Science College
More informationSuffix Vector: A Space-Efficient Suffix Tree Representation
Lecture Notes in Computer Science 1 Suffix Vector: A Space-Efficient Suffix Tree Representation Krisztián Monostori 1, Arkady Zaslavsky 1, and István Vajk 2 1 School of Computer Science and Software Engineering,
More informationAlgorithms and Theory of Computation. Lecture 7: Priority Queue
Algorithms and Theory of Computation Lecture 7: Priority Queue Xiaohui Bei MAS 714 September 5, 2017 Nanyang Technological University MAS 714 September 5, 2017 1 / 15 Priority Queues Priority Queues Store
More informationIntroduction to Suffix Trees
Algorithms on Strings, Trees, and Sequences Dan Gusfield University of California, Davis Cambridge University Press 1997 Introduction to Suffix Trees A suffix tree is a data structure that exposes the
More information(2,4) Trees. 2/22/2006 (2,4) Trees 1
(2,4) Trees 9 2 5 7 10 14 2/22/2006 (2,4) Trees 1 Outline and Reading Multi-way search tree ( 10.4.1) Definition Search (2,4) tree ( 10.4.2) Definition Search Insertion Deletion Comparison of dictionary
More informationSuffix Trees and Arrays
Suffix Trees and Arrays Yufei Tao KAIST May 1, 2013 We will discuss the following substring matching problem: Problem (Substring Matching) Let σ be a single string of n characters. Given a query string
More informationCOMP4128 Programming Challenges
Multi- COMP4128 Programming Challenges School of Computer Science and Engineering UNSW Australia Table of Contents 2 Multi- 1 2 Multi- 3 3 Multi- Given two strings, a text T and a pattern P, find the first
More informationBinary search trees. Binary search trees are data structures based on binary trees that support operations on dynamic sets.
COMP3600/6466 Algorithms 2018 Lecture 12 1 Binary search trees Reading: Cormen et al, Sections 12.1 to 12.3 Binary search trees are data structures based on binary trees that support operations on dynamic
More informationFinal Examination CSE 100 UCSD (Practice)
Final Examination UCSD (Practice) RULES: 1. Don t start the exam until the instructor says to. 2. This is a closed-book, closed-notes, no-calculator exam. Don t refer to any materials other than the exam
More informationFinite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018
Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Lecture 11 Ana Bove April 26th 2018 Recap: Regular Languages Decision properties of RL: Is it empty? Does it contain this word? Contains
More informationFast Substring Matching
Fast Substring Matching Andreas Klein 1 2 3 4 5 6 7 8 9 10 Abstract The substring matching problem occurs in several applications. Two of the well-known solutions are the Knuth-Morris-Pratt algorithm (which
More informationRandomized incremental construction. Trapezoidal decomposition: Special sampling idea: Sample all except one item
Randomized incremental construction Special sampling idea: Sample all except one item hope final addition makes small or no change Method: process items in order average case analysis randomize order to
More informationIntroduction to Computers and Programming. Concept Question
Introduction to Computers and Programming Prof. I. K. Lundqvist Lecture 7 April 2 2004 Concept Question G1(V1,E1) A graph G(V, where E) is V1 a finite = {}, nonempty E1 = {} set of G2(V2,E2) vertices and
More informationPriority Queues. 1 Introduction. 2 Naïve Implementations. CSci 335 Software Design and Analysis III Chapter 6 Priority Queues. Prof.
Priority Queues 1 Introduction Many applications require a special type of queuing in which items are pushed onto the queue by order of arrival, but removed from the queue based on some other priority
More informationAlphabet-Dependent String Searching with Wexponential Search Trees
Alphabet-Dependent String Searching with Wexponential Search Trees Johannes Fischer and Pawe l Gawrychowski February 15, 2013 arxiv:1302.3347v1 [cs.ds] 14 Feb 2013 Abstract It is widely assumed that O(m
More informationSFU CMPT Lecture: Week 9
SFU CMPT-307 2008-2 1 Lecture: Week 9 SFU CMPT-307 2008-2 Lecture: Week 9 Ján Maňuch E-mail: jmanuch@sfu.ca Lecture on July 8, 2008, 5.30pm-8.20pm SFU CMPT-307 2008-2 2 Lecture: Week 9 Binary search trees
More informationDO NOT. In the following, long chains of states with a single child are condensed into an edge showing all the letters along the way.
CS61B, Fall 2009 Test #3 Solutions P. N. Hilfinger Unless a question says otherwise, time estimates refer to asymptotic bounds (O( ), Ω( ), Θ( )). Always give the simplest bounds possible (O(f(x)) is simpler
More informationCache-Oblivious String Dictionaries
Cache-Oblivious String Dictionaries Gerth Stølting Brodal Rolf Fagerberg Abstract We present static cache-oblivious dictionary structures for strings which provide analogues of tries and suffix trees in
More informationUniversity of Waterloo CS240R Fall 2017 Solutions to Review Problems
University of Waterloo CS240R Fall 2017 Solutions to Review Problems Reminder: Final on Tuesday, December 12 2017 Note: This is a sample of problems designed to help prepare for the final exam. These problems
More informationIndexing and Searching
Indexing and Searching Introduction How to retrieval information? A simple alternative is to search the whole text sequentially Another option is to build data structures over the text (called indices)
More informationModule 2: Classical Algorithm Design Techniques
Module 2: Classical Algorithm Design Techniques Dr. Natarajan Meghanathan Associate Professor of Computer Science Jackson State University Jackson, MS 39217 E-mail: natarajan.meghanathan@jsums.edu Module
More informationAn Efficient Algorithm for Identifying the Most Contributory Substring. Ben Stephenson Department of Computer Science University of Western Ontario
An Efficient Algorithm for Identifying the Most Contributory Substring Ben Stephenson Department of Computer Science University of Western Ontario Problem Definition Related Problems Applications Algorithm
More informationV Advanced Data Structures
V Advanced Data Structures B-Trees Fibonacci Heaps 18 B-Trees B-trees are similar to RBTs, but they are better at minimizing disk I/O operations Many database systems use B-trees, or variants of them,
More informationNotes on Binary Dumbbell Trees
Notes on Binary Dumbbell Trees Michiel Smid March 23, 2012 Abstract Dumbbell trees were introduced in [1]. A detailed description of non-binary dumbbell trees appears in Chapter 11 of [3]. These notes
More informationTwo Dimensional Dictionary Matching
Two Dimensional Dictionary Matching Amihood Amir Martin Farach Georgia Tech DIMACS September 10, 1992 Abstract Most traditional pattern matching algorithms solve the problem of finding all occurrences
More informationDisjoint-set data structure: Union-Find. Lecture 20
Disjoint-set data structure: Union-Find Lecture 20 Disjoint-set data structure (Union-Find) Problem: Maintain a dynamic collection of pairwise-disjoint sets S = {S 1, S 2,, S r }. Each set S i has one
More informationV Advanced Data Structures
V Advanced Data Structures B-Trees Fibonacci Heaps 18 B-Trees B-trees are similar to RBTs, but they are better at minimizing disk I/O operations Many database systems use B-trees, or variants of them,
More informationGraph Algorithms Using Depth First Search
Graph Algorithms Using Depth First Search Analysis of Algorithms Week 8, Lecture 1 Prepared by John Reif, Ph.D. Distinguished Professor of Computer Science Duke University Graph Algorithms Using Depth
More informationProblem Set 5 Solutions
Introduction to Algorithms November 4, 2005 Massachusetts Institute of Technology 6.046J/18.410J Professors Erik D. Demaine and Charles E. Leiserson Handout 21 Problem Set 5 Solutions Problem 5-1. Skip
More informationCS2223: Algorithms Sorting Algorithms, Heap Sort, Linear-time sort, Median and Order Statistics
CS2223: Algorithms Sorting Algorithms, Heap Sort, Linear-time sort, Median and Order Statistics 1 Sorting 1.1 Problem Statement You are given a sequence of n numbers < a 1, a 2,..., a n >. You need to
More informationMODELING DELTA ENCODING OF COMPRESSED FILES. and. and
International Journal of Foundations of Computer Science c World Scientific Publishing Company MODELING DELTA ENCODING OF COMPRESSED FILES SHMUEL T. KLEIN Department of Computer Science, Bar-Ilan University
More informationBinary search trees 3. Binary search trees. Binary search trees 2. Reading: Cormen et al, Sections 12.1 to 12.3
Binary search trees Reading: Cormen et al, Sections 12.1 to 12.3 Binary search trees 3 Binary search trees are data structures based on binary trees that support operations on dynamic sets. Each element
More informationLower Bound on Comparison-based Sorting
Lower Bound on Comparison-based Sorting Different sorting algorithms may have different time complexity, how to know whether the running time of an algorithm is best possible? We know of several sorting
More informationAlgorithms Dr. Haim Levkowitz
91.503 Algorithms Dr. Haim Levkowitz Fall 2007 Lecture 4 Tuesday, 25 Sep 2007 Design Patterns for Optimization Problems Greedy Algorithms 1 Greedy Algorithms 2 What is Greedy Algorithm? Similar to dynamic
More informationAlgorithms Theory. 15 Text Search (2)
Algorithms Theory 15 Text Search (2) Construction of suffix trees Prof. Dr. S. Albers Suffix tree t = x a b x a $ 1 2 3 4 5 6 a x a b x a $ 1 $ a x b $ $ 4 3 $ b x a $ 6 5 2 2 : implicit suffix trees Definition:
More informationSuffix-based text indices, construction algorithms, and applications.
Suffix-based text indices, construction algorithms, and applications. F. Franek Computing and Software McMaster University Hamilton, Ontario 2nd CanaDAM Conference Centre de recherches mathématiques in
More informationAn undirected graph is a tree if and only of there is a unique simple path between any 2 of its vertices.
Trees Trees form the most widely used subclasses of graphs. In CS, we make extensive use of trees. Trees are useful in organizing and relating data in databases, file systems and other applications. Formal
More informationSuffix Tree and Array
Suffix Tree and rray 1 Things To Study So far we learned how to find approximate matches the alignments. nd they are difficult. Finding exact matches are much easier. Suffix tree and array are two data
More information1 The range query problem
CS268: Geometric Algorithms Handout #12 Design and Analysis Original Handout #12 Stanford University Thursday, 19 May 1994 Original Lecture #12: Thursday, May 19, 1994 Topics: Range Searching with Partition
More informationQuiz 1 Solutions. (a) f(n) = n g(n) = log n Circle all that apply: f = O(g) f = Θ(g) f = Ω(g)
Introduction to Algorithms March 11, 2009 Massachusetts Institute of Technology 6.006 Spring 2009 Professors Sivan Toledo and Alan Edelman Quiz 1 Solutions Problem 1. Quiz 1 Solutions Asymptotic orders
More informationCache-Oblivious String Dictionaries
Cache-Oblivious String Dictionaries Gerth Stølting Brodal University of Aarhus Joint work with Rolf Fagerberg #"! Outline of Talk Cache-oblivious model Basic cache-oblivious techniques Cache-oblivious
More informationimplementing the breadth-first search algorithm implementing the depth-first search algorithm
Graph Traversals 1 Graph Traversals representing graphs adjacency matrices and adjacency lists 2 Implementing the Breadth-First and Depth-First Search Algorithms implementing the breadth-first search algorithm
More informationSearch Trees. Undirected graph Directed graph Tree Binary search tree
Search Trees Undirected graph Directed graph Tree Binary search tree 1 Binary Search Tree Binary search key property: Let x be a node in a binary search tree. If y is a node in the left subtree of x, then
More informationHashing. Yufei Tao. Department of Computer Science and Engineering Chinese University of Hong Kong
Department of Computer Science and Engineering Chinese University of Hong Kong In this lecture, we will revisit the dictionary search problem, where we want to locate an integer v in a set of size n or
More informationMITOCW watch?v=ninwepprkdq
MITOCW watch?v=ninwepprkdq The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To
More informationUniversity of Waterloo CS240R Fall 2017 Review Problems
University of Waterloo CS240R Fall 2017 Review Problems Reminder: Final on Tuesday, December 12 2017 Note: This is a sample of problems designed to help prepare for the final exam. These problems do not
More informationRecursively Defined Functions
Section 5.3 Recursively Defined Functions Definition: A recursive or inductive definition of a function consists of two steps. BASIS STEP: Specify the value of the function at zero. RECURSIVE STEP: Give
More informationUniversity of Waterloo CS240R Winter 2018 Help Session Problems
University of Waterloo CS240R Winter 2018 Help Session Problems Reminder: Final on Monday, April 23 2018 Note: This is a sample of problems designed to help prepare for the final exam. These problems do
More informationDynamic indexes vs. static hierarchies for substring search
Dynamic indexes vs. static hierarchies for substring search Nils Grimsmo 25-6-15 2 Preface This is a master thesis for the Master of Technology program at the Department of Computer and Information Science
More informationAnalysis of Algorithms
Algorithm An algorithm is a procedure or formula for solving a problem, based on conducting a sequence of specified actions. A computer program can be viewed as an elaborate algorithm. In mathematics and
More informationBUNDLED SUFFIX TREES
Motivation BUNDLED SUFFIX TREES Luca Bortolussi 1 Francesco Fabris 2 Alberto Policriti 1 1 Department of Mathematics and Computer Science University of Udine 2 Department of Mathematics and Computer Science
More informationDistributed and Paged Suffix Trees for Large Genetic Databases p.1/18
istributed and Paged Suffix Trees for Large Genetic atabases Raphaël Clifford and Marek Sergot raphael@clifford.net, m.sergot@ic.ac.uk Imperial College London, UK istributed and Paged Suffix Trees for
More informationIntroduction to Automata Theory. BİL405 - Automata Theory and Formal Languages 1
Introduction to Automata Theory BİL405 - Automata Theory and Formal Languages 1 Automata, Computability and Complexity Automata, Computability and Complexity are linked by the question: What are the fundamental
More informationSpace Efficient Linear Time Construction of
Space Efficient Linear Time Construction of Suffix Arrays Pang Ko and Srinivas Aluru Dept. of Electrical and Computer Engineering 1 Laurence H. Baker Center for Bioinformatics and Biological Statistics
More informationSmall-Space 2D Compressed Dictionary Matching
Small-Space 2D Compressed Dictionary Matching Shoshana Neuburger 1 and Dina Sokol 2 1 Department of Computer Science, The Graduate Center of the City University of New York, New York, NY, 10016 shoshana@sci.brooklyn.cuny.edu
More informationBinary Heaps in Dynamic Arrays
Yufei Tao ITEE University of Queensland We have already learned that the binary heap serves as an efficient implementation of a priority queue. Our previous discussion was based on pointers (for getting
More information