Lecture 6: Suffix Trees and Their Construction

Similar documents
Special course in Computer Science: Advanced Text Algorithms

An introduction to suffix trees and indexing

Exact String Matching Part II. Suffix Trees See Gusfield, Chapter 5

Lecture 5: Suffix Trees

Special course in Computer Science: Advanced Text Algorithms

Exact Matching Part III: Ukkonen s Algorithm. See Gusfield, Chapter 5 Visualizations from

Data structures for string pattern matching: Suffix trees

4. Suffix Trees and Arrays

4. Suffix Trees and Arrays

Suffix links are stored for compact trie nodes only, but we can define and compute them for any position represented by a pair (u, d):

Suffix Trees and its Construction

Computing the Longest Common Substring with One Mismatch 1

Ukkonen s suffix tree algorithm

Applications of Suffix Tree

Solution to Problem 1 of HW 2. Finding the L1 and L2 edges of the graph used in the UD problem, using a suffix array instead of a suffix tree.

Given a text file, or several text files, how do we search for a query string?

Lecture 7 February 26, 2010

Suffix trees and applications. String Algorithms

String Matching. Pedro Ribeiro 2016/2017 DCC/FCUP. Pedro Ribeiro (DCC/FCUP) String Matching 2016/ / 42

Suffix Tree and Array

Non-context-Free Languages. CS215, Lecture 5 c

1. Gusfield text for chapter 5&6 about suffix trees are scanned and uploaded on the web 2. List of Project ideas is uploaded

Lecture L16 April 19, 2012

Introduction to Suffix Trees

58093 String Processing Algorithms. Lectures, Autumn 2013, period II

17 dicembre Luca Bortolussi SUFFIX TREES. From exact to approximate string matching.

1 Computing alignments in only linear space

Lecture 9: Core String Edits and Alignments

Figure 1. The Suffix Trie Representing "BANANAS".

1 Leaffix Scan, Rootfix Scan, Tree Size, and Depth

1 Introduciton. 2 Tries /651: Algorithms CMU, Spring Lecturer: Danny Sleator

Algorithms Theory. 15 Text Search (2)

Lecture 18 April 12, 2005

Disjoint-set data structure: Union-Find. Lecture 20

Lecture Notes: External Interval Tree. 1 External Interval Tree The Static Version

BUNDLED SUFFIX TREES

Lecture 5: Duality Theory

Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi.

Distributed and Paged Suffix Trees for Large Genetic Databases p.1/18

Trapezoidal Maps. Notes taken from CG lecture notes of Mount (pages 60 69) Course page has a copy

Randomized incremental construction. Trapezoidal decomposition: Special sampling idea: Sample all except one item

implementing the breadth-first search algorithm implementing the depth-first search algorithm

Lecture 3 February 9, 2010

Bálint Márk Vásárhelyi Suffix Trees and Their Applications

Union-Find and Amortization

1 The range query problem

Undirected Graphs. V = { 1, 2, 3, 4, 5, 6, 7, 8 } E = { 1-2, 1-3, 2-3, 2-4, 2-5, 3-5, 3-7, 3-8, 4-5, 5-6 } n = 8 m = 11

An Efficient Algorithm for Identifying the Most Contributory Substring. Ben Stephenson Department of Computer Science University of Western Ontario

Suffix Trees and Arrays

Copyright 2000, Kevin Wayne 1

11/5/09 Comp 590/Comp Fall

Range Minimum Queries Part Two

PRAM ALGORITHMS: BRENT S LAW

Figure 4.1: The evolution of a rooted tree.

PAPER Constructing the Suffix Tree of a Tree with a Large Alphabet

Optimization I : Brute force and Greedy strategy

Notes for Lecture 21. From One-Time Signatures to Fully Secure Signatures

Lower Bound on Comparison-based Sorting

We have used both of the last two claims in previous algorithms and therefore their proof is omitted.

Chapter 3. Graphs. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

How many leaves on the decision tree? There are n! leaves, because every permutation appears at least once.

Data Compression Techniques

Notes on Binary Dumbbell Trees

Algorithms Dr. Haim Levkowitz

11/5/13 Comp 555 Fall

marc skodborg, simon fischer,

Sorting Algorithms. CptS 223 Advanced Data Structures. Larry Holder School of Electrical Engineering and Computer Science Washington State University

Advanced Algorithms: Project

An Algorithm for Enumerating All Spanning Trees of a Directed Graph 1. S. Kapoor 2 and H. Ramesh 3

CSE373: Data Structures & Algorithms Lecture 11: Implementing Union-Find. Lauren Milne Spring 2015

Quiz 1 Solutions. (a) f(n) = n g(n) = log n Circle all that apply: f = O(g) f = Θ(g) f = Ω(g)

What are suffix trees?

Inexact Matching, Alignment. See Gusfield, Chapter 9 Dasgupta et al. Chapter 6 (Dynamic Programming)

COMP Online Algorithms. Online Bin Packing. Shahin Kamali. Lecture 20 - Nov. 16th, 2017 University of Manitoba

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1

Verifying a Border Array in Linear Time

16 Greedy Algorithms

CMSC 754 Computational Geometry 1

Lecture 6: External Interval Tree (Part II) 3 Making the external interval tree dynamic. 3.1 Dynamizing an underflow structure

Sequence Mining 10.1 FREQUENT SEQUENCES CHAPTER 10

3.1 Basic Definitions and Applications

1. Meshes. D7013E Lecture 14

Disjoint set (Union-Find)

Problem Set 5 Solutions

Lecture 1. 1 Notation

Dynamic indexes vs. static hierarchies for substring search

CMSC th Lecture: Graph Theory: Trees.

1 Interlude: Is keeping the data sorted worth it? 2 Tree Heap and Priority queue

Sublinear Time and Space Algorithms 2016B Lecture 7 Sublinear-Time Algorithms for Sparse Graphs

4 Basics of Trees. Petr Hliněný, FI MU Brno 1 FI: MA010: Trees and Forests

Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey

Alphabet-Dependent String Searching with Wexponential Search Trees

Treaps. 1 Binary Search Trees (BSTs) CSE341T/CSE549T 11/05/2014. Lecture 19

Data Structure Lecture#10: Binary Trees (Chapter 5) U Kang Seoul National University

Compressed Indexes for Dynamic Text Collections

An Algorithm for Enumerating all Directed Spanning Trees in a Directed Graph

Trapezoidal decomposition:

Cuckoo Hashing for Undergraduates

Lecture 9 March 4, 2010

SAMPLING AND THE MOMENT TECHNIQUE. By Sveta Oksen

Transcription:

Biosequence Algorithms, Spring 2007 Lecture 6: Suffix Trees and Their Construction Pekka Kilpeläinen University of Kuopio Department of Computer Science BSA Lecture 6: Intro to suffix trees p.1/46

II: Suffix Trees and Their Applications ( loppuosapuut sekä niiden sovelluksia) BSA Lecture 6: Intro to suffix trees p.2/46

Introduction to Suffix Trees Suffix tree is an index structure that gives efficient solutions to a wide range of complex string problems For example, the substring problem: For a text S of length m, after O(m) time preprocessing, given any string P either find an occurrence of P in S, or determine that one does not exist, in time O( P ) How to solve it? BSA Lecture 6: Intro to suffix trees p.3/46

Solving the Substring Problem First idea: Build a keyword tree of all substrings of S No: takes too much time (Ω(m 2 )) Second idea: It is easy to find prefixes of strings in a keyword tree. Each substring S[i... j] is a prefix of the suffix S[i... m] of S create a keyword tree of the m non-empty suffixes of S The rest is just refinement... (yet quite challenging to get the construction time to linear!) BSA Lecture 6: Intro to suffix trees p.4/46

Definition of a Suffix Tree A suffix tree T for S[1... m] is a rooted tree with... m leaves numbered 1,..., m at least two children for each internal node (with the root as a possible exception) each edge labeled by a nonempty substring of S no two edges out of a node beginning with the same character Again, L(v) denotes the label of a node v, i.e., the concatenation of edge labels on the path from the root to v Key feature: L(i) = S[i... m] for each leaf i = 1,..., m of T BSA Lecture 6: Intro to suffix trees p.5/46

Example of a Suffix Tree The suffix tree for string 1 2 3 4 5 6 x a b x a c : xa bxac 1 c 4 a bxac c c 5 6 2 bxac 3 Does a suffix tree always exist? BSA Lecture 6: Intro to suffix trees p.6/46

Ensuring the Existence of a Suffix Tree Not necessarily: If some suffix w appears as a prefix of some other one, the path labeled by w does not lead to a leaf To avoid this, assume that S has a termination character $ that occurs only at the end of S (or insert one, if needed) no suffix appears as a prefix of any other suffixes label complete paths leading to the leaves BSA Lecture 6: Intro to suffix trees p.7/46

Applying Suffix Trees to Matching Suffix trees can be used to solve exact matching: 1. Construct the suffix tree T for text T [1... m]; (We ll discuss later how to do this in O(m) time) 2. Match characters of P along a path from the root (a) If P can be fully matched ( ), let z be the number of leaves below the path labeled by P. Each of these is a start of an occurrence of P, and they can be collected in time O(z) (Exercise) (b) If doesn t match completely, P doesn t occur in T Total time: O( T + P + z) ( ) P may end at the middle of an edge BSA Lecture 6: Intro to suffix trees p.8/46

Naive Construction of Suffix Trees Start with a root and a leaf numbered 1, connected by an edge labeled S$. Enter suffixes S[2... m]$, S[3... m]$,..., S[m]$ into the tree as follows: To insert K i = S[i... m]$, follow the path from the root matching characters of K i until the first mismatch at character K i [j] (which is bound to happen) (a) If the matching cannot continue from a node, denote that node by w (b) Otherwise the mismatch occurs at the middle of an edge, which has to be split BSA Lecture 6: Intro to suffix trees p.9/46

Naive Construction of Suffix Trees (2) If the mismatch occurs at the middle of an edge e = (u, v), let the label of that edge be a 1... a l If the mismatch occurred at character a k, then create a new node w, and replace e by edges (u, w) and (w, v) labeled by a 1... a k 1 and a k... a l Finally, in both cases (a) and (b), create a new leaf numbered i, and connect w to it by an edge labeled with K i [j... K i ] BSA Lecture 6: Intro to suffix trees p.10/46

Example of the Naive Construction Consider building the suffix tree for the string S: 1 2 3 4 5 6 x a b x a c Start: xabxac 1 After inserting the second and the third suffix: xabxac 1 abxac 2 bxac 3 BSA Lecture 6: Intro to suffix trees p.11/46

Example of the Naive Construction (2) Entering S[4... 6] = xac causes the first edge to split: 1 4 bxac xa c abxac bxac 2 3 Same happens for the second edge when S[5... 6] = ac is entered BSA Lecture 6: Intro to suffix trees p.12/46

Example of the Naive Construction (3) After entering suffixes S[5... 6] = ac and S[6] = c the suffix tree is complete: 1 bxac c xa 4 a bxac 2 c c 6 5 bxac 3 BSA Lecture 6: Intro to suffix trees p.13/46

Complexity of the Naive Construction Each suffix S[i... m]$ is entered in the tree in time Θ( S[i... m]$ ) total time is Θ( m+1 i=2 i) = Θ(m2 ) Observations: Number of edges in a suffix tree T is at most 2m 1 the size of T is O(m) (Exercise) On the other hand, the total length of edge labels can be Θ(m 2 ) (Exercise) As a simple example consider the suffix tree of abc... xyz, whose total length of edge labels is P 26 l=1 l = 26 27/2 For linear time we need a compact representation of edge labels BSA Lecture 6: Intro to suffix trees p.14/46

Compact Representation Each edge is labeled by a non-empty substring S[i... j] Compression: Represent label S[i... j] by two indices i and j to the string S each edge takes only constant space, and thus O(m) space suffices for the entire suffix tree of S[1... m] BSA Lecture 6: Intro to suffix trees p.15/46

Example of Compact Representation The suffix tree for a string with compacted edge labels 1 2 3 4 5 6 x a b x a c 3,6 1 1,2 6,6 4 2,2 3,6 2 6,6 6,6 5 3,6 6 3 BSA Lecture 6: Intro to suffix trees p.16/46

Short History of Suffix Trees Weiner, 1973: the first linear-time construction McCreight, 1976: a more space-efficient linear-time method Ukkonen, 1995: A simpler linear-time construction, with all advantages of the previous, and more memory-efficient in practice Next: Ukkonen s construction (which is still rather complex) We develop it starting from an intuitive but inefficient algorithm, and then improve it with reasonable implementation tricks BSA Lecture 6: Intro to suffix trees p.17/46

Ukkonen s Method Ukkonen s method constructs a suffix tree for S[1... m] in time O(m) We begin with a high-level description, and then refine the method to run in linear time The method builds, as intermediate results, for each prefix S[1... 1], S[1... 2],..., S[1... m] an implicit suffix tree The implicit suffix tree of a string is what results by applying suffix tree construction to the string without an added end marker $ all suffixes are included, but not necessarily as labels of complete root-to-leaf paths BSA Lecture 6: Intro to suffix trees p.18/46

Example of Implicit Suffix Trees (1) Suffix tree for xabxac$: bxac$ 1 4 xa c$ bxac$ 2 a c$ c$ 5 6 $ 7 bxac$ 3 Implicit suffix tree for xabxac: 1 bxac c xa 4 a bxac 2 c c 6 5 bxac 3 Obs 1: If the last char is unique, the implicit suffix tree is essentially the same as the (true) suffix tree (only without $ s) BSA Lecture 6: Intro to suffix trees p.19/46

Example of Implicit Suffix Trees (2) Suffix tree for xabxa$: 1 bxa$ xa $ 4 a bxa$ 2 $ $ 5 6 bxa$ 3 Implicit suffix tree for xabxa: 1 xabxa abxa 2 bxa 3 Obs 2: If the last char is not unique, some suffixes occur as labels of incomplete paths (not leading to a leaf, or even to an internal node) BSA Lecture 6: Intro to suffix trees p.20/46

Implicit Suffix Trees of Prefixes Denote the implicit suffix tree of the prefix S[1... i] by I i I 1 is just a single edge labeled by S[1] leading to leaf 1 Example: Implicit suffix trees for the first three prefixes of axabxb: I 1 : a 1 I 2 : I 3 : ax 1 x 2 axa 1 xa 2 BSA Lecture 6: Intro to suffix trees p.21/46

String Paths of I i I i contains each suffix S[1... i], S[2... i],..., S[i] of S[1... i] as a label of some path (possibly ending at the middle of an edge) Let s call such labels of (partial) paths (string) paths That is, a string path of I i is a string that can be matched along the edges, starting from the root a prefix of any node label any substring of S[1... i] BSA Lecture 6: Intro to suffix trees p.22/46

Ukkonen s Algorithm on a High Level Start with T I 1 Update T to trees I 2,..., I m+1 in m phases (vaihe) Let S[m + 1] be $ the final value of T is a true suffix tree, which contains all suffixes of S (extended with $) Phase i + 1 updates T from I i (with all suffixes of S[1... i]) to I i+1 (with all suffixes of S[1... i + 1]) Each phase i + 1 consists of extensions (lisäysaskel) j = 1,..., i + 1; Extension j ensures that suffix S[j... i + 1] is in I i+1 BSA Lecture 6: Intro to suffix trees p.23/46

Extension j of Phase i + 1 Phase i + 1 starts with T = I i Q: How to ensure that suffix S[j... i + 1] is in the tree? A: Extend the path S[j... i] of T by character S[i + 1] (if it isn t there already) Q: How to extend the path? BSA Lecture 6: Intro to suffix trees p.24/46

Suffix Extension Rules Extension rules for the three possible cases: Rule 1 If S[j... i] leads to a leaf (j), catenate S[i + 1] to its edge label Rule 2 If path S[j... i] ends before a leaf, and doesn t continue by S[i + 1]: Connect the end of the path to a new leaf j by an edge labeled by char S[i + 1]. (If the path ended at the middle of an edge, split the edge and insert a new node as the parent of leaf j) Rule 3 If the path continues by S[i + 1], do nothing. (Suffix S[j... i + 1] is already in the tree) Let s call an extension that applies Rule 3 void (and applications of Rules 1 and 2 non-void) BSA Lecture 6: Intro to suffix trees p.25/46

Example of Extensions Consider phase 6 with string S = axabxb; Now T = I 5 contains all suffixes of S[1... 5] = axabx: xabx a bx xabx 2 bx 4 1 3 Extensions 1 4: S[1... 6] = axabxb,..., S[4... 6] = bxb entered by Rule 1 Extension 5: S[5... 6] = xb entered by Rule 2, creating leaf 5 and its parent Extension 6: S[6... 6] = b entered by Rule 3 T = I 6 : xabxb a x abxb bxb 2 b bxb 5 4 1 3 BSA Lecture 6: Intro to suffix trees p.26/46

Complexity of a Naive Implementation Consider first a single phase i + 1 Each of the extension rules can be applied in constant time for extensions j = 1,..., i + 1 they take total time θ(i) Traversing the paths S[1... i],... S[i + 1... i] explicitly takes time Θ( i l=0 l) = Θ(i2 ) Total time for all phases i = 2,..., m + 1 is m+1 Θ( Q: How to improve this? i=2 i 2 ) = Θ(m 3 ) BSA Lecture 6: Intro to suffix trees p.27/46

Reducing the Complexity To get total time down to O(m 2 ) we need to avoid or speed up path traversals Even constant time per extension requires time m+1 Θ( i=2 i) = Θ(m 2 ) To get total time down to O(m) we also need to avoid performing some extensions at all A number of observations and tricks are needed for this Consider first speeding up the path traversals BSA Lecture 6: Intro to suffix trees p.28/46

Locating Ends of Paths The extensions of phase i + 1 need to locate the ends of all the i + 1 suffixes of S[1... i] How to do this efficiently? For each internal node v of T labeled by xα, where x Σ and α Σ, define s(v) to be the node labeled by α We ll show that these exist, in a moment Then a pointer from v to s(v) is the suffix link of v NB: If node v is labeled by a single char (x), then α = ɛ and s(v) is the root BSA Lecture 6: Intro to suffix trees p.29/46

Example of Suffix Links xa a c bxac 6 3 bxac c bxac c 1 4 2 5 What are suffix links good for? BSA Lecture 6: Intro to suffix trees p.30/46

Intuitive Motivation for Suffix Links Extension j (of phase i + 1) finds the end of the path S[j... i] in the tree (and extends it with char S[i + 1]) Extension j + 1 finds the end of the path S[j + 1... i] Assume that v is an internal node labeled by S[j]α on the path S[j... i]. Then we can avoid traversing path α when locating the end of S[j + 1... i], by starting from node s(v) Q: Do suffix links always exist? A: Yes, and each suffix link (v, s(v)) is easy to set: BSA Lecture 6: Intro to suffix trees p.31/46

Computation of Suffix Links Observation: If an internal node v is created during extension j (of phase i + 1), then the next extension j + 1 will find out the node s(v) Why? Let v be labeled by xα Now xα = S[j..i], and node v is created by Rule 2 That is, v is inserted at the end of path S[j... i], which continued by some character c S[i + 1] paths xαc and αc were entered earlier in extension j + 1, node s(v) is either found or created at the end of path α = S[j + 1... i] BSA Lecture 6: Intro to suffix trees p.32/46

Speeding up Path Traversals Consider extensions of phase i + 1 Extension 1 extends path S[1... i] with char S[i + 1] Path S[1... i] always ends at leaf 1, and is thus extended by Rule 1 Extension 1 can be performed in constant time, by maintaining a pointer to the incoming edge of leaf 1 What about subsequent extensions j + 1 (for j = 1,..., i)? BSA Lecture 6: Intro to suffix trees p.33/46

Locating Subsequent Paths Extension j has located the end of the path S[j... i] Starting from there, walk up at most one node either (a) to the root, or (b) to a node v with a suffix link In case (a), traverse path S[j + 1... i] explicitly down-wards from the root BSA Lecture 6: Intro to suffix trees p.34/46

Short-cutting Traversals In case (b), let xα be the label of v S[j... i] = xαβ for some β Σ (Draw a picture of the paths!) Then follow the suffix link of v, and continue by matching β down-wards from node s(v) (which is now labeled by α) Having found the end of path αβ = S[j + 1... i], apply extension rules to ensure that it continues with S[i + 1] Finally, if a new internal node w was created in extension j, set its suffix link to point to the end node of path S[j + 1... i] BSA Lecture 6: Intro to suffix trees p.35/46

Speeding up Explicit Traversals (Trick 1 (skip/count) in Gusfield) The path S[j... i] followed in extension j is in the tree ( it is a suffix of S[1..i]) no need to check all of its characters; it suffices to choose the correct edges Let S[k] be the next char to be matched on path S[j... i] An edge labeled by S[p... q] can be traversed simply by checking that S[p] = S[k], and skipping the next q p chars of S[j... i] time to traverse a path is proportional to the node-length on the path (instead of its string-length) BSA Lecture 6: Intro to suffix trees p.36/46

Bounding the Time of Tree Traversals Lemma 6.1.2 For any node v with a suffix link to s(v), depth(s(v)) depth(v) 1 That is, following a suffix link leads at most one level closer to the root Proof. (Idea) The suffix links for any ancestor of v lead to distinct ancestors of s(v) Now we can argue a linear time bound for any phase by considering how the current node depth can change (= the depth of the most recently visited node) BSA Lecture 6: Intro to suffix trees p.37/46

Linear Bound for any Single Phase Theorem 6.1.1 Using suffix links and the skip/count trick, a single phase takes time O(m) Proof There are i + 1 m + 1 extensions in phase i + 1 In any extension, other work except tree-traversals takes constant time only How to bound the work for traversing the tree? To find the end of the next path, an extension first moves at most one level up. Then a suffix link may be followed, which is followed by a down traversal to match the rest of the path BSA Lecture 6: Intro to suffix trees p.38/46

Bounding the Edge Traversals The possible up movement and suffix link traversal decrement current node depth at most twice the current node depth is decremented at most 2m times during the entire phase On the other hand, the current node depth cannot exceed m it is incremented (by following downward edges) at most 3m times total time of a phase is O(m) NB: Since there are m phases, total time is O(m 2 ) A few more tricks are needed to get total time linear BSA Lecture 6: Intro to suffix trees p.39/46

Final Improvements Some extensions can be found unnecessary to compute explicitly Obs 1: Rule 3 is a show-stopper : If path S[j... i + 1] is already in the tree, so are paths S[j + 1... i + 1],..., S[i + 1], too Phase i + 1 can be finished at the first extension j that applies Rule 3; all the rest are void, too BSA Lecture 6: Intro to suffix trees p.40/46

Final Improvements (2) Obs 2: A node created as a leaf stays as a leaf because no extension rule adds children to a leaf If extension j created a leaf (numbered j), extension j of any later phase i + 1 applies Rule 1 (appending the next char S[i + 1] to the edge label of j) Explicit applications of Rule 1 can be eliminated as follows: Use compressed edge representation (i.e., indices p and q instead of substring S[p... q]), and represent the end position of each terminal edge by a global value e for the current end position BSA Lecture 6: Intro to suffix trees p.41/46

Eliminating Extensions Denote by j i the last non-void extension of phase i (that is, application of Rule 1 or 2); j 1 = 1 Obs 1 extensions 1,..., j i of phase i are non-void leaves 1,... j i have been created at the end of phase i Obs 2 extensions 1,... j i of any subsequent phase all apply Rule 1 they are non-void in phase i + 1, too j i+1 j i Sufficient to execute explicitely, in phase i + 1, extensions j i + 1, j i + 2,... only BSA Lecture 6: Intro to suffix trees p.42/46

Single Phase Algorithm Algorithm for phase i + 1 with unnecessary extensions eliminated 1. Set current end-position: e := i + 1; (to implement extensions 1,... j i implicitly) 2. Compute extensions j i + 1,..., j until j > i + 1 or Rule 3 was applied in extension j ; 3. Set j i+1 := j 1; (for the next phase) All these tricks together can be shown to lead to linear time BSA Lecture 6: Intro to suffix trees p.43/46

Analysis of the Tuned Implementation Theorem 6.1.2 Ukkonen s algorithm builds the suffix tree for S[1... m] in time O(m), when implemented using the above tricks Proof The extensions computed explicitly in any two phases i and i + 1 are disjoint, except for extension j, which may be repeated in phase i + 1 The repeat of extension j can be performed in constant time, by remembering the end of the path entered in its previous computation BSA Lecture 6: Intro to suffix trees p.44/46

Analysis of the Tuned Implementation (2) Let j = 1,..., m + 1 denote the index of the current extension Over all phases 2,..., m + 1 index j never decreases, but it can remain the same at the start of phases 3,..., m + 1 at most 2m extensions are computed explicitly Similarly to the proof of Th. 6.1.1, the current node depth can be decremented at most 4m times, and thus the total length of all downward traversals is bounded by 5m BSA Lecture 6: Intro to suffix trees p.45/46

A Final Touch Finally, I m+1 can be converted to the true suffix tree of S[1... m]$ as follows: All occurrences of the current end position marker e on edge labels can be replaced by m + 1 (with a simple tree traversal, in time O(m)) Remark: Ukkonen s method is on-line, by processing S left-to-right and having a suffix tree ready for the scanned part BSA Lecture 6: Intro to suffix trees p.46/46