Suffix trees and applications. String Algorithms
- Isaac Nichols
9 Tries
A trie is a data structure for storing and retrieving strings.
Example: x1 = ab, x2 = abc.
[Figure: trie with the single path a, b, c from the root; the node reached by ab is marked 1 and the node reached by abc is marked 2.]
Observations: shared prefixes imply shared initial paths.
Often we want each string to correspond to a unique root-to-leaf path, i.e. we want to make sure that no input string is a prefix of another. How?
13 Tries
Append a terminator $ that occurs in no input string: x1 = ab$, x2 = abc$, x3 = $.
[Figure: trie of ab$, abc$ and $; each string now ends in its own leaf.]
14 Tries
Application: Given a query string y[1..m], we can determine whether y equals one of the input strings (or is a prefix of one) in time O(m).
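The trie operations above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the names `TrieNode`, `build_trie` and `lookup` are made up here, and it assumes that the terminator `$` occurs in none of the input strings.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # letter -> TrieNode
        self.index = None    # number of the input string ending in this leaf


def build_trie(strings, terminator="$"):
    """Insert every string followed by the terminator (assumed unused elsewhere)."""
    root = TrieNode()
    for idx, s in enumerate(strings, start=1):
        node = root
        for ch in s + terminator:
            node = node.children.setdefault(ch, TrieNode())
        node.index = idx     # the node reached after '$' is a leaf
    return root


def lookup(root, y, terminator="$"):
    """Walk y from the root in O(len(y)) steps.

    Returns (is_prefix, index): is_prefix tells whether y is a prefix of some
    input string; index is the number of the input string equal to y, or None.
    """
    node = root
    for ch in y:
        node = node.children.get(ch)
        if node is None:
            return False, None
    leaf = node.children.get(terminator)
    return True, (leaf.index if leaf is not None else None)


trie = build_trie(["ab", "abc", ""])
print(lookup(trie, "ab"))    # (True, 1)
print(lookup(trie, "a"))     # (True, None)
print(lookup(trie, "abd"))   # (False, None)
```

Each step of `lookup` is one dictionary access, so the walk takes O(m) expected time for a query of length m.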
15 What is the space complexity?
17 Compacted tries
Saving space: eliminate all internal nodes of degree 2 (nodes with exactly one child) by merging their two edges into one edge labelled with a string.
x1 = ab$, x2 = abc$, x3 = $.
[Figure: the trie and its compacted version. In the compacted trie the root has an edge ab to an internal node, whose edges $ and c$ lead to leaves 1 and 2, and an edge $ to leaf 3.]
If we have n input strings, then the trie has n+1 leaves (the extra leaf is for the empty string $) and at most n internal nodes, i.e. space O(n) for the tree. What about the labels?
Each label can be represented in space O(1) as indices into the input, e.g. ab as (1,1,2): string x1, positions 1 to 2.
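A small sketch of the constant-space label idea, assuming the triple (k, i, j) means "positions i to j of string xk" with 1-based, inclusive indices. The helper name `label_text` is hypothetical.

```python
strings = ["ab$", "abc$", "$"]   # x1, x2, x3 from the example


def label_text(strings, triple):
    """Expand an edge label stored as (k, i, j) into the text it stands for."""
    k, i, j = triple
    return strings[k - 1][i - 1:j]


print(label_text(strings, (1, 1, 2)))   # ab
print(label_text(strings, (2, 3, 4)))   # c$
```

The tree stores only the three integers per edge, so the total label space is proportional to the number of edges and not to the total length of the labels.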
18 If there are n strings... How many leaves are there? How many inner nodes? How many edges?
20 Suffix tree
The suffix tree T(x) of a string x[1..n] is the compacted trie of all suffixes x[i..n] for i = 1, ..., n+1, i.e. including the empty suffix. Each suffix is terminated by $.
Example for x = tatat: the suffixes are tatat$, atat$, tat$, at$, t$ and ε$ (the empty suffix).
25 A larger example
The node shown has path-label ssi and is at depth 3.
The path-label of leaf i is suffix i, i.e. x[i..n]$.
The path-label of the lowest common ancestor of leaves i and j is the longest common prefix of suffixes i and j of x.
26 What is the space complexity?
27 Space consumption
32 Constructing suffix trees
Constructing T(x) by inserting the suffixes one by one takes time O(n^2). Can we do better?
[Weiner 1973]: T(x) can be constructed in time O(n).
There are two practical algorithms that construct the suffix tree in linear time: McCreight (1976) and Ukkonen (1993).
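The quadratic construction mentioned above (insert the suffixes one by one) can be sketched as follows. This is the naive method, not Weiner's, McCreight's or Ukkonen's algorithm. The names `Node`, `build_suffix_tree` and `leaf_labels` are illustrative, edge labels are stored as index pairs into the text, and x is assumed not to contain the terminator `$`.

```python
class Node:
    def __init__(self, start=0, end=0):
        self.start = start    # label of the edge into this node is text[start:end]
        self.end = end
        self.children = {}    # first letter of an outgoing edge label -> Node
        self.leaf = None      # 1-based suffix number, set on leaves only


def build_suffix_tree(x, terminator="$"):
    """Naive O(n^2) construction; assumes terminator does not occur in x."""
    text = x + terminator
    root = Node()
    for i in range(len(text)):            # insert suffix text[i:]
        node, pos = root, i
        while True:
            child = node.children.get(text[pos])
            if child is None:             # no edge starts with this letter
                leaf = Node(pos, len(text))
                leaf.leaf = i + 1
                node.children[text[pos]] = leaf
                break
            k = child.start               # match along the edge label
            while k < child.end and text[k] == text[pos]:
                k += 1
                pos += 1
            if k == child.end:            # whole label matched, go deeper
                node = child
                continue
            mid = Node(child.start, k)    # mismatch inside the edge: split it
            node.children[text[child.start]] = mid
            child.start = k
            mid.children[text[k]] = child
            leaf = Node(pos, len(text))
            leaf.leaf = i + 1
            mid.children[text[pos]] = leaf
            break
    return root, text


def leaf_labels(node, text, prefix=""):
    """Map each leaf number to its path-label (useful for checking the tree)."""
    label = prefix + text[node.start:node.end]
    if node.leaf is not None:
        return {node.leaf: label}
    out = {}
    for child in node.children.values():
        out.update(leaf_labels(child, text, label))
    return out


root, text = build_suffix_tree("tatat")
print(leaf_labels(root, text)[2])   # atat$
```

Each insertion walks at most the length of the suffix, which gives the O(n^2) bound; the linear-time algorithms avoid re-walking from the root.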
33 What about applications?... exact matching, finding repeats, longest common substring...
34 Exact matching
Given a string x and a pattern y, report where y occurs in x.
If y occurs in x at position i, then y is a prefix of suffix i of x, so y is spelled by an initial part of the path from the root to leaf i in T(x).
39 Exact matching
Example: the pattern ata occurs at position 2 in tatat.
Time: O(|P|) using the suffix tree T(S), where P = y is the pattern and S = x is the string.
44 Exact matching
Example: the pattern tatt does not occur in tatat.
Time: O(|P|) using the suffix tree T(S), where P = y is the pattern and S = x is the string.
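The search can be illustrated with a short sketch. To keep it self-contained it uses an uncompacted suffix trie (nested dictionaries) instead of the compacted suffix tree, which is a simplification costing O(n^2) space; the walk down from the root is the same idea. The function names are made up for this example.

```python
def build_suffix_trie(x, terminator="$"):
    """Uncompacted trie of all suffixes of x + terminator (a simplification)."""
    text = x + terminator
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
        node["leaf"] = i + 1      # the key "leaf" cannot clash with a single letter
    return root


def occurrences(trie, y):
    """Return the sorted 1-based positions where y occurs in x."""
    node = trie
    for ch in y:                  # spell y from the root: O(len(y)) steps
        if ch not in node:
            return []             # y is not a prefix of any suffix
        node = node[ch]
    found, stack = [], [node]     # every leaf below is a suffix starting with y
    while stack:
        cur = stack.pop()
        for key, child in cur.items():
            if key == "leaf":
                found.append(child)
            else:
                stack.append(child)
    return sorted(found)


trie = build_suffix_trie("tatat")
print(occurrences(trie, "ata"))    # [2]
print(occurrences(trie, "tatt"))   # []
```

Deciding whether y occurs takes O(|y|) steps; listing all occurrences additionally visits the subtree below the point where y ends.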
45 Repeats
A pair of substrings R = (S[i1..j1], S[i2..j2]) is a...
48 Finding exact repeats
55 A larger example
56 Finding maximal exact repeats
61 Other repeats
Maximal repeats with bounded gap in time O(n log n + z)
Tandem repeats in time O(n log n + z)
Palindromic repeats in time O(n + z)
... all using suffix trees...
63 More strings
The longest common substring of x[1..n] and y[1..m] is the longest string z which occurs in both x and y. Can this be found efficiently using a suffix tree?
z is the longest string that is a common prefix of some pair of suffixes x[i..n] and y[j..m].
[Figure: z marked as a substring of both x and y.]
70 More strings
z is the longest string that is a common prefix of some pair of suffixes x[i..n] and y[j..m].
Idea: Build a compacted trie of all suffixes of x and y, such that each suffix of x and of y corresponds to a unique root-to-leaf path.
Suffixes of x = tatat: tatat$, atat$, tat$, at$, t$, ε$.
Suffixes of y = aataa: aataa, ataa, taa, aa, a, ε.
[Figure: the compacted trie, built by inserting the suffixes one at a time.]
72 More strings
Observe: z is the path-label of the deepest node with suffixes from both x and y as leaves in its subtree.
Time: O(n+m)
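The observation can be tried out with a short sketch. It uses an uncompacted trie of all suffixes of x and y and marks every node with the strings whose suffixes pass through it, so it runs in quadratic time and space; the O(n+m) bound above requires the compacted generalized suffix tree built in linear time. The function name and the node layout are assumptions made for this example.

```python
def longest_common_substring(x, y):
    """Deepest trie node lying below suffixes of both x and y."""
    root = {"kids": {}, "src": set()}
    for src, s in ((1, x), (2, y)):
        for i in range(len(s)):               # insert suffix s[i:]
            node = root
            for ch in s[i:]:
                node = node["kids"].setdefault(ch, {"kids": {}, "src": set()})
                node["src"].add(src)          # a suffix of string src passes here
    best = ""
    stack = [(root, "")]
    while stack:
        node, label = stack.pop()
        if len(label) > len(best):
            best = label
        for ch, child in node["kids"].items():
            if child["src"] == {1, 2}:        # suffixes from both x and y below
                stack.append((child, label + ch))
    return best


print(longest_common_substring("tatat", "aataa"))   # ata
```

If several common substrings share the maximum length, the sketch returns one of them.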
73 Generalized suffix tree
This is the generalized suffix tree of tatat and aataa.
[Figure: compacted trie of the suffixes tatat$, atat$, tat$, at$, t$, ε$ and aataa, ataa, taa, aa, a, ε.]
It can be constructed by constructing the suffix tree of the concatenation tatat$aataa.
74 Generalized suffix tree
[Figure: the concatenated string x$y. Positions 1..n hold x, position n+1 holds $, and positions n+2..n+m+1 hold y[1..m].]
... we must argue that we get the same branching structure...
77 Generalized suffix tree
Case 1: suffixes i and j of the concatenation both start in x. Their leaves correspond to (1, i) and (1, j), i.e. suffixes i and j of x.
78 Generalized suffix tree
Case 2: suffixes i' + (n+1) and j' + (n+1) of the concatenation both start in y, at positions i' and j' of y. Their leaves correspond to (2, i') and (2, j').
79 Generalized suffix tree
Case 3: suffix i starts in x and suffix j' + (n+1) starts in y. Their leaves correspond to (1, i) and (2, j').
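The relabelling used in the three cases amounts to a small index calculation. The sketch below assumes 1-based positions in the concatenation x$y; the function name `relabel` is hypothetical.

```python
def relabel(p, n):
    """Map a 1-based suffix start p in x$y to (string number, position).

    Positions 1..n+1 belong to x (n+1 is the empty suffix of x, at the $);
    positions above n+1 belong to y, shifted by n+1.
    """
    if p <= n + 1:
        return (1, p)
    return (2, p - (n + 1))


x, y = "tatat", "aataa"
s = x + "$" + y
n = len(x)
for p in (2, 6, 7, 11):
    print(p, relabel(p, n), s[p - 1:])
```

For example, position 7 of tatat$aataa is the first letter of y, so it is relabelled (2, 1).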
80 Is everything great?
81 Space consumption
Fact: T(x) requires O(n) space, where n = |x|... but how much space does it consume in practice?
83 Space consumption
In practice, somewhere between 10 and 40 bytes per letter in x. Is this a problem? It depends on n: if [...] then yes.
85 Alphabet size
How much time does it take to find the proper edge out of a node when searching in a suffix tree?
Time proportional to the out-degree of the node, so the search time in practice is O(|A| · |P|) for an alphabet A and a pattern P. If |A| is large, e.g. 256, this matters!
86 Alphabet size
Idea 1: Organising the children in a search tree reduces the time per node from O(|A|) to O(log |A|) (requires an ordered alphabet).
87 Alphabet size
Idea 2: Organising the children in a vector of size |A| indexed by letters reduces the time per node from O(|A|) to O(1) (requires a finite alphabet).
88 Alphabet size
Idea 3: Use some other dictionary for mapping letters to children.
The alphabet size matters in practice.
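The child-lookup strategies can be compared side by side in a short sketch. The function names are illustrative; a sorted list with binary search stands in for the search tree of Idea 1, and the vector of Idea 2 assumes a byte alphabet with 256 letters.

```python
from bisect import bisect_left


def find_linear(children, letter):
    """Baseline: scan a list of (letter, child) pairs, O(|A|) per node."""
    for ch, child in children:
        if ch == letter:
            return child
    return None


def find_sorted(letters, kids, letter):
    """Idea 1 (stand-in): binary search in sorted letters, O(log |A|) per node."""
    i = bisect_left(letters, letter)
    if i < len(letters) and letters[i] == letter:
        return kids[i]
    return None


def find_vector(table, letter):
    """Idea 2: one slot per letter, O(1) per node but |A| slots of space."""
    return table[ord(letter)]      # assumes ord(letter) < len(table)


# A node with three children, as at the root of the suffix tree of tatat$.
pairs = [("t", "child_t"), ("a", "child_a"), ("$", "child_$")]
letters = sorted(ch for ch, _ in pairs)
kids = [dict(pairs)[ch] for ch in letters]
table = [None] * 256
for ch, child in pairs:
    table[ord(ch)] = child

print(find_linear(pairs, "a"), find_sorted(letters, kids, "a"), find_vector(table, "a"))
```

Idea 3 corresponds to what a Python `dict` per node already provides: expected O(1) lookup without reserving a slot for every letter of the alphabet.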
More informationA Secondary storage Algorithms and Data Structures Supplementary Questions and Exercises
308-420A Secondary storage Algorithms and Data Structures Supplementary Questions and Exercises Section 1.2 4, Logarithmic Files Logarithmic Files 1. A B-tree of height 6 contains 170,000 nodes with an
More informationAlphabet-Dependent String Searching with Wexponential Search Trees
Alphabet-Dependent String Searching with Wexponential Search Trees Johannes Fischer and Pawe l Gawrychowski February 15, 2013 arxiv:1302.3347v1 [cs.ds] 14 Feb 2013 Abstract It is widely assumed that O(m
More informationK-structure, Separating Chain, Gap Tree, and Layered DAG
K-structure, Separating Chain, Gap Tree, and Layered DAG Presented by Dave Tahmoush Overview Improvement on Gap Tree and K-structure Faster point location Encompasses Separating Chain Better storage Designed
More informationMapping a Dynamic Prefix Tree on a P2P Network
Mapping a Dynamic Prefix Tree on a P2P Network Eddy Caron, Frédéric Desprez, Cédric Tedeschi GRAAL WG - October 26, 2006 Outline 1 Introduction 2 Related Work 3 DLPT architecture 4 Mapping 5 Conclusion
More informationSpecial course in Computer Science: Advanced Text Algorithms
Special course in Computer Science: Advanced Text Algorithms Lecture 6: Alignments Elena Czeizler and Ion Petre Department of IT, Abo Akademi Computational Biomodelling Laboratory http://www.users.abo.fi/ipetre/textalg
More informationAlgorithms and Data Structures
Algorithms and Data Structures Optimal Search Trees; Tries Ulf Leser Content of this Lecture Optimal Search Trees Definition Construction Analysis Searching Strings: Tries Ulf Leser: Algorithms and Data
More informationMITOCW watch?v=ninwepprkdq
MITOCW watch?v=ninwepprkdq The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To
More informationLinearized Suffix Tree: an Efficient Index Data Structure with the Capabilities of Suffix Trees and Suffix Arrays
Algorithmica (2008) 52: 350 377 DOI 10.1007/s00453-007-9061-2 Linearized Suffix Tree: an Efficient Index Data Structure with the Capabilities of Suffix Trees and Suffix Arrays Dong Kyue Kim Minhwan Kim
More informationSuffix Arrays Slides by Carl Kingsford
Suffix Arrays 02-714 Slides by Carl Kingsford Suffix Arrays Even though Suffix Trees are O(n) space, the constant hidden by the big-oh notation is somewhat big : 20 bytes / character in good implementations.
More informationTRIE BASED METHODS FOR STRING SIMILARTIY JOINS
TRIE BASED METHODS FOR STRING SIMILARTIY JOINS Venkat Charan Varma Buddharaju #10498995 Department of Computer and Information Science University of MIssissippi ENGR-654 INFORMATION SYSTEM PRINCIPLES RESEARCH
More informationL02 : 08/21/2015 L03 : 08/24/2015.
L02 : 08/21/2015 http://www.csee.wvu.edu/~adjeroh/classes/cs493n/ Multimedia use to be the idea of Big Data Definition of Big Data is moving data (It will be different in 5..10 yrs). Big data is highly
More informationCSE 5095 Topics in Big Data Analytics Spring 2014; Homework 1 Solutions
CSE 5095 Topics in Big Data Analytics Spring 2014; Homework 1 Solutions Note: Solutions to problems 4, 5, and 6 are due to Marius Nicolae. 1. Consider the following algorithm: for i := 1 to α n log e n
More informationFinal Exam Solutions
COS 226 FINAL SOLUTIONS, FALL 214 1 COS 226 Algorithms and Data Structures Fall 214 Final Exam Solutions 1. Digraph traversal. (a) 8 5 6 1 4 2 3 (b) 4 2 3 1 5 6 8 2. Analysis of algorithms. (a) N (b) log
More informationCMSC423: Bioinformatic Algorithms, Databases and Tools. Exact string matching: Suffix trees Suffix arrays
CMSC423: Bioinformatic Algorithms, Databases and Tools Exact string matching: Suffix trees Suffix arrays Searching multiple strings Can we search multiple strings at the same time? Would it help if we
More informationExact Matching: Hash-tables and Automata
18.417 Introduction to Computational Molecular Biology Lecture 10: October 12, 2004 Scribe: Lele Yu Lecturer: Ross Lippert Editor: Mark Halsey Exact Matching: Hash-tables and Automata While edit distances
More informationEfficient Sequential Algorithms, Comp309. Motivation. Longest Common Subsequence. Part 3. String Algorithms
Efficient Sequential Algorithms, Comp39 Part 3. String Algorithms University of Liverpool References: T. H. Cormen, C. E. Leiserson, R. L. Rivest Introduction to Algorithms, Second Edition. MIT Press (21).
More informationAlgorithms Theory. 15 Text Search (2)
Algorithms Theory 15 Text Search (2) Construction of suffix trees Prof. Dr. S. Albers Suffix tree t = x a b x a $ 1 2 3 4 5 6 a x a b x a $ 1 $ a x b $ $ 4 3 $ b x a $ 6 5 2 2 : implicit suffix trees Definition:
More informationLexical Analysis. Implementation: Finite Automata
Lexical Analysis Implementation: Finite Automata Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs)
More informationChapter 12 Digital Search Structures
Chapter Digital Search Structures Digital Search Trees Binary Tries and Patricia Multiway Tries C-C Tsai P. Digital Search Tree A digital search tree is a binary tree in which each node contains one element.
More informationParallel Distributed Memory String Indexes
Parallel Distributed Memory String Indexes Efficient Construction and Querying Patrick Flick & Srinivas Aluru Computational Science and Engineering Georgia Institute of Technology 1 In this talk Overview
More informationsplitmem: graphical pan-genome analysis with suffix skips Shoshana Marcus May 7, 2014
splitmem: graphical pan-genome analysis with suffix skips Shoshana Marcus May 7, 2014 Outline 1 Overview 2 Data Structures 3 splitmem Algorithm 4 Pan-genome Analysis Objective Input! Output! A B C D Several
More informationProject Proposal. ECE 526 Spring Modified Data Structure of Aho-Corasick. Benfano Soewito, Ed Flanigan and John Pangrazio
Project Proposal ECE 526 Spring 2006 Modified Data Structure of Aho-Corasick Benfano Soewito, Ed Flanigan and John Pangrazio 1. Introduction The internet becomes the most important tool in this decade
More information