Intermediate Information Structures

CPSC 335 Intermediate Information Structures, LECTURE 13: Suffix Trees. Jon Rokne, Computer Science, University of Calgary, Canada. Modified from CMSC 423 - Todd Treangen, UMD upd

Preprocessing Strings
We will look at:
- Suffix Tries
- Suffix Trees
- Suffix Arrays
- Burrows-Wheeler transform
Typical setting: a long, known, and fixed text string (like a genome) and many unknown, changing query strings. We are allowed to preprocess the text string once in anticipation of the future unknown queries. The data structures will be useful in other settings as well.

Preprocessing Strings
For example, the text T might be genomic sequences and the queries might be short words over the alphabet {a, c, g, t} describing transcription factor binding sites.

Suffix Tries
A trie, pronounced "try", is a tree that exploits some structure in the keys - e.g. if the keys are strings, a binary search tree would compare the entire strings, but a trie would look at their individual characters.
- A suffix trie is a data structure that stores a string so that many kinds of queries can be answered quickly (its space-efficient form is the suffix tree).
- Suffix trees are hugely important for searching large sequences like genomes. They are the basis for a tool called MUMmer (developed by UMD faculty).

Suffix Tries
Let the string be y and the trie S(y). All leaves of S(y) are labeled by the suffixes of y. Edges of S(y) are labeled by letters of the alphabet used for y, with an added sentinel character, say $, which is not part of the alphabet. Internal nodes are branching nodes if they have at least two children. Edges outgoing from a branching node are labeled by different letters.

Suffix Tries: the role of suffixes and the sentinel
Consider a text over the alphabet {a, b, c}. Its suffixes are the strings obtained by starting at each position and reading to the end of the text, and a short suffix may happen to be a prefix of a longer one.
The sentinel: in the following, we want to ensure that no suffix is a prefix of any other. To do so, we append a special character $, not in the alphabet, to the end of the text. Every suffix then ends in $, which occurs nowhere else in the text, so no suffix can be a prefix of another.
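The effect of the sentinel can be checked with a few lines of Python. This is only a sketch: the text "abca" is a hypothetical example chosen for illustration (it is not the slide's example), and the helper names are assumptions.

```python
def suffixes(text):
    """Return all non-empty suffixes of text, longest first."""
    return [text[i:] for i in range(len(text))]


def some_suffix_is_prefix_of_another(text):
    """True if one suffix is a prefix of a different suffix."""
    sufs = suffixes(text)
    return any(a != b and b.startswith(a) for a in sufs for b in sufs)


text = "abca"  # hypothetical example text, not taken from the slide
print(suffixes(text))
print(some_suffix_is_prefix_of_another(text))        # 'a' is a prefix of 'abca'
print(some_suffix_is_prefix_of_another(text + "$"))  # no collision with the sentinel
```

Without the sentinel the suffix "a" would end at an internal node of the trie; with it, every suffix ends at its own leaf.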

Suffix Tries
Queries are prefixes of suffixes: to determine whether a given query q is contained in the text, we can simply check whether q is a prefix of one of the suffixes.

Founding Editor-in-Chief of the IEEE/ACM Transactions on Computational Biology and Bioinformatics

Suffixes of a character string
s is a string over the alphabet {a, b} with a termination symbol, also known as a sentinel, $. The suffixes of s range from s itself (the trivial suffix) down to $ alone (the sentinel suffix).

Suffix Tries
SufTrie(s) = the suffix trie representing string s. Edges of the suffix trie are labeled with letters from the alphabet (say {a, b}). Every path from the root to a solid node represents a suffix of s. Every suffix of s is represented by some path from the root to a solid node. Why are all the solid nodes leaves? How many leaves will there be?

Processing Strings Using Suffix Tries
Given a suffix trie T and a string q, how can we:
- determine whether q is a substring of T?
- check whether q is a suffix of T?
- count how many times q appears in T?
- find the longest repeat in T?
- find the longest common substring of T and q?
Main idea: every substring of s is a prefix of some suffix of s.

Searching Suffix Tries
Is a query string a substring of s? Follow the path given by the query string. After we've built the suffix trie, queries can be answered in O(|query|) time, regardless of the text size.
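A minimal sketch of this search, assuming a brute-force suffix trie stored as nested dictionaries. The function names and the example text "abaaba" are illustrative assumptions, and this quadratic construction is not the incremental algorithm the lecture develops later.

```python
def build_naive_suffix_trie(text, sentinel="$"):
    """Insert every suffix of text + sentinel into a nested-dict trie."""
    text += sentinel
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})  # reuse the edge if it exists
    return root


def contains(trie, query):
    """Follow the path spelled by query; succeed if it never falls off."""
    node = trie
    for ch in query:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_naive_suffix_trie("abaaba")  # hypothetical example text
print(contains(trie, "baa"))
print(contains(trie, "abb"))
```

The loop in `contains` takes one step per query character, which is where the O(|query|) bound comes from.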

Suffix Links in a Suffix Trie
To understand suffix links, first recall that there are three kinds of nodes in a suffix tree:
- the root
- internal nodes
- leaf nodes
In the graph below, which is the suffix tree for ABABABC, the yellow circle is the root, the grey, blue and green ones are internal nodes, and the small black ones are leaves.

Suffix Links in a Suffix Trie
There are two important things to notice:
- Internal nodes have either one or more than one outgoing edge. That is, internal nodes with more than one outgoing edge mark those parts of the tree where branching occurs. Branching occurs wherever a repeated string is involved, and only there.
- For any internal node X, the string leading from the root to X must have occurred in the input string at least as many times as there are outgoing edges from X.
Example: the string leading to the blue node is ABAB. Indeed, this string appears twice in the input string: at position 0 and at position 2. And that is why the blue node exists.

Suffix Links in a Suffix Trie
If the string s leading up to some internal node X is longer than 1 character, the same string minus the first character (call this s-1) must be in the tree, too (it's a suffix tree, after all, so the suffix of any of its strings must be in the tree, too).
Example: let s = ABAB, the string leading to the blue node. Then after removing the first character, s-1 is BAB. And indeed that string is found in the tree, too. It leads to a green node (labelled "This node").

Suffix Links in a Suffix Trie
If some string s leads to an internal node, its shortened version s-1 must lead to an internal node (call it X-1) as well. Why? Because s must appear at least twice in the input string, so s-1 must appear at least as many times (because it is part of s: wherever s appears, s-1 must appear, too). But if s-1 appears multiple times in the input string, then there must be an internal node for it. In any such situation, a special link connecting X to X-1 is a suffix link.

Suffix Links in a Suffix Trie
Every internal node X with more than 1 outlink must have a suffix link to exactly one other internal node. This is the same suffix tree as before; the dotted lines indicate the suffix links. If you start at the blue node and follow the suffix links from there (from blue, to green, to first pink, to second pink), and look at the strings leading from the root to each node, you will see this:
ABAB (blue) -> BAB (green) -> AB (pink1) -> B (pink2)
This is why they are called suffix links (the entire sequence is called a suffix chain).
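The counting argument behind the suffix chain can be checked directly on the string ABABABC. The sketch below uses plain string scanning rather than a tree, and the helper names are assumptions made for illustration.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, overlaps included."""
    return sum(text.startswith(pattern, i)
               for i in range(len(text) - len(pattern) + 1))


def suffix_chain(s):
    """Return s, s minus its first character, and so on down to length 1."""
    chain = [s]
    while len(chain[-1]) > 1:
        chain.append(chain[-1][1:])
    return chain


text = "ABABABC"
for s in suffix_chain("ABAB"):
    # every string in the chain occurs at least as often as the one before
    print(s, count_occurrences(text, s))
```

Each string in the chain occurs at least twice, so each one ends at a branching position in the tree, which is what guarantees the target of every suffix link exists.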

Applications of Suffix Tries (1)
- Check whether q is a substring of T: follow the path for q starting from the root. If you exhaust the query string, then q is in T.
- Check whether q is a suffix of T: follow the path for q starting from the root. If you end at a leaf at the end of q, then q is a suffix of T.
- Count # of occurrences of q in T: follow the path for q starting from the root. The number of leaves under the node you end up in is the number of occurrences of q.
- Find the longest repeat in T: find the deepest node that has at least 2 leaves under it.
- Find the lexicographically (alphabetically) first suffix: start at the root, and keep following the edge labeled with the lexicographically (alphabetically) smallest letter until you reach a leaf.
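Three of these queries can be sketched on a brute-force nested-dict suffix trie. The function names, the "$" sentinel and the example text "abaaba" are assumptions for illustration; the suffix test here checks for a sentinel edge after q, which is the same as reaching a leaf once the sentinel is consumed.

```python
def build_naive_suffix_trie(text, sentinel="$"):
    """Insert every suffix of text + sentinel into a nested-dict trie."""
    text += sentinel
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
    return root


def walk(trie, query):
    """Return the node reached by spelling query, or None on a dead end."""
    node = trie
    for ch in query:
        if ch not in node:
            return None
        node = node[ch]
    return node


def is_suffix(trie, query, sentinel="$"):
    """q is a suffix if the sentinel can follow it immediately."""
    node = walk(trie, query)
    return node is not None and sentinel in node


def count_leaves(node):
    """A node with no children is a leaf; otherwise sum over children."""
    if not node:
        return 1
    return sum(count_leaves(child) for child in node.values())


def count_occurrences(trie, query):
    node = walk(trie, query)
    return 0 if node is None else count_leaves(node)


def longest_repeat(trie):
    """Deepest branching node: its path occurs at least twice."""
    best = ""
    stack = [(trie, "")]
    while stack:
        node, path = stack.pop()
        if len(node) >= 2 and len(path) > len(best):
            best = path
        stack.extend((child, path + ch) for ch, child in node.items())
    return best


trie = build_naive_suffix_trie("abaaba")  # hypothetical example text
print(is_suffix(trie, "aba"), count_occurrences(trie, "a"), longest_repeat(trie))
```

The deepest node with at least two leaves below it is always a branching node, so `longest_repeat` only needs to look at nodes with two or more children.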

Suffix Links
Suffix links connect the node representing xα to the node representing α. The most important suffix links are the ones connecting suffixes of the full string (shown at right). But every node has a suffix link. Why? How do we know a node representing α exists for every node representing xα?

Suffix Tries
A node represents a prefix of some suffix s. The node's suffix link should link to the corresponding prefix of the suffix s' that is 1 character shorter than s. Since the suffix trie contains all suffixes, it contains a path representing s', and therefore contains a node representing every prefix of s'.

Applications of Suffix Tries (II)
Find the longest common substring of T and q: walk down the tree following q. If you hit a dead end, save the current depth, and follow the suffix link from the current node. When you exhaust q, return the longest substring found.
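The walk can be sketched as follows. To keep the example short, the trie and its suffix links are built by brute force (each node for a string w is linked to the node for w without its first character); this construction, the class and function names, and the example strings are assumptions for illustration and are not the lecture's incremental algorithm.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.suffix_link = None


def build_trie_with_suffix_links(text):
    """Brute force: one node per distinct substring, then link w -> w[1:]."""
    root = TrieNode()
    root.suffix_link = root
    by_label = {"": root}
    for i in range(len(text)):
        node = root
        for j in range(i, len(text)):
            node = node.children.setdefault(text[j], TrieNode())
            by_label[text[i:j + 1]] = node
    for label, node in by_label.items():
        if label:
            node.suffix_link = by_label[label[1:]]
    return root


def longest_common_substring(text, query):
    """Walk query through the trie of text, backing off along suffix links."""
    root = build_trie_with_suffix_links(text)
    node, depth = root, 0
    best_len, best_end = 0, 0
    for j, ch in enumerate(query):
        # dead end: drop the first character of the current match
        while ch not in node.children and node is not root:
            node = node.suffix_link
            depth -= 1
        if ch in node.children:
            node = node.children[ch]
            depth += 1
        if depth > best_len:
            best_len, best_end = depth, j + 1
    return query[best_end - best_len:best_end]


print(longest_common_substring("abaaba", "bbaab"))
```

At every step the current node spells the longest suffix of the query read so far that occurs in the text, so the deepest node visited gives the answer.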

Constructing Suffix Tries

Suppose we want to build a suffix trie for a string s. We will walk down the string from left to right, building suffix tries for s[0], s[0..1], s[0..2], ..., s[0..n].
To build the suffix trie for s[0..i], we will use the suffix trie for s[0..i-1] built in the previous step. To convert SufTrie(s[0..i-1]) into SufTrie(s[0..i]), add character s[i] to all the suffixes.
In the slide's example, i = 4 and s[4] = c, so nodes need to be added for the five suffixes of s[0..4], each ending in c. The parts shown in purple are suffixes that already exist in SufTrie(s[0..i-1]). Why? How can we find these suffixes quickly?

[Figure: SufTrie(s[0..3]) and SufTrie(s[0..4]) side by side for i = 4, s[4] = c, with the five suffixes ending in c that need nodes]
Where is the new deepest node (aka the longest suffix)? How do we add the suffix links for the new nodes?

To build SufTrie(s[0..i]) from SufTrie(s[0..i-1]):
CurrentSuffix = longest (aka deepest suffix)
Repeat until you reach the root or the current node already has an edge labeled s[i] leaving it:
- Add a child labeled s[i] to CurrentSuffix.
- Follow the suffix link to set CurrentSuffix to the next shortest suffix.
Stopping early is safe because if you already have a node for suffix αs[i], then you have a node for every smaller suffix.
Add suffix links connecting the nodes you just added, in the order in which you added them. In practice, you add these links as you go along, rather than at the end.

Python Code to Build Suffix Trie

    class SuffixNode:
        def __init__(self, suffix_link=None):
            self.children = {}
            # a node created without a link (the root) points to itself
            if suffix_link is not None:
                self.suffix_link = suffix_link
            else:
                self.suffix_link = self

        def add_link(self, c, v):
            """Link this node to node v via character c."""
            self.children[c] = v


    def build_suffix_trie(s):
        """Construct a suffix trie."""
        assert len(s) > 0

        # explicitly build the two-node suffix trie for s[0]
        Root = SuffixNode()  # the root node
        Longest = SuffixNode(suffix_link=Root)
        Root.add_link(s[0], Longest)

        # for every character left in the string
        for c in s[1:]:
            Current = Longest
            Previous = None
            while c not in Current.children:
                # create new node r1 with transition Current -c-> r1
                r1 = SuffixNode()
                Current.add_link(c, r1)
                # if we came from some previous node, make that
                # node's suffix link point here
                if Previous is not None:
                    Previous.suffix_link = r1
                # walk down the suffix links
                Previous = r1
                Current = Current.suffix_link

            # make the last suffix link: it goes to the existing c-child
            # of Current, unless that child is the node just created under
            # the root, whose suffix link is the root itself
            target = Current.children[c]
            Previous.suffix_link = Root if target is Previous else target

            # move to the newly added child of the longest path
            # (which is the new longest path)
            Longest = Longest.children[c]
        return Root

[Figure: the boundary path from longest towards the root, showing current, Prev and the new s[i] edges added at each step]

Note: there's already a path for this suffix, so we don't change it (we just add a suffix link to it).

How many nodes can a suffix trie have?
s = a^n b^n will have:
- 1 root node
- n nodes in a path of b's
- n paths of n+1 nodes (a^i followed by 0 to n b's, for i = 1, ..., n)
Total = n(n+1) + n + 1 = O(n^2) nodes. This is not very efficient. How could you make it smaller?
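The node count can be confirmed empirically. This sketch builds the trie of all suffixes of a^n b^n by brute force (without a sentinel, matching the count above) and compares the number of nodes with n(n+1) + n + 1; the function name is an assumption.

```python
def count_suffix_trie_nodes(s):
    """Count nodes (root included) in the trie of all suffixes of s."""
    root = {}
    count = 1  # the root
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            if ch not in node:
                node[ch] = {}
                count += 1
            node = node[ch]
    return count


for n in range(1, 7):
    s = "a" * n + "b" * n
    nodes = count_suffix_trie_nodes(s)
    # quadratic growth: the count equals (n + 1) ** 2
    print(n, len(s), nodes, n * (n + 1) + n + 1)
```

The trie has one node per distinct substring (plus the root for the empty string), and a^n b^n has quadratically many distinct substrings.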

So... we have to trie again... Space-Efficient Suffix Trees

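A More Compact Representation
Compress paths where there are no choices. Represent the sequence along each path using a range [i,j] that refers to the input string s.
[Figure: the suffix trie and the compressed tree for a 7-character string s with positions 1..7; compressed edges carry ranges such as 6:6, 5:6, 7:7 and 4:7]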

Space usage in the compressed representation:
- # leaves = O(n) [one leaf for each position in the string]
- Every internal node is at least a binary split.
- Each edge uses O(1) space.
Therefore, the number of internal nodes is at most about equal to the number of leaves. The number of edges is then proportional to the number of leaves, and space per edge is O(1). Hence, linear space.

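Trivial algorithm to build a suffix tree
Put the longest suffix in, then put each shorter suffix in, one at a time, following the existing path for as long as it matches and branching off where it stops matching.
[Figures: the tree after each suffix is inserted]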

We will also label each leaf with the starting point of the corresponding suffix.
[Figure: the finished tree with leaves labeled 1, 4, 3, 2, 5]

Analysis
Takes O(n^2) time to build. We will see how to do it in O(n) time.
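A sketch of the trivial quadratic construction, assuming edges store half-open ranges text[start:end] into the input and leaves are labeled with 0-based suffix starts (the slides use 1-based labels). The class name, function names, "$" sentinel and example text are assumptions for illustration.

```python
class Node:
    def __init__(self, start=0, end=0):
        self.start = start        # incoming edge label is text[start:end]
        self.end = end
        self.children = {}        # first character of edge -> child node
        self.suffix_start = None  # set on leaves only


def build_naive_suffix_tree(text, sentinel="$"):
    """Insert suffixes longest first, splitting an edge on a mismatch."""
    text += sentinel
    root = Node()
    for i in range(len(text)):
        node, j = root, i
        while True:
            child = node.children.get(text[j])
            if child is None:               # no edge: hang a new leaf here
                leaf = Node(j, len(text))
                leaf.suffix_start = i
                node.children[text[j]] = leaf
                break
            k = child.start                 # match along the edge
            while k < child.end and text[k] == text[j]:
                k += 1
                j += 1
            if k == child.end:              # edge consumed: keep descending
                node = child
                continue
            mid = Node(child.start, k)      # mismatch inside edge: split it
            node.children[text[mid.start]] = mid
            child.start = k
            mid.children[text[k]] = child
            leaf = Node(j, len(text))
            leaf.suffix_start = i
            mid.children[text[j]] = leaf
            break
    return root, text


def leaf_labels(node):
    if not node.children:
        return [node.suffix_start]
    return [x for c in node.children.values() for x in leaf_labels(c)]


def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children.values())


root, text = build_naive_suffix_tree("abaaba")  # hypothetical example text
print(sorted(leaf_labels(root)), count_nodes(root), len(text))
```

Because the sentinel makes every suffix end at its own leaf, the tree has exactly one leaf per position and fewer than twice as many nodes in total, in line with the linear-space argument.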

Constructing Suffix Trees - Ukkonen's Algorithm
The same idea as with the suffix trie algorithm. Main difference: not every trie node is explicitly represented in the tree. Solution: represent trie nodes as pairs (u, α), where u is a real node in the tree and α is some string leaving it. The suffix link of such a node v is then also stored as a pair: suffix_link[v] = (u, α). Some additional tricks are needed to get to O(n) time.

Storing more than one string with Generalized Suffix Trees

Constructing Generalized Suffix Trees
Goal: represent a set of strings P = {s_1, s_2, s_3, ..., s_m}.
Simple solution:
(1) Build a suffix tree for the concatenation s_1 #1 s_2 #2 ... s_m #m, where each #i is a distinct terminator symbol (the slide's example concatenates three short strings with terminators #1, #2 and #3).
(2) For every leaf node, remove any text after the first # symbol.
[Figure: the suffix tree of the concatenated example string, before and after the leaf labels are cut at the first # symbol]

Applications of Generalized Suffix Trees
Longest common substring of S and T:
- Build a generalized suffix tree for {S, T}.
- Find the deepest node that has descendants from both strings (containing both #1 and #2).
Determine the strings in a database {S_1, S_2, S_3, ..., S_m} that contain a query string q:
- Build a generalized suffix tree for {S_1, S_2, S_3, ..., S_m}.
- Follow the path for q in the suffix tree.
- Suppose you end at node u: traverse the tree below u, and output i if you find a string containing #i.
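Both applications can be sketched on a brute-force generalized suffix trie. As a simplification, each node here records the set of string indices whose suffixes pass through it, instead of using terminators #i and traversing the subtree below u; that shortcut, the names and the example strings are all assumptions for illustration.

```python
class GNode:
    def __init__(self):
        self.children = {}
        self.ids = set()  # indices of the strings passing through this node


def build_generalized_suffix_trie(strings):
    """Insert every suffix of every string, tagging nodes with string ids."""
    root = GNode()
    for idx, s in enumerate(strings):
        for i in range(len(s)):
            node = root
            for ch in s[i:]:
                node = node.children.setdefault(ch, GNode())
                node.ids.add(idx)
    return root


def longest_common_substring(s, t):
    """Deepest node reached by suffixes of both strings."""
    root = build_generalized_suffix_trie([s, t])
    best = ""
    stack = [(root, "")]
    while stack:
        node, path = stack.pop()
        for ch, child in node.children.items():
            if child.ids == {0, 1}:  # only descend through shared nodes
                if len(path) + 1 > len(best):
                    best = path + ch
                stack.append((child, path + ch))
    return best


def strings_containing(root, query):
    """Indices of the database strings that contain query."""
    node = root
    for ch in query:
        if ch not in node.children:
            return []
        node = node.children[ch]
    return sorted(node.ids)


database = ["att", "tag", "gat"]  # hypothetical database strings
trie = build_generalized_suffix_trie(database)
print(strings_containing(trie, "at"))
print(longest_common_substring("xabxa", "babxba"))
```

Storing the id sets trades extra space for a query that needs no traversal below u; with terminators, the same information is recovered by collecting the #i symbols in the subtree.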

Longest Common Extension
Longest common extension: we are given strings S and T. In the future, many pairs (i,j) will be provided as queries, and we want to quickly find the longest substring of S starting at i that matches a substring of T starting at j.
Preprocessing:
- Build a generalized suffix tree for S and T. O(|S| + |T|)
- Preprocess the tree so that lowest common ancestors (LCA) can be found in constant time. O(|S| + |T|)
- Create an array mapping suffix numbers to leaf nodes. O(|S| + |T|)
Given a query (i,j):
- Find the leaf nodes for i and j. O(1)
- Return the string of the LCA for i and j. O(1)
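To pin down what an LCE query returns, here is a brute-force reference that compares characters directly. It is deliberately not the constant-time suffix-tree-plus-LCA structure described above, only a definition that such a structure must agree with; the function name, 0-based indexing and example strings are assumptions.

```python
def lce(s, t, i, j):
    """Length of the longest common prefix of s[i:] and t[j:]."""
    k = 0
    while i + k < len(s) and j + k < len(t) and s[i + k] == t[j + k]:
        k += 1
    return k


s, t = "abcab", "xabcz"  # hypothetical example strings
k = lce(s, t, 0, 1)
print(k, s[0:k])  # the common extension starting at s[0] and t[1]
```

This reference costs time proportional to the answer per query, whereas the preprocessed tree answers each query in O(1).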

Using LCE to Find Palindromes
Maximal even palindrome at position i: the longest string to the left and right so that the left half is equal to the reverse of the right half. Goal: find all maximal palindromes in S.
[Figure: S with halves x and y on either side of position i, where y is the reverse of x, and S^r with the same halves around position n - i]
- Construct S^r, the reverse of S. O(|S|)
- Preprocess S and S^r so that LCE queries can be solved in constant time (previous slide). O(|S|)
- LCE(i, n-i), computed between S and S^r, measures how far the palindrome centered at i extends on each side, so the longest even palindrome centered at i has twice that length.
- For every position i: compute LCE(i, n-i). O(|S|) queries at O(1) each.
Total time = O(|S|)
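The reduction can be sketched with a brute-force LCE standing in for the constant-time one, so this version is quadratic in the worst case rather than linear. It assumes 0-based indexing, with "position i" meaning the gap just before S[i]; the names and the example string are illustrative assumptions.

```python
def lce(s, t, i, j):
    """Brute-force longest common extension of s[i:] and t[j:]."""
    k = 0
    while i + k < len(s) and j + k < len(t) and s[i + k] == t[j + k]:
        k += 1
    return k


def maximal_even_palindromes(s):
    """Map each center i (gap before s[i]) to its maximal even palindrome."""
    n = len(s)
    reverse = s[::-1]
    found = {}
    for i in range(1, n):
        # s[i:] reads rightwards from the center; reverse[n - i:] reads
        # leftwards from the center, starting at s[i - 1]
        half = lce(s, reverse, i, n - i)
        if half > 0:
            found[i] = s[i - half:i + half]
    return found


print(maximal_even_palindromes("abbac"))  # hypothetical example string
```

Replacing `lce` with the preprocessed suffix-tree version leaves the loop unchanged and brings the total time down to O(|S|).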

Recap
- Suffix tries are a natural way to store a string: search, count occurrences, and many other queries are answerable easily.
- But they are not space efficient: O(n^2) space.
- Suffix trees are space optimal: O(n), but require a little more subtle algorithm to construct.
- Suffix trees can be constructed in O(n) time using Ukkonen's algorithm.
- Similar ideas can be used to store sets of strings.