Position Heaps: A Simple and Dynamic Text Indexing Data Structure


Andrzej Ehrenfeucht (andrzej@cs.colorado.edu), Ross M. McConnell (rmm@cs.colostate.edu), Nissa Osheim (osheim@cs.colostate.edu), Sung-Whan Woo (woo@cs.colostate.edu)

Dept. of Computer Science, 430 UCB, University of Colorado at Boulder, Boulder, CO, USA.
Dept. of Computer Science, Colorado State University, Fort Collins, CO, USA.

Preprint submitted to Discrete Algorithms, January 9, 0

Abstract

We address the problem of finding the locations of all instances of a string $P$ in a text $T$, where preprocessing of $T$ is allowed in order to facilitate the queries. Previous data structures for this problem include the suffix tree, the suffix array, and the compact DAWG. We modify a data structure called a sequence tree, which was proposed by Coffman and Eve for hashing [], and adapt it to the new problem. We can then produce a list of $k$ occurrences of any string $P$ in $T$ in $O(|P| + k)$ time. Because of properties shared by suffixes of a text that are not shared by arbitrary hash keys, we can build the structure in $O(|T|)$ time, which is much faster than Coffman and Eve's algorithm. These bounds are as good as those for the suffix tree, suffix array, and the compact DAWG. The advantages are the elementary nature of some of the algorithms for constructing and using the data structure and the asymptotic bounds we can give for updating the data structure when the text is edited.

Keywords: position heap, string searching.

1. Introduction

In this paper, we consider the problem of finding occurrences of a pattern string $P$ in a text $T$, where preprocessing of $T$ is allowed in order to create a data structure that speeds up the search. We let $m$ denote the length $|P|$ of $P$, $n$ denote the length $|T|$ of $T$, and $k$ denote the number of positions in $T$ where $P$ occurs as a substring. We assume that the size of the alphabet $\Sigma$ is fixed. We describe two data structures, the position heap and the augmented position heap. We gave a primitive version of the position heap in [], and some of the results of this paper were sketched in [], where we described a structure that is closely related to the augmented position heap, a contracted

suffix tree. The position heap and the augmented position heap have helpful combinatorial properties that the contracted suffix tree does not. The position heap of $T$ is unique, while a contracted suffix tree is not.

Definition 1.1. Let $h(T)$ be the length of the longest substring $X$ of $T$ that is repeated at least $|X|$ times in $T$.

A few moments' reflection reveals that $h(T)$ can be expected to be quite small for most practical applications. The expected value of $h(T)$ is $O(\log n)$ when $T$ is a randomly generated string. However, since few applications deal with random strings, a more important observation is that long repeated substrings in $T$ have little impact on the value of $h(T)$ unless they are repeated an inordinate number of times. We discuss properties of $h(T)$ in greater detail below. In this paper, we give the following results:

1. We describe the augmented position heap for the first time. The data structure is a trie with $n$ nodes and height $O(h(T))$. It is augmented with some additional pointers to facilitate queries.

2. As a starting point for algorithms on the augmented position heap, we review extremely simple algorithms for constructing the position heap in $O(n\,h(T))$ time, and for querying it in $O(\min(m^2, m\,h(T)))$ time []. Though these worst-case bounds are inferior to bounds we give below, the $O(\min(m^2, m\,h(T)))$ bound for the query algorithm is overly pessimistic in practice. For instance, when $T$ is a randomly constructed string and the construction of $P$ does not depend on $T$, or if $P$ is a randomly constructed string and the construction of $T$ does not depend on $P$, then this simple query algorithm takes $O(m + k)$ expected time. Because of the simplicity of these algorithms and the expectation that $h(T)$ is usually small in practice, they are probably of practical interest in some contexts, and they are of pedagogical interest, as they can be taught and implemented in undergraduate data structures courses. In [], we also gave a more sophisticated $O(n)$ algorithm for constructing the position heap that we generalize to the augmented position heap here.

3.
We show how to get an $O(n)$ bound for constructing the augmented position heap and a simple $O(m + k)$ bound for finding the $k$ occurrences of $P$ in $T$. For the case where the user may wish to have the option to abandon the query unexpectedly after $k' < k$ occurrences have been returned, we show how to construct an iterator in $O(m)$ time that then gives occurrences of $P$ in $T$ in left-to-right order of occurrence in $O(\log k')$ time apiece.

4. We show how to adapt the position heap and the dynamic position heap for dynamically changing texts. When a consecutive block of $b$ characters is deleted from $T$, we show how to update the augmented position heap in $O((h(T) + b)\,h(T) \log n)$ amortized time. When a consecutive block of $b$ characters is inserted into $T$, yielding text $T'$, we show how to update the augmented position heap in $O((h(T') + b)\,h(T') \log n)$ amortized

time. The operations are based on the sift-up and sift-down operations on standard heaps. The $\log n$ factor and the amortization of the time bound are due to the data structure we use for maintaining the dynamic text, not to the updating of the augmented position heap. The tradeoff of implementing the position heap to accommodate string edits is that searches take $O(m \log n + k)$ amortized time, rather than $O(m + k)$ time.

Previous data structures for this problem include the suffix tree [4], the compact directed acyclic word graph (compact DAWG) [5], and the suffix array [6]. The first two approaches take $O(n)$ time to build the data structure, and $O(m + k)$ time to find the $k$ positions where the pattern string occurs. The suffix array can be constructed in $O(n)$ time, and takes $O(m + \log n)$ time to produce a pointer to a list of occurrences of $P$ in $T$. A slightly slower approach takes $O(m \log n)$ time, and this approach is of practical interest because of its simplicity. When the text has low entropy, the FM-index scheme allows searching on a version of the text that is compressed with the Burrows-Wheeler transform [7] with no significant slowdown in the query time [8].

The biggest disadvantage of our structure when compared with suffix arrays is its larger space requirement. Like the suffix tree and the compact DAWG, and unlike the suffix array, our bounds must be increased by a $\log |\Sigma|$ factor when the size of the alphabet, $\Sigma$, is introduced as a variable. This factor comes from the time required to find the child of a node on the child edge labeled by a given letter of $\Sigma$. This can be improved to $O(1)$ expected time with a hash table that returns the child, given a hash key consisting of the parent and a letter. This is nevertheless also a disadvantage when compared to suffix arrays.

There has been previous work on updating indexing structures when the text is edited. The generalized suffix tree allows search for a pattern string in a collection of texts.
In [9], it is shown that it is possible to implement it to allow insertion and removal of any text $X$ in the collection in $O(|X|)$ time, and in [8], it is shown how to do this on a collection of Burrows-Wheeler compressed texts in near-linear time. However, $X$ must be inserted or removed in its entirety, and arbitrary edits on $X$ are not supported.

The results most comparable to ours have been given recently by Salson et al. [0, ]. They have given an approach that takes $O(n)$ worst-case time to modify the Burrows-Wheeler transform and the suffix array after an arbitrary edit operation on $T$ [0, ]. Though, in the worst case, this is as bad as the cost of discarding the suffix array and rebuilding it from the beginning, they argue that their approach is much more efficient in practice, and support this with empirical studies on benchmarks.

Note that $h(T)$ is also $\Theta(n)$ in the worst case, such as when $T = a^n$. However, our bounds are stronger than $O(n)$ because we can always rebuild the data structure from scratch in $O(n)$ time if $O((h(T) + b)\,h(T) \log n)$ exceeds this cost, and the bound characterizes analytically the relationship between the running time and an easily understood property of the text. Salson et al. identify $a^n$ as a text that requires $\Theta(n)$ time for their algorithm also. However, the performance of their algorithms can suffer greatly from a single repetition of a large string in $T$. An illustration of this phenomenon is where $W$ is a string on $\Sigma$, \$ is a special character they append to $T$ that is less than any letter in $\Sigma$ in lexical order, and $T$ is the concatenation $WW\$$. Suppose \# is a character that is larger than any character in $\Sigma$ in lexical order. Let $T' = WW\#\$$. Then $\Theta(n)$ changes to the suffix array of $T$ are required to obtain the suffix array of $T'$. By contrast, $h(T') = O(h(W))$ in this case, so it takes $O((h(W) + 1)\,h(W) \log n)$ amortized time to update our data structure after this same edit operation. This is seen by the following lemma, which gives an overly pessimistic upper bound on $h(T)$ in terms of $h(W)$.

Lemma 1.2. If $T = WW$, then $h(T) \le 2h(W)$.

Proof: Let $X$ be a string of length $h(T)$ that occurs $h(T)$ times in $T$. For each of these occurrences, either the first $h(T)/2$ characters of the occurrence lie in the first occurrence of $W$, or the last $h(T)/2$ characters lie in the second occurrence of $W$. By the pigeonhole principle, one of these is a string of length $h(T)/2$ that occurs $h(T)/2$ times in $W$, giving a lower bound of $h(T)/2$ on $h(W)$.

Implementations of algorithms and data structures given in this paper can be found at

2. Preliminaries

Let $\lambda$ be the null string. If $X = x_1 x_2 \ldots x_j$ is a string, we let $|X|$ denote the length $j$ of $X$. The reverse of $X$ is the string $X^R = x_j x_{j-1} \ldots x_1$. For reasons that will become clear shortly, we adopt the convention of numbering the positions of the text $T$ from right to left, so $T = t_n t_{n-1} \ldots t_1$. Let $T_i$ denote the suffix $t_i t_{i-1} \ldots t_1$ beginning at position $i$. Let us distinguish a substring $P = p_1 p_2 \ldots p_m$ of $T$ from an instance $i$ of $P$ in $T$, where $P = t_i t_{i-1} \ldots t_{i-m+1}$. The null substring, $\lambda$, is considered to occur at every position. If $X$ and $Y$ are strings, we denote their concatenation by $XY$. If $X$ is a prefix of $P$, we let $P - X$ denote the suffix of $P$ consisting of the last $|P| - |X|$ letters of $P$. If $X$ is a suffix of $P$, we let $P \setminus X$ denote the prefix of $P$ consisting of the first $|P| - |X|$ letters of $P$.

Definition 2.1.
A rooted tree has the heap property if each node carries a label from an ordered set, such as the integers, and, for every internal node $X$, the labels of the children of $X$ are greater than the label of $X$.

Definition 2.2. A trie on alphabet $\Sigma$ denotes a rooted tree $T$ with the following properties:

1. Each edge is labeled with a character;
2. For each node $u$ and letter $a \in \Sigma$, there is at most one edge with label $a$ from $u$ to a child of $u$.

Figure 1: The sequence hash tree of a sequence of strings. We refer to each node by the string of letter labels on the path from the root to the node, so a node can be thought of as synonymous with its path string. Each string in the sequence is installed at a new node that is the shortest prefix of the string that isn't already a node of the sequence hash tree. For example, when string 9 is inserted, one of its prefixes is already a node of the tree, but the next longer prefix is not, so a pointer to string 9 is inserted at a new node for that longer prefix.

Given a trie, let us say that the label of a path from the root to a node $u$ is the string given by the sequence $X$ of characters that occur on edges of the path. This is the path label of $u$. Because of the second property, the path label uniquely identifies $u$. We therefore adopt the convention of treating the node and its path label as interchangeable objects. For example, we may consider whether a string $X$ is a node of the trie, or whether one node is a substring of another. Note that one node is a prefix of another if and only if it is an ancestor in the trie.

A basic operation on a trie takes an input string $P = p_1 p_2 \ldots p_m$ and finds the largest prefix $P'$ of $P$ that is a node of the trie. Since $|\Sigma|$ is fixed, this is easily accomplished in $O(|P'|)$ time by starting at the root and iteratively taking edges labeled with the sequence of letters from $P$, until $P$ is exhausted or a node is encountered that doesn't have a child on the next letter of $P$. Let us call this operation indexing into the trie.

3. Sequence Hash Trees

A data structure of Coffman and Eve [], called a sequence hash tree, was designed for the problem of implementing hash tables (dictionaries) whose keys are strings. It consists of a trie for indexing into the table. The structure of the tree depends on the order in which the strings are inserted. We describe below a minor variant that is easier to adapt to our substring matching problem. Let $S = (S_1, S_2, \ldots, S_n)$ be a given ordering of the strings. Without loss of generality for our purposes, we may assume that no string in $S$ is a prefix of any other.
The trie $H_n$ that they construct is defined by induction, as follows. If $i = 1$, the trie $H_1$ is just a root node with a pointer to $S_1$. If $i > 1$, then $H_i$ is obtained from $H_{i-1}$ by finding the shortest prefix $Xa$ of $S_i$ that is not already a node of the trie. A new node $Xa$ is added as the child of node $X$ on an edge labeled $a$, and a pointer is installed from it to $S_i$.
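The inductive definition above can be sketched in a few lines of Python. This is a minimal illustration rather than Coffman and Eve's implementation: the names `SeqNode`, `build_sequence_hash_tree`, and `lookup` are hypothetical, strings are referred to by their 0-based index in the input list, and the sketch assumes, as in the text, that no string is a prefix of another.

```python
class SeqNode:
    """One trie node; key_index points at the string installed here."""
    def __init__(self, key_index):
        self.key_index = key_index
        self.children = {}          # letter -> SeqNode


def build_sequence_hash_tree(strings):
    """Insert strings in the given order (assumed: none is a prefix of another)."""
    if not strings:
        return None
    root = SeqNode(0)               # H_1: the root points to the first string
    for idx in range(1, len(strings)):
        s = strings[idx]
        node, depth = root, 0
        # Index into the trie as far as the existing nodes allow.
        while depth < len(s) and s[depth] in node.children:
            node = node.children[s[depth]]
            depth += 1
        if depth == len(s):
            raise ValueError("string is a prefix of an earlier string")
        # s[:depth + 1] is the shortest prefix that is not yet a node.
        node.children[s[depth]] = SeqNode(idx)
    return root


def lookup(root, strings, key):
    """Return the index of key in strings, or None if it was never inserted."""
    node, depth = root, 0
    while node is not None:
        if strings[node.key_index] == key:
            return node.key_index
        if depth == len(key):
            return None
        node = node.children.get(key[depth])
        depth += 1
    return None
```

A key is always installed at a node whose path label is one of its prefixes, so `lookup` only needs to compare the key against the strings stored along a single root-to-node path.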

Figure 2: Incremental construction of the position heap. Suffixes $T_1, T_2, \ldots, T_n$ are inserted in ascending order of length. The figure depicts the insertion of one suffix $T_i$. Indexing into the heap on $T_i$ identifies the longest prefix of $T_i$ that is already a node $Y$ of the heap. The shortest prefix of $T_i$ that is not already a node of the heap is inserted as a child of $Y$ and labeled with position $i$.

Figure 1 gives an example. Coffman and Eve's paper has received little attention since it was published in 1970, due, no doubt, to the existence of superior ways of implementing a hash table. In the present paper, we show that this data structure is much richer when considered in the context of the new problem. The structure of the set of suffixes of a text $T$ allows us to derive interesting and algorithmically useful properties that do not apply in the general case addressed by Coffman and Eve. In particular, we show that it has height at most $2h(T)$, and show that if the suffixes are inserted in ascending order of length, it is now possible to build the data structure in time that is linear in $n = |T|$, that is, in $O(1)$ time, amortized, per hash key. We show how the tree can be augmented with maximal-reach pointers so that finding all $k$ entries that have $P$ as a prefix takes $O(m + k)$ worst-case time, independently of the height of the tree.

4. The Position Heap

Up until the last two sections of this paper, we assume that $T$ is static. We can therefore suppose that $T$ and $P$ are stored in character arrays, which support lookup of the character in a given position $i$ in $O(1)$ time.

Definition 4.1. The position heap $H(T)$ of a text $T$ is obtained by iteratively inserting the suffixes $(T_1, T_2, \ldots, T_n)$ of $T$, in ascending order of length, into Coffman and Eve's data structure using their insertion operation. That is, $T_i$ is inserted by creating a new node that is the shortest prefix of $T_i$ that is not already a node of the tree, and labeling it with position $i$.

Let us call the algorithm implied by this constructive definition the naive construction algorithm. Figure 2 gives an illustration. Coffman and Eve

assume that each inserted string ends with a special character \$, because one must ensure that no inserted string is already a node of the tree when it is inserted. The use of a special character to accomplish this is unnecessary in our context, since each string $T_i$ is longer than any string inserted before it, and each node previously inserted is a prefix of some $T_j$ for $j < i$. The construction can be executed for any text $T$, and, since it is deterministic, the position heap $H(T)$ for a text is unique. The algorithm is simple enough to be taught and programmed in undergraduate data structures classes.

4.1. A time bound for constructing the position heap

We now give a time bound for using the above constructive definition of the position heap as an algorithm. We improve the time bound to $O(n)$ below, at the expense of adding elements to the data structure.

Lemma 4.2. The height of the position heap of text $T$ is at most $2h(T)$.

Proof: Let $X = x_j x_{j-1} \ldots x_1$ be a deepest leaf of the tree. Let $X_i$ denote the prefix $x_j x_{j-1} \ldots x_i$ of $X$. For each $i$ from 1 through $j$, $X_i$ occurs at least $i$ times in $T$ because it has at least $i$ descendants, $\{X_i, X_{i-1}, \ldots, X_1\}$, and each of these contains an occurrence of a substring of which $X_i$ is a prefix. Therefore, $X_{j/2}$ has length $j/2$ and occurs at least $j/2$ times in $T$. It must be that $j/2$ is a lower bound on $h(T)$, so the height $j$ is bounded by $2h(T) + 1$.

Corollary 4.3. The naive construction algorithm takes $O(n\,h(T))$ time.

Proof: Indexing into the heap to find the parent of the new node to be inserted for position $i$ takes time that is bounded by the height of the heap, hence $O(h(T))$ time. Adding the new child takes $O(1)$ time. Summing this over all positions gives an $O(n\,h(T))$ bound.

5. The Naive Query Algorithm

We now give a time bound for querying the position heap. We improve the time bound below, at the expense of adding elements to the data structure.

Definition 5.1. The naive query algorithm for finding all occurrences of a pattern string $P$ in $T$ consists of the following steps. Index into the position heap to find the longest prefix $X$ of $P$ that is a node of $H(T)$.
For each ancestor $X'$ of $X$ (including $X$), look up the position $i$ stored in $X'$. Position $i$ is an occurrence of $X'$. Determine whether this occurrence is followed by $P - X'$. If it is, report $i$ as an occurrence of $P$. If $X = P$, also report all positions stored at descendants of $X$.

Figure 3 gives an example. This algorithm is also simple enough to be taught and programmed in undergraduate data structures classes.
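The naive construction of Definition 4.1 and the naive query of Definition 5.1 can be sketched together as follows. This is an illustrative sketch, not the authors' code: `Node`, `build_position_heap`, and `naive_query` are hypothetical names, the text is an ordinary Python string, and position $i$ (numbered from the right, as in the paper) corresponds to Python index `len(text) - i`. Children are kept in a dictionary, which is one possible way to find a child by letter.

```python
class Node:
    def __init__(self, pos):
        self.pos = pos              # position label, counted from the right end
        self.children = {}          # letter -> Node


def build_position_heap(text):
    """Naive construction: insert suffixes T_1, ..., T_n in ascending length."""
    n = len(text)
    if n == 0:
        return None
    root = Node(1)                  # T_1 is installed at the root
    for i in range(2, n + 1):
        node, k = root, n - i       # text[n - i:] is the suffix T_i
        # Index into the heap; T_i is longer than every existing node,
        # so k never runs past the end of the text.
        while text[k] in node.children:
            node = node.children[text[k]]
            k += 1
        node.children[text[k]] = Node(i)
    return root


def naive_query(root, text, pattern):
    """Return the sorted positions (numbered from the right) where pattern occurs."""
    if root is None:
        return []
    n, m = len(text), len(pattern)
    path, node, depth = [root], root, 0
    while depth < m and pattern[depth] in node.children:
        node = node.children[pattern[depth]]
        depth += 1
        path.append(node)
    # Ancestors of X (including X): verify each stored position against the text.
    hits = [a.pos for a in path if text.startswith(pattern, n - a.pos)]
    if depth == m:
        # The pattern is a node: every position below it is an occurrence.
        stack = list(node.children.values())
        while stack:
            d = stack.pop()
            hits.append(d.pos)
            stack.extend(d.children.values())
    return sorted(hits)
```

For example, `naive_query(build_position_heap("abab"), "abab", "ab")` yields positions 2 and 4, that is, the occurrences starting two and four characters from the right end of the text. The final sort is only for presentation and is not part of the algorithm's stated time bound.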

Figure 3: The naive query algorithm. To find the occurrences of a string that is a node of the position heap, index in on it to reach its node. All positions at descendants of this node are occurrences of the string. In addition, some ancestors can be occurrences; this is determined by inspection of the text at the positions stored in the ancestors. A string that is not a node of the position heap is handled slightly differently. Index in on the longest prefix of the string that is a node. Only the positions stored in ancestors of this node can be occurrences. Which ones are occurrences is determined by inspection of the text at these positions.

Lemma 5.2. The naive query algorithm is correct.

Proof: A node $X'$ contains a position $i$ where $X'$ occurs in $T$. If $X'$ is a prefix of $P$, then it is an ancestor of $X$, and $i$ may or may not be an occurrence of $P$ in $T$, depending on whether the occurrence of $X'$ at $i$ is followed by $P - X'$. The test for this condition returns $i$ if and only if it is an occurrence of $P$. If $P$ is a prefix of $X'$, then $X = P$, and since all prefixes of $X'$ occur at position $i$, so does $P$. This is reported during the traversal of the subtree rooted at $P$. If the longest common prefix $Y$ of $P$ and $X'$ is neither $P$ nor $X'$, then the occurrence of $Y$ at $i$ is followed by the first letter of $X' - Y$, which is not the first letter of $P - Y$. Therefore, $i$ is not an occurrence of $P$. The query does not report $i$ in this case.

Lemma 5.3. The naive query algorithm runs in $O(\min(m^2, m\,h(T)) + k)$ time.

Proof: If $X$ is the longest prefix of $P$ that is a node of the heap, it takes $O(|X|)$ time to find $X$ by indexing into the heap on $P$. For each of the $|X| + 1$ ancestors of $X$, we must look up the position $i$ stored in the ancestor, and determine in $O(m)$ time whether $P$ occurs at position $i$. Since $|X| \le m$, this gives an $O(m^2)$ bound for this step. Since $|X|$ is $O(h(T))$, this also gives an $O(m\,h(T))$ bound for this part.

If $X = P$, that is, if $P$ is a node of the position heap, it also takes $O(1)$ time to return each of the positions in the subtree rooted at $X$, for a total of $O(m^2 + k)$ and $O(m\,h(T) + k)$.

Lemma 5.4. If $T$ is a randomly constructed string and the construction of $P$ does not depend on $T$, or if $P$ is a randomly constructed string and the construction of $T$ does not depend on $P$, then the naive query algorithm takes $O(m + k)$ expected time, where $m$ and $k$ are as in Lemma 5.3.

Proof: The $m\,h(T)$ term comes from the fact that at each of $O(h(T))$ nodes $X'$, we must check whether the occurrence of $X'$ at the position $i$ that it stores is followed by $P - X'$. This requires checking whether $|P - X'|$ letters of $P$ match at $|P - X'|$ positions of $T$. The check halts when a mismatch is detected. The probability of any pair of positions matching is $1/|\Sigma|$, so the expected number of checks before halting is $\frac{|\Sigma| - 1}{|\Sigma|} \sum_{i=1}^{|P - X'|} \frac{1}{|\Sigma|^{i}} = O(1)$.

6. The Augmented Position Heap

The only obstacle to an $O(m + k)$ worst-case bound for returning the $k$ occurrences of $P$ is the time to check whether $P$ occurs at the positions stored at ancestors of the largest prefix $X$ of $P$ that is a node of the position heap.

Definition 6.1. Let $i$ be the position stored at node $X$ in $H(T)$, and let $Y$ be the largest prefix of $T_i$ that is a node of $H(T)$. The maximal-reach pointer for $X$ is a pointer from node $X$ to node $Y$. The augmented position heap for $T$ is obtained by labeling each node $X$ of $H(T)$ with its maximal-reach pointer and $X$'s discovery and finishing times in a depth-first traversal of $H(T)$ []. We also associate with the heap an array $N[\,]$ such that $N[i]$ contains a pointer to the node of the heap that contains position $i$. Let $H'(T)$ denote the augmented position heap.

Figure 4 gives an example. The $N[\,]$ array and the discovery and finishing times are omitted. The maximal-reach pointers are depicted with dashed arrows. For example, the maximal-reach pointer from the node labeled 4 points to the node labeled 7, since the path label of the latter is the longest substring beginning at position 4 that is a node of the position heap.

A naive algorithm for obtaining the augmented position heap is as follows. Create the position heap for $T$.
The pointers can be installed in $N[\,]$ and the discovery and finishing times can be assigned to nodes of the heap during a depth-first traversal of the heap. Then for each suffix $T_i$ of $T$, index as far as possible into $H(T)$ on $T_i$ to find the maximal node $Y$ that is a prefix of $T_i$. Install a maximal-reach pointer to $Y$ from the node $X$ pointed to by $N[i]$.
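The naive augmentation just described might look as follows in Python. This is a sketch under stated assumptions: the names `Node`, `index_into`, `build_augmented_heap`, and `is_ancestor` are hypothetical, the text is assumed to be non-empty, and positions are numbered from the right so that suffix $T_i$ is `text[len(text) - i:]`. The helper `is_ancestor` is the standard interval test on depth-first discovery and finishing times.

```python
class Node:
    def __init__(self, pos, parent=None):
        self.pos = pos
        self.parent = parent
        self.children = {}          # letter -> Node
        self.reach = None           # maximal-reach pointer
        self.disc = self.fin = 0    # DFS discovery / finishing times


def index_into(root, text, start):
    """Follow text[start:] from the root as far as the heap allows."""
    node, k = root, start
    while k < len(text) and text[k] in node.children:
        node = node.children[text[k]]
        k += 1
    return node, k


def build_augmented_heap(text):
    """Return (root, N) for a non-empty text; N[i] stores position i (N[0] unused)."""
    n = len(text)
    root = Node(1)
    N = [None, root]
    for i in range(2, n + 1):                    # naive construction
        parent, k = index_into(root, text, n - i)
        child = Node(i, parent)
        parent.children[text[k]] = child
        N.append(child)
    clock, stack = 0, [(root, False)]            # iterative depth-first traversal
    while stack:
        node, finished = stack.pop()
        clock += 1
        if finished:
            node.fin = clock
        else:
            node.disc = clock
            stack.append((node, True))
            stack.extend((c, False) for c in node.children.values())
    for i in range(1, n + 1):                    # maximal-reach pointers
        N[i].reach, _ = index_into(root, text, n - i)
    return root, N


def is_ancestor(x, y):
    """True if x is a (not necessarily proper) ancestor of y."""
    return x.disc <= y.disc and y.fin <= x.fin
```

Because the node storing position $i$ is itself a prefix of $T_i$, its maximal-reach pointer always leads to one of its own descendants (possibly itself), which gives a quick sanity check on the construction.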

6.1. Queries in $O(m + k)$ time

It is well known that a node $x$ of a rooted tree is an ancestor of a node $y$ if and only if the discovery time of $x$ is less than the discovery time of $y$ and the finishing time of $x$ is larger than the finishing time of $y$ []. This gives the following:

Lemma 6.2. Given pointers to two nodes $X$ and $Y$ of $H'(T)$, it takes $O(1)$ time to determine whether $X$ is an ancestor of $Y$.

Lemma 6.3. Given a pointer to a node $X$ of $H'(T)$ and a position $i$, it takes $O(1)$ time to determine whether $i$ is an occurrence of string $X$ in $T$.

Proof: It takes $O(1)$ time to find the node $Y$ that contains $i$ using $N[\,]$. By Lemma 6.2, it takes $O(1)$ time to determine whether $Y$ is a descendant of $X$. If so, then since $i$ is an occurrence of $Y$ and $X$ is a prefix of $Y$, $i$ is an occurrence of $X$. If not, it takes $O(1)$ time to determine whether $Y$ is an ancestor of $X$, by Lemma 6.2. If it is, then let $Z$ be the node pointed to by the maximal-reach pointer of $Y$. Position $i$ is an occurrence of $X$ if and only if $X$ is a prefix of $T_i$. $Z$ is the maximal prefix of $T_i$ that is a node of the heap. Therefore, $X$ occurs at position $i$ if and only if it is a (not necessarily proper) prefix of $Z$, that is, if and only if $Z$ is a descendant of $X$. This takes $O(1)$ time to determine, by Lemma 6.2.

For example, in Figure 4, given a pointer to the node labeled 6, we can tell that a position stored at one of its descendants is an occurrence by looking in $N[\,]$ to find a pointer to that position's node, and using the discovery and finishing times of the two nodes to determine that it is a descendant. We can tell that a position stored at an ancestor is an occurrence by looking in $N[\,]$ to find its node, using the discovery and finishing times to find that it is an ancestor of the given node, using its maximal-reach pointer to find the node that it reaches, and using the discovery and finishing times to determine that the reached node is a descendant of the given node. We can tell that the position at another ancestor is not an occurrence, because its maximal-reach pointer doesn't point to a descendant of the given node.

Corollary 6.4. Let $Xc$ be a string such that $X$ is a node of the tree and $Xc$ is not. Given a pointer to $X$ and a position $j$, it takes $O(1)$ time to determine whether $j$ is an occurrence of $Xc$.
Proof: By Lemma 6.3, it takes $O(1)$ time to determine whether $j$ is an occurrence of $X$. If it is, then it is an occurrence of $Xc$ if $c$ occurs at position $j - |X|$, which takes $O(1)$ time to check when $T$ is stored in an array.

Before giving pseudocode for the linear-time query algorithm, we illustrate the main ideas in Figure 4. There are two cases: Case 1, where the search string is a node of the position heap, and Case 2, where it is not.

Case 1 is illustrated by the search string whose node is labeled 6. By Lemma 6.3, we can now check in $O(1)$ time apiece which of the positions at proper ancestors are occurrences of the search string. Only one is; its node is the only proper ancestor with a maximal-reach pointer into the subtree of the search string's node. That is $O(m)$ time so far. In

addition, we report the labels of the node and its descendants in $O(1)$ time apiece, as before, in $O(k)$ time, for a total of $O(m + k)$ time.

Case 2 is illustrated by a search string that is not a node of the heap. Our strategy is to partition the string into three segments, which can be handled efficiently by Corollary 6.4 and Lemma 6.3. We use the corollary to find the occurrences of the first segment, and discard those that are not followed by the second segment. This gives the occurrences of the first two segments. We then use the lemma to discard from these occurrences those that are not followed by the third segment.

To apply the corollary, we want all the segments except the last to be of the form $Xc$, where $X$ is a node of the tree and $Xc$ is not. The first such segment is our current substring. As in the naive query algorithm, only positions stored at ancestors of $X$ can be occurrences of $Xc$. By Corollary 6.4, we can determine which are occurrences of the current substring in $O(1)$ time apiece, for a total of $O(|Xc|)$ time. In the example, this leaves two positions. The current substring becomes the finished prefix, its positions are known, and the rest of the query string is the remaining suffix.

We now look for the prefix of the remaining suffix of the form $Xc$, where $X$ is a node of the heap and $Xc$ is not. We want to find which occurrences of $Xc$ follow occurrences of the finished prefix. To do this, we subtract the length of the finished prefix from each of the positions of the finished prefix and determine in $O(1)$ time whether the result is an occurrence of $Xc$, by Corollary 6.4. In the example, the finished prefix has length 4. Subtracting 4 from one of the two candidate positions gives an occurrence of $Xc$, so that candidate is an occurrence of the first two segments; subtracting 4 from the other candidate gives a position that is not an occurrence of $Xc$. Therefore, of the two candidate positions of the search string, only one survives the test. The finished prefix is now the concatenation of the first two segments, the position where it occurs is known, and the remaining suffix is the third segment.

When the remaining suffix is short enough to be a node of the tree, let us denote it $Y$.
We subtract the length of the finished prefix from each of its occurrences, and check whether each of the resulting positions is an occurrence of $Y$, using Lemma 6.3. In the example, the resulting position is an occurrence of $Y$, so the surviving candidate is an occurrence of the original search string.

Generalizing from these examples, we get the algorithm of Table 1.

Lemma 6.5. The linear query algorithm is correct.

Proof: For Case 1, the procedure is the same as the naive algorithm, except that at each ancestor $X'$ of $P$, we determine whether $i$ is an occurrence of $P$ in $O(1)$ time, instead of $O(|P|)$ time, using Lemma 6.3. For Case 2, by induction on the number of times FinishedPrefix is assigned, $I$ is the set of positions where FinishedPrefix occurs in $T$. In the final line, $P$ = FinishedPrefix + RemainingSuffix, and $I$ is assigned to be those positions of FinishedPrefix that are followed by RemainingSuffix. After the final step, FinishedPrefix = $P$, hence $I$ is the set of positions in $T$ where $P$ occurs.

Table 1: The linear query algorithm for use with the augmented position heap.

Case 1: $P$ is a node of $H'(T)$. This is detected by indexing into $H'(T)$ on $P$, and gives node $P$.
    For each proper ancestor $X$ of $P$, look up the position $i$ stored at $X$, and determine whether it is an occurrence of $P$.
    In addition, report all positions recorded in the subtree rooted at $P$.

Case 2: $P$ is not a node of $H'(T)$.
    // Find an initial set of candidate positions
    Let CurrentSubstring be the shortest prefix of $P$ that is not a node of $H'(T)$
    Let $I$ be the set of positions where CurrentSubstring occurs
    // Invariants: FinishedPrefix + RemainingSuffix = $P$;
    // $I$ is the set of positions where FinishedPrefix occurs in $T$
    FinishedPrefix = CurrentSubstring
    RemainingSuffix = $P$ - CurrentSubstring
    while RemainingSuffix is not a node of $H'(T)$
        Let CurrentSubstring be the shortest prefix of RemainingSuffix that is not a node of $H'(T)$
        $I$ := { $j$ | $j \in I$ and the occurrence of FinishedPrefix at $j$ in $T$ is followed by an occurrence of CurrentSubstring }
        RemainingSuffix = RemainingSuffix - CurrentSubstring
        FinishedPrefix = FinishedPrefix + CurrentSubstring
    CurrentSubstring = RemainingSuffix
    $I$ := { $j$ | $j \in I$ and the occurrence of FinishedPrefix at $j$ is followed by CurrentSubstring }
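The two cases of Table 1 can be sketched in Python as follows. This is an illustration only, with several assumptions that are not from the paper: all names (`Node`, `index_into`, `build_augmented_heap`, `occurs`, `occurs_ext`, `linear_query`) are hypothetical, the heap is built with the naive algorithm for a non-empty text, children are stored in dictionaries, and the result is sorted for readability, which adds a logarithmic factor. The test in `occurs` collapses the case analysis of Lemma 6.3 into a single check: the nodes that are prefixes of $T_i$ are exactly the ancestors of the node reached by the maximal-reach pointer of $N[i]$.

```python
class Node:
    def __init__(self, pos, parent=None):
        self.pos, self.parent, self.children = pos, parent, {}
        self.depth = parent.depth + 1 if parent else 0
        self.reach, self.disc, self.fin = None, 0, 0


def index_into(root, s, start):
    """Follow s[start:] from the root; return (deepest node reached, next index)."""
    node, k = root, start
    while k < len(s) and s[k] in node.children:
        node, k = node.children[s[k]], k + 1
    return node, k


def build_augmented_heap(text):
    """Naive construction of the augmented heap for a non-empty text."""
    n = len(text)
    root = Node(1)
    N = [None, root]                              # N[i]: node storing position i
    for i in range(2, n + 1):
        parent, k = index_into(root, text, n - i)
        parent.children[text[k]] = child = Node(i, parent)
        N.append(child)
    clock, stack = 0, [(root, False)]
    while stack:                                  # DFS discovery/finishing times
        node, finished = stack.pop()
        clock += 1
        if finished:
            node.fin = clock
        else:
            node.disc = clock
            stack.append((node, True))
            stack.extend((c, False) for c in node.children.values())
    for i in range(1, n + 1):
        N[i].reach = index_into(root, text, n - i)[0]
    return root, N


def occurs(x, i, N):
    """Lemma 6.3 in one test: x is a prefix of T_i iff reach(N[i]) lies under x."""
    if i < 1:
        return False
    z = N[i].reach
    return x.disc <= z.disc and z.fin <= x.fin


def occurs_ext(x, c, j, N, text):
    """Corollary 6.4: does the string (x followed by letter c) occur at position j?"""
    q = j - x.depth                               # position where c must appear
    return q >= 1 and occurs(x, j, N) and text[len(text) - q] == c


def linear_query(root, N, text, pattern):
    """Sorted positions (numbered from the right) where pattern occurs in text."""
    m = len(pattern)
    x, k = index_into(root, pattern, 0)
    if k == m:                                    # Case 1: pattern is a node
        hits, a = [], x.parent
        while a is not None:                      # proper ancestors, O(1) each
            if occurs(x, a.pos, N):
                hits.append(a.pos)
            a = a.parent
        stack = [x]
        while stack:                              # subtree rooted at the pattern
            d = stack.pop()
            hits.append(d.pos)
            stack.extend(d.children.values())
        return sorted(hits)
    # Case 2: candidates for the first segment are stored at ancestors of x.
    cand, a = [], x
    while a is not None:
        if occurs_ext(x, pattern[k], a.pos, N, text):
            cand.append(a.pos)
        a = a.parent
    done = k + 1                                  # length of FinishedPrefix
    while done < m:
        x, k = index_into(root, pattern, done)
        if k == m:                                # RemainingSuffix is a node
            cand = [j for j in cand if occurs(x, j - done, N)]
            done = m
        else:
            cand = [j for j in cand if occurs_ext(x, pattern[k], j - done, N, text)]
            done = k + 1
    return sorted(cand)
```

For the text `"abab"`, the pattern `"bab"` is not a node of the heap; the sketch finds the single candidate position 3 for its first segment `"ba"` and then confirms that the remaining suffix `"b"` occurs at position 1, so `linear_query` returns `[3]`.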

Figure 4: The linear query algorithm on two example strings on the augmented position heap. Maximal-reach pointers are dashed, and maximal-reach pointers that are loops are omitted from the diagram.

Lemma 6.6. The linear query algorithm can be implemented in $O(m + k)$ time using the augmented position heap.

Proof: Case 1 differs from the naive approach only in that it uses Lemma 6.3 to determine which ancestors of $P$ contain the position of an occurrence of $P$, reducing each of these tests from $O(|P|)$ to $O(1)$. Since there are $|P| + 1$ ancestors of $P$, this takes $O(|P|)$ time. As in the naive query algorithm, all other occurrences of $P$ are found in $O(1)$ time apiece during a traversal of the subtree rooted at $P$, for a total of $O(|P| + k)$ time.

For Case 2, let $(P_1, P_2, \ldots, P_l)$ be the values taken on by CurrentSubstring, and let $(I_1, I_2, \ldots, I_l)$ be the values taken on by $I$. To find the $i$th value $P_i$ of CurrentSubstring, index as far as possible on RemainingSuffix into $H'(T)$, yielding a node $X_i$, and let $a$ be the next character of RemainingSuffix following the prefix $X_i$. Then $P_i = X_i a$. Over all iterations, this takes time proportional to $\sum_{i=1}^{l} |P_i| = O(|P|)$.

For $i \le l$, and each $j \in I_{i-1}$, it takes $O(1)$ time to determine whether the instance of FinishedPrefix at position $j$ is followed by $X_i$; this is determined by finding whether $j - |\mathrm{FinishedPrefix}|$ is an occurrence of $X_i$, using Lemma 6.3. It then takes $O(1)$ time to determine whether this occurrence of $X_i$ is followed by an occurrence of $a$ at position $j - |\mathrm{FinishedPrefix}| - |X_i|$. This determines whether the occurrence of FinishedPrefix at position $j$ is followed by $X_i a = P_i$. Therefore, it takes $O(1)$ time to determine, for each element of $I_{i-1}$, whether it remains in $I_i$.

By the naive algorithm, each $P_i$ has $O(|P_i|)$ occurrences, because $P_i$ is not a node of the tree, hence its occurrences can only be recorded in ancestors (prefixes) of $X_i$. Therefore, $|I_i| = O(|P_i|)$. Determining $I_l$ therefore takes $O(\sum_{i=1}^{l} |I_i|) = O(\sum_{i=1}^{l} |P_i|) = O(|P|)$ time.

6.2. Returning positions one-by-one in left-to-right order

It is sometimes claimed that the suffix array returns all $k$ occurrences of $P$ in $O(m + \log n)$ time, even though $k$ can be superlinear in this bound. The reason is that it gives a pointer to a list of the positions. This time bound captures the fact that if the user wants to examine the first $k'$ positions, this takes $O(k')$ rather than $\Theta(k)$ time. One way to view this is that it returns an iterator in $O(m + \log n)$ time that then takes $O(1)$ time per position to return the positions. The position heap can be implemented to have this property also, using a depth-first search that maintains a stack of active calls that have not yet made a recursive call on their last child.

One use of such an iterator, however, is to examine the first $k'$ positions in left-to-right order. This is a common operation in text editors, for example. This can be implemented in $O(\log k)$ worst-case time per element, due to the fact that the node labels have the heap property (Definition 2.1). We illustrate how to produce an iterator that returns them in right-to-left order; left-to-right order can be obtained by building the augmented position heap for the reverse of the text.

The positions of nodes on the indexing path $X$ take $O(1)$ time each to check. If $P = X$, then the descendants of $X$ might also have to be returned in left-to-right order. Keep a priority queue on the topmost nodes of the subtree of $X$ whose positions have not yet been returned. Because the positions have the heap property (Definition 2.1), the minimum position is among these nodes. Initially the priority queue has $X$ in it. Each time a new position is asked for, the minimum index $i$ in the priority queue is returned, and the positions in the children of the node containing $i$ are inserted into the priority queue.
Since $|\Sigma| = O(1)$ and the size of the subtree is $O(k)$, the size of the priority queue is $O(k)$, and extracting $i$ and inserting its children takes $O(\log k)$ time.

7. Building the Position Heap in $O(n)$ Time

Each time a node is added to the position heap, its parent must be located so that it can be added as a child. The reason the above algorithm for constructing the position heap from the root does not take $O(n)$ time is that indexing from the root to find this parent at each iteration is not an $O(1)$ operation.

7.1. The strategy

Indexing into the heap from the root is not the only way to find the parent of the new node at step $i$. Let $X_i$ be the node added at step $i$, let the first letter of $T_i$ be $a$, and let $X_{i-1}$ be the node added at step $i - 1$. Since $X_i$ is

15 prefix of T i, X i = Y, where Y is the prent of X i. By Lemm 7., elow, Y X i +, so Y is (not necessrily proper) ncestor of X i. This suggests the ide of serching upwrd from X i insted of downwrd from the root in order to find the prent Y of the new node t step i. Since Y cn e much shorter thn X i, the upwrd serch might hve to proceed through lrge numer of nodes on the pth from X i towrd the root efore Y is reched. However, the new node t step i, Y is then much shorter thn the node, X i, inserted t the previous itertion. The cost of the opertion is proportionl to the decrese in depth from one itertion to the next. Wht mkes the pproch more efficient thn the ove pproch is tht depth of the new node inserted t successive itertions cn grow y t most from one itertion to the next, y Lemm 7.. This llows us to mortize occsionl lrge costs incurred in itertions where the depth decreses y lrge mount over mny itertions where the depth slowly uilds up gin t the rte of one per itertion. The rgument is the sme s tht for stck with multipop opertion descried in the chpteramortized Anlysis in the textook []. 7.. Implementtion The following lemm is the sis of the clim tht the depth in the tree t which the lgorithm works must uild up gin slowly if there is sudden lrge nd costly decrese in the depth. Lemm 7.. If P is not node of H(T), it hs fewer thn P occurrences in T. Proof: Every suffix of T tht hs P s prefix results in new node of the tree tht is either proper prefix of P or tht hs P s prefix. Since P does not occur in the tree, it is not prefix of ny node in the tree. Therefore, the numer of suffixes of T tht hve P s prefix, hence the numer of occurrences of P, is ounded y the numer of proper prefixes of P. Let us sy tht set of S of strings is hereditry if, whenever X S, every sustring of X is lso in S. Lemm 7.. The nodes of the position hep re hereditry set of strings. For exmple, in the finl tree, node is leled with position of Figure 4. 
Its substrings and the empty string are all nodes of the position heap, each labeled with its own position.

Proof: Let us show this by induction on the length of $T_i = t_i t_{i-1} \ldots t_1$. The lemma is trivially true for $H(T_1)$, which has only one node, the empty string. Otherwise, we adopt as the induction hypothesis that the nodes of $H(T_{i-1})$ have the hereditary property. Since $H(T_i)$ differs from $H(T_{i-1})$ only by the addition of a node $X$, $H(T_i)$ can only fail to have the hereditary property if some proper substring of $X$ fails to be a node of $H(T_i)$.

This can't be the case if $|X| < 2$, since $\lambda$ is a node of $H(T_i)$. Suppose $|X| \ge 2$. We can then write $X$ as $aX'b$. The parent of $X$ is $aX'$, hence it is a node of $H(T_{i-1})$. Since $aX'$ is longer than $X'$, $X'$ is a node of $H(T_{i-2})$ by the induction hypothesis. Also, $X'b$ is a prefix of $T_{i-1}$, and since $X'$ is a node of $H(T_{i-2})$, $X'b$ is either added at step $i-1$ or is already a node of $H(T_{i-2})$. In either case, it is a node of $H(T_{i-1})$. We conclude that $aX'$ and $X'b$ are nodes of $H(T_{i-1})$. By the induction hypothesis, every substring of $aX'$ and $X'b$ is a node of $H(T_{i-1})$, hence of $H(T_i)$, and these are every proper substring of the new node $X = aX'b$.

This hereditary property is not shared by arbitrary instances of Coffman and Eve's data structure: in the earlier figure of that data structure, the path label of one node has a substring that is not a node of the tree. It is not even true when the keys are the suffixes of a text $T$ when they are not inserted in ascending order of length. Figure 5 gives an example.

Figure 5: The hereditary property doesn't necessarily apply when the suffixes are not inserted in order of ascending length. The figure depicts the Coffman and Eve structure for the seven suffixes of a text, inserted out of order. The string at the node labeled with position 5 has a substring that is not a node of the tree.

Lemma 7.3. For $1 < i \le |T|$, if $X_i$ is the node inserted at step $i$ and $X_{i-1}$ is the node inserted at step $i-1$, then $|X_i| \le |X_{i-1}| + 1$.

Proof: Let $a$ denote the first letter of $T_i$. $X_{i-1}$ is the shortest prefix of $T_{i-1}$ that is not already a node of $H(T_{i-2})$, and $X_i$ is the shortest prefix of $T_i = aT_{i-1}$ that is not already a node of $H(T_{i-1})$. Let $b$ denote the last letter of $X_i$. Then $X_i$ can be written as $aYb$ for some string $Y$. Suppose $|X_i| > |X_{i-1}| + 1$. Then $X_{i-1}$ is a proper prefix of $Yb$. Since $X_{i-1}$ is the longest prefix of $T_{i-1}$ that is a node of $H(T_{i-1})$, $Yb$ is not a node of $H(T_{i-1})$. By the hereditary property, $Yb$ is a node of $H(T_i)$, since it is a substring of $X_i$, which is a node of $H(T_i)$. The only new node added to $H(T_{i-1})$ to get $H(T_i)$ is $aYb$, so $Yb$ was already a node of $H(T_{i-1})$, a contradiction.

To insert a node into the position heap, we must find the parent.
Since inserting the node after the parent is found takes $O(1)$ time, the only obstacle to getting a linear time bound is repeated indexing into the position heap to find the parent of each node to be added. We must use an alternative method to find this parent.
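Before turning to the faster method, the two properties it relies on (the hereditary property and the bound on how fast the depth of inserted nodes can grow) can be checked on small inputs with the naive construction. The sketch below is illustrative only. It assumes the text is a Python string stored left to right, that $T_i$ is the suffix of length $i$, and that the root (the empty string) carries position 1; all function names are hypothetical.

```python
def build_naive(text):
    """Naive construction by indexing from the root at every step.

    Returns (nodes, inserted): `nodes` maps each node's path label to its
    position, and `inserted[i]` is the path label X_i of the node added at step i.
    """
    n = len(text)
    nodes = {"": 1}            # assumed convention: the root is labeled with position 1
    inserted = {1: ""}
    for i in range(2, n + 1):
        suffix = text[n - i:]  # T_i, the suffix of length i
        length = 1
        while suffix[:length] in nodes:
            length += 1        # walk down until the shortest prefix that is not a node
        nodes[suffix[:length]] = i
        inserted[i] = suffix[:length]
    return nodes, inserted


def is_hereditary(nodes):
    """True if every substring of every node label is itself a node label."""
    return all(x[s:e] in nodes
               for x in nodes
               for s in range(len(x) + 1)
               for e in range(s, len(x) + 1))


def depth_grows_by_at_most_one(inserted):
    """True if |X_i| <= |X_{i-1}| + 1 for every step i >= 2."""
    return all(len(inserted[i]) <= len(inserted[i - 1]) + 1
               for i in range(2, len(inserted) + 1))


nodes, inserted = build_naive("abbab")
print(sorted(nodes.items(), key=lambda kv: kv[1]))
# [('', 1), ('a', 2), ('b', 3), ('bb', 4), ('ab', 5)]
print(is_hereditary(nodes), depth_grows_by_at_most_one(inserted))  # True True
```

The inner `while` loop is exactly the repeated indexing from the root that prevents a linear bound; the checks are brute force and meant only for small experiments.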

Figure 6: Given the node $X_{i-1}$ added at step $i-1$, find the parent of the node $X_i$ added at step $i$.

The idea of our $O(n)$ alternative method is given in Figure 6. At step $i-1$, we add $X_{i-1}$ as a new node. At step $i$, we must add the shortest prefix of $T_i$ that is not already a node of the position heap. Let $a$ denote the first letter of $T_i$. If the string $a$ does not already occur as a node of the position heap, then it can be added as a child of the root in $O(1)$ time. Otherwise, as in the proof of Lemma 7.3, the new node must be $aYb$ for some prefix $Y$ of the node $X_{i-1}$ added in step $i-1$, where $b$ is the character occurring $|aY| + 1$ positions into $T_i$.

Below, we show how to find, for each such prefix $Y$ of $X_{i-1}$, whether $aY$ is already a node of the position heap, and if so, to return a pointer to it, in $O(1)$ time apiece. We try this on all proper prefixes of $X_{i-1}$ in descending order of length until we find the first. In the figure, $Y$ takes on successively shorter prefixes of $X_{i-1}$ until it is discovered that $aY$ is already a node of the position heap. Since the concatenation of $a$ and the previously tried prefix is not a node, $aY$ is the longest prefix of $T_i$ that is already a node of the position heap. We have found the desired parent of the new node. The new node, $X_i = aYb$, is added as a child of $aY$ on an edge labeled with letter $b$.

This does not give an $O(1)$ bound to add each node of the tree. However, we can amortize the variable costs, showing that they sum to $O(n)$ over all iterations. The reason the cost of step $i$ is not $O(1)$ is that we might have to try many prefixes $Y$ of $T_{i-1}$ before we find the one such that $aY$ is already a node of the heap. Let the decrease in depth denote the difference $|X_{i-1}| - |X_i|$ of the depth of the node added at position $i-1$ and the depth of the node added at position $i$. If this is negative, call it an increase in depth. If at step $i$ we try $k_i$ prefixes before finding $Y$ such that $aY$ is already a node of the tree, then we spent $O(k_i)$ time on the step, and $|X_i| = |aYb| = |X_{i-1}| - (k_i - 2)$. The decrease in depth is $k_i - 2$.
The first two prefixes take $O(1)$ time, so the time spent at step $i$ is $O(1)$ plus the decrease in depth. By Lemma 7.3, the depth can increase by at most 1 at each iteration, so the total increase in depth is $O(n)$ over all iterations. The total decrease in depth can't exceed the total increase in depth, which means that over all iterations, the total decrease in depth is $O(n)$. Therefore, the total time spent by the algorithm is $n \cdot O(1) + O(n) = O(n)$.

Figure 7: The position heap and its dual for the example text. The labels on the path leading to a node in the dual are the reverse of the labels on the path leading to it in the position heap.

It remains to describe how to get an $O(1)$ bound for finding, for each prefix $Y$ of $X_{i-1}$, whether $aY$ is already a node of the heap.

Definition 7.4. Let the dual $D(T)$ of the position heap $H(T)$ be the trie where for each node $X$ of $H(T)$, the reverse $X^R$ of $X$ is a node of $D(T)$ (see Figure 7).

We continue to refer to each node by its path label $X$ in the position heap, even when considering it as a node of the dual. Equivalently, each node of $D(T)$ is denoted by the sequence $X$ of labels on edges from the node to the root of $D(T)$. It is tempting to think that the dual is just the position heap of the reverse of the text, but it is easily verified that this is not the case.

Lemma 7.5. The set of nodes of $D(T)$ is the same as the set of nodes of $H(T)$.

Proof: Because for every node $X$ of $H(T)$ there is a node $X$ in $D(T)$, where $X$ is the string of labels from the node to the root in $D(T)$, every node of $H(T)$ is a node of $D(T)$. It remains to show that every node of $D(T)$ is a node of $H(T)$. Let $X$ be an arbitrary node of $H(T)$. By Lemma 7.2, not only is every prefix of a node $X$ of $H(T)$ a node of $H(T)$, but so is every suffix. This implies that every ancestor of $X$ in $D(T)$ is a node of $H(T)$. There are no nodes on any path of $D(T)$ that fail to be a node of $H(T)$.

We implement the position heap and its dual on the same set of nodes, so that each node has both a parent in the position heap and a parent in the dual. We concurrently construct the position heap and its dual. Suppose that at step $i$ we already have $H(T_{i-1})$ and $D(T_{i-1})$. We show how to update both to get $H(T_i)$ and $D(T_i)$ in $O(k_i)$ time.
When going from $H(T_{i-1})$ to $H(T_i)$, let $a$ be the first letter of $T_i$ and $X_{i-1}$ the node added at step $i-1$. (Refer to Figure 8.) The candidate prefixes $Y$, in descending order of length, are the ancestors encountered on the path from $X_{i-1}$ to the root of the position heap. For each such ancestor $Y$, we can find whether $aY$ is already a node of the heap by determining whether $Y$ has a child on an edge labeled $a$ in the dual. This takes $O(1)$ time, since $Y$ is both a node of the heap and of the dual. We stop when we encounter the first one. By the above algorithm, this takes care of adding node $X_i = aYb$ to $H(T_{i-1})$, yielding $H(T_i)$ in $O(k_i)$ time. However, we must also add this node to the dual, which requires locating its parent in the dual, $Yb$, and adding it as a child on an edge labeled $a$. Fortunately, $Yb$ was just the last prefix of $X_{i-1}$ considered before $aY$ was discovered. We already found $Yb$ in the position heap, and since it is also a node of the dual, we have it in the dual. $X_i$ can be added as a child of $Yb$ on an edge labeled $a$ in $O(1)$ additional time over what we have accounted for in adding it in the position heap. This gives the following:

Lemma 7.6. It takes linear time to construct the position heap of a text $T$.

Figure 8: Implementing the algorithm of Figure 6 using the position heap and its dual. Starting at the previously added node $X_{i-1}$, we find the lowest ancestor $Y$ such that $aY$ is already a node. This is accomplished by traversing ancestors in the position heap and seeing if they have a child on an edge labeled $a$ in the dual. The child of $Y$ on the edge labeled $a$ in the dual is $aY$; it is the parent of the new node $X_i$. The last prefix tried before $Y$ was found is the longest node of the dual heap that is a prefix of $X_{i-1}$ and has no child labeled $a$. It is the parent of $X_i$ in the dual.

8. Constructing the Augmented Position Heap in O(n) Time

The augmented position heap differs from the position heap in that the nodes are labeled with depth-first discovery and finishing times and with maximal-reach pointers. Depth-first search on a tree with $n$ nodes takes $O(n)$ time, so it only remains to describe how to compute the maximal-reach pointers in $O(n)$ time.

Once again, the strategy is to amortize the cost. The approach is virtually the same as it is for adding new nodes: instead of searching downward from the root at each iteration, we search upward in the tree, starting at the node pointed to by the maximal-reach pointer at the previous iteration. Even though this is not an $O(1)$ operation, the cost is proportional to the decrease in depth of the node pointed to by the maximal-reach pointer. This depth can increase by at most 1 from one iteration to the next, allowing us to amortize large decreases in depth over many small increases in depth.

Lemma 8.1. For $1 < i \le |T|$, if $X_i$ is the node pointed to by the maximal-reach pointer of node $i$ and $X_{i-1}$ is the node pointed to by the maximal-reach pointer of node $i-1$, then $|X_i| \le |X_{i-1}| + 1$.

Proof: Let $a$ denote the first letter of $T_i$. $X_{i-1}$ is the longest prefix of $T_{i-1}$ that is a node of $H(T)$, and $X_i$ is the longest prefix of $T_i = aT_{i-1}$ that is a node of $H(T)$. Let $b$ denote the last letter of $X_i$. Then $X_i$ can be written as $aYb$ for some string $Y$. Suppose $|X_i| > |X_{i-1}| + 1$. Then $X_{i-1}$ is a proper prefix of $Yb$, and $Yb$ is not a node of $H(T)$. But by the hereditary property, $Yb$ is a node of $H(T)$, since it is a substring of $X_i$, which is a node of $H(T)$, a contradiction.

To construct the augmented position heap in $O(n)$ time, our strategy is first to construct the position heap in $O(n)$ time using the algorithm from the previous section. As before, we create the array $N[\,]$, where $N[i]$ points to the node that contains position $i$, and this takes $O(n)$ time by trivial methods. We then add the discovery and finishing times and the maximal-reach pointers on a second pass, in $O(n)$ time. We find and test each prefix by starting at $X_{i-1}$ in the position heap and ascending through ancestors until we find the first one, $Y$, that has a child on an edge labeled $a$ in the dual heap. This child, $aY$, in the dual is the node to which node $i$ must point.
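The upward search through the position heap and its dual, used both for inserting nodes and for the second pass that sets maximal-reach pointers, can be sketched as follows. This is one possible reading of the method, not the authors' code. It assumes the text is a Python string stored left to right, that $T_i$ is the suffix of length $i$, and that the root carries position 1; the class, field, and function names are hypothetical, and discovery and finishing times are omitted.

```python
class Node:
    def __init__(self, pos, parent=None, depth=0):
        self.pos = pos          # position label, counted from the right end of the text
        self.parent = parent    # parent in the position heap
        self.depth = depth      # length of the node's path label
        self.child = {}         # heap edges: letter b -> node with label Xb
        self.dual_child = {}    # dual edges: letter a -> node with label aX


def build_position_heap(text):
    """Build the heap and its dual together; N[i] is the node holding position i."""
    n = len(text)
    root = Node(pos=1)          # assumed convention: the root holds position 1
    N = [None, root]            # N[0] is unused
    prev = root
    for i in range(2, n + 1):
        start = n - i           # index in `text` of the first letter of T_i
        a = text[start]
        # Climb from X_{i-1} until some Y has a dual child on letter a (that is, aY exists).
        y, last_failed = prev, None
        while y is not None and a not in y.dual_child:
            last_failed, y = y, y.parent
        parent = root if y is None else y.dual_child[a]
        b = text[start + parent.depth]      # letter that follows the parent's label in T_i
        new = Node(pos=i, parent=parent, depth=parent.depth + 1)
        parent.child[b] = new
        last_failed.dual_child[a] = new     # last node tried without success is Yb
        N.append(new)
        prev = new
    return root, N


def maximal_reach(text, root):
    """reach[i] is the deepest node whose path label is a prefix of T_i."""
    n = len(text)
    reach = [None] * (n + 1)
    if n == 0:
        return reach
    reach[1] = root.child.get(text[n - 1], root)
    for i in range(2, n + 1):
        a = text[n - i]
        y = reach[i - 1]
        while a not in y.dual_child:        # the root always succeeds for i >= 2
            y = y.parent
        reach[i] = y.dual_child[a]
    return reach
```

For the text `"abab"` under these assumptions, the heap has nodes with labels `""`, `"a"`, `"b"`, and `"ab"` at positions 1 to 4, and position 4 reaches the node `"ab"`. Each failed test in either loop moves one level up the heap, which is the variable cost that the amortized analysis charges against earlier increases in depth.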
The analysis of the linear running time is the same as it is for linear-time construction of the position heap. The current depth is the depth of node $X_{i-1}$ in the position heap. The first two prefixes of $X_{i-1}$ take $O(1)$ time to check for a child on an edge labeled $a$ in the dual heap. Each additional prefix takes $O(1)$ time to check, and decreases the current depth in the position heap. Call this the variable part of the time spent at position $i$. By Lemma 8.1, the current depth can increase by at most one per iteration. The initial depth is at most 1, since $T_1$ has length 1. The total decrease in depth can therefore be at most one greater than the total increase in depth, which is $O(1)$ per iteration, hence $O(n)$ overall. The sum of the variable parts of the times spent at the different iterations is therefore $O(n)$. We therefore get the following:

Lemma 8.2. It takes $O(n)$ time to construct the augmented position heap for a text $T$ of length $n$.

9. Updating the Position Heap when the Text is Edited

When a block of characters is inserted into or deleted from a text $T$, the position heap must pass through a series of steps in which it is a trie, but has some things wrong with it that must be repaired in order for it to be the position heap of the new text. The goal of this section is to give algorithms for Delete and Insert, which update the position heap when a block of text is deleted from or inserted into the text $T$.

Since the text is no longer static, it is no longer convenient to label a node of the position heap with its position number in the text; when a position is deleted, the position numbers of all letters to its left decrease by one. To avoid having to update the position-number labels of all those nodes, we instead label the nodes with position pointers to the positions of the text. This requires us to define the analog of the heap property when pointers, rather than integers, are used.

Definition 9.1. If $p$ is a pointer to a position in $T$, let $T_p$ denote the suffix of $T$ that begins at $p$. If $X$ is a node in the trie with a pointer to a position of $T$, let $p(X)$ denote this pointer. The trie has the heap property if whenever $Y$ is a child of $X$, $p(Y)$ is to the left of $p(X)$ in $T$. The pointer $p(X)$ is correctly placed if $X$ has an occurrence at position $p(X)$, that is, if $X$ is a prefix of $T_{p(X)}$.

The constructive definition of the position heap (4.) remains unchanged, except that each time a position is inserted, the new node is labeled with a pointer to the position, rather than its position number.

It will be convenient to look up the corresponding position-heap node given a pointer to a position in the text $T$. This is accomplished by labeling each position $p$ of the text with a pointer $N(p)$ to the node of the position heap that points to it. This serves the same function as the array $N[\,]$ in the static case.
To avoid the need to mention this pointer each time we move a pointer in the position heap, we will define the operation of moving a position $p$ from one node to another in the position heap as including the operation of making the pointer $N(p)$ point to the new node.

The following lemma is useful for establishing that a procedure for updating the position heap after an edit operation on $T$ has correctly produced the position heap for the modified text.

Lemma 9.2. A trie $H$ where each node is labeled with a pointer to a letter of a text $T$ is the position heap for $T$ if and only if it satisfies the following properties:
1. $H$ has the heap property;
2. Every position of $T$ is pointed to by at most one pointer $p(X)$ for some node $X$ in the trie;
3. Every position of $T$ is pointed to by at least one pointer $p(X)$ for some node $X$ of the trie;
4. For every node $X$, $p(X)$ is correctly placed.

Proof: By induction on the number of positions inserted by the naive construction algorithm.

9.1. Deleting or inserting a block of text in T

The workhorses of the algorithm for updating the position heap after insertion or removal of a block of text are Remove and Add. Below, we explain how they work, but for now, we define the problems in terms of their preconditions and postconditions so that, for the time being, we can make calls to them in our implementation of Delete and Insert.

Definition 9.3. The problems solved by Remove and Add. An input to Remove or Add is a trie that satisfies properties 1 and 2 of Lemma 9.2, but might not satisfy properties 3 and 4. An additional input to Remove is a node $X$ that contains a position pointer to be removed from the set of position pointers in the trie. Remove deletes the pointer without disrupting the heap property, without otherwise changing the set of position pointers in the tree, and without creating any new violations of property 4 at any position pointers. An additional input to Add is a position pointer to be inserted into the trie. The position pointer must not already occur in the trie. Add correctly places the pointer without disrupting the heap property, without otherwise changing the set of position pointers in the tree, and without creating any new violations of property 4 at any position pointers. A call to Remove or Add must update a variable $h$ that gives the current height of the trie.

Implementation requires shuffling position pointers in the tree in a way that is familiar to anyone who has studied heaps. Details are given below. In the meantime, given the problems solved by Remove and Add, we can now explain the main procedures of the section, Delete and Insert, in terms of calls to Remove and Add.

The Delete procedure updates the position heap when a block of characters is deleted from the text so that it is the position heap of the new text.

Definition 9.4. An algorithm for Delete. Let $h$ be the height of the input position heap. Call Remove and Add, using the modified text, on the $h$ characters that lie to the left of the deleted block.


More information

MATH 25 CLASS 5 NOTES, SEP

MATH 25 CLASS 5 NOTES, SEP MATH 25 CLASS 5 NOTES, SEP 30 2011 Contents 1. A brief diversion: reltively prime numbers 1 2. Lest common multiples 3 3. Finding ll solutions to x + by = c 4 Quick links to definitions/theorems Euclid

More information

Today. CS 188: Artificial Intelligence Fall Recap: Search. Example: Pancake Problem. Example: Pancake Problem. General Tree Search.

Today. CS 188: Artificial Intelligence Fall Recap: Search. Example: Pancake Problem. Example: Pancake Problem. General Tree Search. CS 88: Artificil Intelligence Fll 00 Lecture : A* Serch 9//00 A* Serch rph Serch Tody Heuristic Design Dn Klein UC Berkeley Multiple slides from Sturt Russell or Andrew Moore Recp: Serch Exmple: Pncke

More information

Languages. L((a (b)(c))*) = { ε,a,bc,aa,abc,bca,... } εw = wε = w. εabba = abbaε = abba. (a (b)(c)) *

Languages. L((a (b)(c))*) = { ε,a,bc,aa,abc,bca,... } εw = wε = w. εabba = abbaε = abba. (a (b)(c)) * Pln for Tody nd Beginning Next week Interpreter nd Compiler Structure, or Softwre Architecture Overview of Progrmming Assignments The MeggyJv compiler we will e uilding. Regulr Expressions Finite Stte

More information

Premaster Course Algorithms 1 Chapter 6: Shortest Paths. Christian Scheideler SS 2018

Premaster Course Algorithms 1 Chapter 6: Shortest Paths. Christian Scheideler SS 2018 Premster Course Algorithms Chpter 6: Shortest Pths Christin Scheieler SS 8 Bsic Grph Algorithms Overview: Shortest pths in DAGs Dijkstr s lgorithm Bellmn-For lgorithm Johnson s metho SS 8 Chpter 6 Shortest

More information
