Efficient implementation of lazy suffix trees

Size: px
Start display at page:

Download "Efficient implementation of lazy suffix trees"

Transcription

1 SOFTWARE PRACTICE AND EXPERIENCE Softw. Prct. Exper. 2003; 33: (DOI: /spe.535) Efficient implementtion of lzy suffix trees R. Giegerich 1,S.Kurtz 2 nd J. Stoye 1,, 1 Fculty of Technology, University of Bielefeld, Bielefeld, Germny 2 Centre for Bioinformtics, University of Hmurg, Bundesstrße 43, Hmurg, Germny SUMMARY We present n efficient implementtion of write-only top-down construction for suffix trees. Our implementtion is sed on new, spce-efficient representtion of suffix trees tht requires only 12 ytes per input chrcter in the worst cse, nd 8.5 ytes per input chrcter on verge for collection of files of different type. We show how to efficiently implement the lzy evlution of suffix trees such tht sutree is evluted only when it is trversed for the first time. Our experiments show tht for the prolem of serching mny exct ptterns in fixed input string, the lzy top-down construction is often fster nd more spce efficient thn other methods. Copyright c 2003 John Wiley & Sons, Ltd. KEY WORDS: string mtching; suffix tree; spce-efficient implementtion; lzy evlution INTRODUCTION Suffix trees re efficiency oosters in string processing. The suffix tree of text t is n index structure tht cn e computed nd stored in O( t ) time nd spce. Once constructed, it llows ny sustring w of t to e locted in O( w ) steps, independent of the size of t. This instnt ccess to sustrings is most convenient in myrid [1] of situtions, nd in Gusfield s ook [2], out 70 pges re devoted to pplictions of suffix trees. While suffix trees ply prominent role in lgorithmics, their prcticl use hs not een s widespred s one should expect (for exmple, Skien [3] hs oserved tht suffix trees re the dt structure with the highest need for etter implementtions). The following prgmtic considertions mke them pper less ttrctive: Correspondence to: J. Stoye, AG Genominformtik, Fculty of Technology, Universität Bielefeld, Bielefeld, Germny. E-mil: stoye@techfk.uni-ielefeld.de Contrct/grnt sponsor: Deutsche Forschungsgemeinschft; contrct/grnt numer: Ku 1257/1-1 Pulished online 25 June 2003 Copyright c 2003 John Wiley & Sons, Ltd. Received 26 Novemer 2001 Revised 4 Novemer 2002 Accepted 8 Mrch 2003

2 1036 R. GIEGERICH, S. KURTZ AND J. STOYE The liner-time constructions y Weiner [4], McCreight [5] nd Ukkonen [6] re quite intricte to implement. (See lso [7], which reviews these methods nd revels reltionships much closer thn one would think.) Although symptoticlly optiml, their poor loclity of memory reference [8] cuses significnt loss of efficiency on cched processor rchitectures. Although symptoticlly liner, suffix trees hve reputtion of eing greedy for spce. For exmple, the representtion of McCreight [5] requires 28 ytes per input chrcter in the worst cse. Due to these fcts, for mny pplictions, the construction of suffix tree does not mortize. For exmple, if text is to e serched only for very smll numer of ptterns, then it is usully etter to use fst nd simple online method, such s the Boyer Moore Horspool lgorithm [9], to serch the complete text new for ech pttern. However, these concerns re llevited y the following recent developments: In [8], Giegerich nd Kurtz dvocte the use of write-only, top-down construction, referred to here s the wotd-lgorithm. Although its time efficiency is O(nlog n) in the expected cse nd even O(n 2 ) in the worst cse (for text of length n), it is competitive in prctice, due to its simplicity nd good loclity of memory reference. In [10], Kurtz developed spce-efficient representtion tht llows suffix trees to e computed in liner time in 46% less spce thn previous methods. As consequence, suffix trees for lrge texts, e.g. complete genomes, hve een proved to e mngele; see for exmple [11]. The question of mortizing the cost of suffix tree construction is lmost eliminted y incrementlly constructing the tree s demnded y its queries. This possiility ws lredy hinted t in [8], where the wotd-lgorithm ws clled lzytree for this reson. When implementing the wotd-lgorithm in lzy functionl progrmming lnguge, the suffix tree utomticlly ecomes lzy dt structure, ut of course, the generl overhed of using lzy lnguge is incurred. In the present pper, we explin how lzy nd n eger version of the wotd-lgorithm cn efficiently e implemented in n impertive lnguge. Our implementtion technique voids constnt lphet fctor in the running time. It is sed on new spce efficient suffix tree representtion, which requires only 12n ytes of spce in the worst cse. This is n improvement of 8n ytes over the most spce efficient previous representtion, s developed in [10]. Other recently pulished suffix tree representtions require 17n ytes [14] nd 21n ytes [15]. Theoreticl developments in [16,17] descrie suffix tree representtions tht use only n log 2 n + O(n) its of spce. Another one is the compressed suffix tree of [18], which requires only O(n) its of spce. However, these dt structures re confined to text serching nd, to the est of our knowledge, no implementtion exists sed on these techniques. There re other spce-efficient dt structures for string pttern mtching. The suffix rry of [12] requires 9n ytes, including the spce for construction. The level compressed trie of [19] tkes The suffix rry construction of [12] nd the liner time suffix tree construction of [13] lso do not hve the lphet fctor in their running times. For the liner time suffix tree constructions of [4 6] the lphet fctor cn e voided y employing hshing techniques, see [5]; however, t the cost of using considerly more spce, see [10]. Copyright c 2003 John Wiley & Sons, Ltd. Softw. Prct. Exper. 2003; 33:

3 LAZY SUFFIX TREES 1037 out 12n ytes. The suffix inry serch tree of [20] requires 10n ytes. The suffix cctus of [21] cn e implemented in 10n ytes, nd the suffix orcle [22] in8n ytes. Finlly, the PT-tree of [23] requires n log 2 n + O(n) its. While some of these dt structures require less spce thn suffix trees, we expect tht for moderte numer of pttern serches our lzy construction wotdlzy is more spce efficient thn these methods. Moreover, some of these index structures cn only e computed efficiently y first constructing the complete suffix tree. Experimentl results show tht our implementtion technique leds to progrms tht re superior to previous ones in mny situtions. For exmple, when serching 0.01n ptterns of length etween 10 nd 20 in text of length n, the lzy wotd-lgorithm (wotdlzy, for short) is on verge 43% fster nd lmost 50% more spce efficient thn linked list implementtion of McCreight s liner time suffix tree lgorithm [5]. It is 38% fster nd 65% more spce efficient thn hsh tle implementtion of McCreight s liner time suffix tree lgorithm, ten times fster nd 42% more spce efficient thn progrmsed on suffix rrys [12], nd 40 times fster thn the iterted ppliction of the Boyer Moore Horspool lgorithm [9]. The lzy wotd-lgorithm mkes suffix trees lso pplicle in contexts where the expected numer of queries to the text is smll reltive to the length of the text, with n lmost immesurle overhed compred with its eger vrint wotdeger if the complete tree is uilt. In ddition to its usefulness for serching string ptterns, wotdlzy is interesting for other prolems (see the list in [2]), such s exct set mtching, the sustring prolem for dtse of ptterns, the DNA contmintion prolem, common sustrings of more thn two strings, circulr string lineriztion, or computtion of the q-word distnce of two strings. Documented source code of the progrms wotdeger nd wotdlzy is ville t A preliminry version of this pper ppered in [24]. THE WOTD SUFFIX TREE CONSTRUCTION Terminology Let e finite ordered set of size k, thelphet. is the set of ll strings over, ndε is the empty string. Weuse + to denote the set \{ε} of non-empty strings. We ssume tht t is string over of length n 1ndtht is chrcter not occurring in t. Fornyi [1,n+ 1], let s i = t i...t n denote the ith non-empty suffix of t. A + -tree T is finite rooted tree with edge lels from +. For ech, every node α in T hs t most one -edge α v β for some string v nd some node β. An edge leding to lef is lef edge. Letα e node in T. We denote α y the pth tht leds us there. Formlly, α is denoted y w if nd only if w is the conctention of the edge lels on the pth from the root to α. ε is the root. Astrings occurs in T if nd only if T contins node sv, for some string v. Thesuffix tree for t, denoted y ST(t), isthe + -tree T with the following properties: (i) ech node is either lef or rnching node, nd (ii) string w occurs in T if nd only if w is sustring of t. There is one-to-one correspondence etween the non-empty suffixes of t nd the leves of ST(t). Wedefinethelef set l(α) of node α s follows. For lef s j, l(s j ) ={j}. For rnching node u, l(u) ={j u v uv is n edge in ST(t), j l(uv)}. Copyright c 2003 John Wiley & Sons, Ltd. Softw. Prct. Exper. 2003; 33:

4 1038 R. GIEGERICH, S. KURTZ AND J. STOYE A review of the wotd-lgorithm The wotd-lgorithm dheres to the recursive structure of suffix tree. The ide is tht for ech rnching node u the sutree elow u is determined y the set of ll suffixes of tththveu s prefix. In other words, if we hve the set R(u) := {s us is suffix of t} of remining suffixes ville, we cn evlute the node u. This works s follows: t first R(u) is divided into groups ccording to the first chrcter of ech suffix. For ny chrcter c, letgroup(u, c) := {w cw R(u)} e the c-group of R(u). Ifforsomec, group(u, c) contins only one string w, then there is lef edge leled cw outgoing from u. Ifgroup(u, c) contins t lest two strings, then there is n edge leled cv leding to rnching node ucv,wherev is the longest common prefix (lcp, for short) of ll strings in group(u, c). The child ucv cn then e evluted from the set R(ucv) ={w vw group(u, c)} of remining suffixes. The wotd-lgorithm strts y evluting the root from the set R(root) of ll suffixes of t. All nodes of ST(t) cn e evluted recursively from the corresponding set of remining suffixes in top-down mnner. For exmple, consider the input string t =. The wotd-lgorithm for t works s follows: At first, the root is evluted from the set R(root) of ll non-empty suffixes of the string t, see the first six columns in Figure 1. The lgorithm recognizes three groups of suffixes. The -group, the -group, nd the -group. The -group contins two nd the -group contins three suffixes, hence we otin two unevluted rnching nodes, which re reched y n -edgendy-edge. The -group is singleton, so we otin lef reched y n edge leled. To evlute the unevluted rnching node corresponding to the -group, one first computes the longest common prefix of the remining suffixes of tht group. This is in our cse. So the -edge from the root is leled y, nd the remining suffixes nd re divided into groups ccording to their first chrcter. We otin two singleton groups of suffixes, nd thus two lef edges outgoing from. These lef edges re leled s nd. The unevluted rnching node corresponding to the -group is evluted recursively in similr wy, see Figure 1. Properties of the wotd-lgorithm The distinctive property of the wotd-lgorithmis tht the construction proceeds top-down. Once node hs een constructed, it need not e revisited in the construction of other prts of the tree (unlike the liner-time constructions of [4 6,13]). As the order of sutree construction is otherwise independent, it my e rrnged in demnd-driven fshion, otining the lzy implementtion detiled in the next section. The top-down construction hs een mentioned severl times in the literture [8,19,25,26],uttfirst glnce, its worst cse running time of O(n 2 ) is disppointing. However, the expected running time is O(nlog k n) (see e.g. [8]), nd experiments in [8] suggest tht the wotd-lgorithm is prcticlly liner for moderte size strings. This cn e explined y the good loclity ehvior: the wotd-lgorithm hs optiml loclity on the tree dt structure. In principle, no more thn current pth of the tree need e in memory. With respect to text ccess, the wotd-lgorithm lso ehves very well: for ech sutree, only the corresponding remining suffixes re ccessed. At certin tree level, the numer of suffixes considered will e smller thn the numer of ville cche entries. As these suffixes re red sequentilly, prcticlly no further cche misses will occur. This point is reched erlier when Copyright c 2003 John Wiley & Sons, Ltd. Softw. Prct. Exper. 2003; 33:

5 LAZY SUFFIX TREES 1039 ) z } R(root ) z } R() z } R() ) ) z } R() Figure 1. The write-only top-down construction of ST(). the rnching degree of the tree nodes is higher, since the suffixes split up more quickly. Hence, the loclity of the wotd-lgorithm improves for lrger lphets. Aside from the liner constructions lredy mentioned, there re O(nlog n) time suffix tree constructions (e.g. [26,27]), which re sed on Hopcroft s prtitioning technique [28]. While these constructions re fster in terms of worst-cse nlysis, the sutrees re not constructed independently. Hence they do not shre the loclity of the wotd-lgorithm, nor do they llow for lzy implementtion. IMPLEMENTATION TECHNIQUES This section descries how the wotd-lgorithm cn e implemented in n eger lnguge. The simultion of lzy evlution in n eger lnguge is not very common pproch. Unevluted prts of the dt structure hve to e represented explicitly, nd the trversl of the suffix tree ecomes more complicted ecuse it hs to e merged with the construction of the tree. We will show, however, tht y creful considertion of efficiency mtters, one cn end up with progrm tht is not only Copyright c 2003 John Wiley & Sons, Ltd. Softw. Prct. Exper. 2003; 33:

6 1040 R. GIEGERICH, S. KURTZ AND J. STOYE more efficient nd flexile in specil pplictions, ut the performnce of which is comprle to the est existing implementtions of index-sed exct string mtching lgorithms in generl. We first descrie the dt structure tht stores the suffix tree, nd then we show how to implement the lzy nd eger evlution, including the dditionl dt structures. The suffix tree dt structure To implement suffix tree, we siclly hve to represent three different items: nodes, edges, nd edge lels. To descrie our representtion, we define totl order on the children of rnching node: let uv nd uw e two different nodes in ST(t) tht re children of the sme rnching node u. Then uv uw if nd only if min l(uv) < min l(uw). Note tht lef sets re never empty nd l(uv) l(uw) =. Hence is well defined. Let us first consider how to represent the edge lels. Since n edge lel v is sustring of t, it cn e represented y pir of pointers (i, j) into t = t, such tht v = t i...t j (see Figure 2). In cse the edge is lef edge, we hve j = n + 1, i.e., the right pointer j is redundnt. In cse the edge leds to rnching node, it lso suffices to store only left pointer, if we choose it ppropritely (see Figure 3): let u v uv e n edge in ST(t). Wedefinelp(uv) := min l(uv) + u, theleft pointer of uv. Now suppose tht uv is rnching node nd i = lp(uv). Assume furthermore tht uvw is the smllest child of uv w.r.t. the reltion. Hence we hve min l(uv) = min l(uvw), nd thus lp(uvw) = min l(uvw) + uv = min l(uv) + u + v = lp(uv) + v. Nowletr = lp(uvw). Then v = t i...t i+ v 1 = t i...t lp(uv)+ v 1 = t i...t lp(uvw) 1 = t i...t r 1. In other words, to retrieve edge lels in constnt time, it suffices to store the left pointer for ech node (including the leves). For ech rnching node u we dditionlly need constnt time ccess to the child of uv with the smllest left pointer. This ccess is provided y storing reference firstchild(u) to the first child of u w.r.t.. Thelp- ndfirstchild-vlues re stored in single integer tle T. The vlues for children of the sme node re stored in consecutive positions ordered w.r.t.. Thus, only the edges to the first child re stored explicitly. The edges to ll other children re implicit. They cn e retrieved y scnning consecutive positions in tle T. Any node u is referenced y the index in tle T where lp(u) is stored. To decode the suffix tree representtion, we need two extr its: lef it mrks n entry in T corresponding to lef, nd rightmost child it mrks n entry corresponding to node tht does not hve right rother w.r.t.. Figure 4 shows tle T representing ST(). The evlution process The wotd-lgorithm is est viewed s process evluting the nodes of the suffix tree, strting t the root nd recursively proceeding downwrds into the sutrees. We first descrie how n unevluted node u of ST(t) is stored. For the evlution of u, we need ccess to the set R(u) of remining suffixes. Therefore we employ glol rry suffixes tht contins pointers to suffixes of t. For ech unevluted node u, there is n intervl in suffixes tht stores strting positions of suffixes in R(u), ordered y descending suffix-length from left to right. R(u) is then represented y the two oundries left(u) nd right(u) of the corresponding intervl in suffixes. The oundries re stored in the two integers reserved in tle T for the rnching node u. To distinguish evluted nd unevluted nodes, we use third it, the unevluted it. Copyright c 2003 John Wiley & Sons, Ltd. Softw. Prct. Exper. 2003; 33:

7 LAZY SUFFIX TREES 1041 (2; 3) (1; 1) (6; 6) (4; 6) (6; 6) (4; 6) (2; 3) (6; 6) (6; 6) Figure 2. The finl suffix tree ST() from Figure 1 with edge lels s pirs (left pointer, right pointer). ST(t) u t u i v r w (i; r 1) uv (r; ) uvw Figure 3. The definition of lp(uv). Now we cn descrie how u is evluted: the edges outgoing from u re otined y simple counting sort [29], using the first chrcter of ech suffix stored in the intervl [left(u), right(u)] of the rry suffixes s the key in the counting phse. Ech chrcter c with count greter thn zero corresponds to c-edge outgoing from u. Moreover, the suffixes in the c-group determine the sutree elow tht edge. The pointers to the suffixes of the c-group re stored in suintervl, in descending order of their length. To otin the complete lel of the c-edge, the lcp of ll suffixes in the c-group is computed. If the c-group contins just one suffix s, then the lcp is s itself. If the c-group contins more thn one suffix, then simple loop tests for equlity of the chrcters t suffixes[i]+j for j = 1, 2,... nd Copyright c 2003 John Wiley & Sons, Ltd. Softw. Prct. Exper. 2003; 33:

8 1042 R. GIEGERICH, S. KURTZ AND J. STOYE lef it 0 lst child it! 0 z } z } z } z } z } z } z } z } z } Figure 4. A tle T representing ST() (see Figure 2). The input string s well s T re indexed strting with 1. A rnching node occupies two tle entries, lef occupies one tle entry. The first vlue for rnching node u is lp(u), the second is firstchild(u), indicted lso y rrows elow the tle. The lef it nd rightmost (lst) child it re indicted in the first tle entry of ech node. for ll strt positions i of the suffixes in the c-group. As soon s n inequlity is detected, the loop stops nd j is the length of the lcp of the c-group. The children of u re stored in tle T, one for ech non-empty group. A group with count one corresponds to suintervl of width one. It leds to lef, sy s, for which we store lp(s) in the next ville position of tle T. lp(s) is given y the left oundry of the group. A group of size lrger thn one leds to n unevluted rnching node, sy v, for which we store left(v) nd right(v) in the next two ville positions of tle T. In this wy, ll nodes with the sme prent u re stored in consecutive positions. Moreover, since the suffixes of ech intervl re in descending order of their length, the children re ordered w.r.t. the reltion. Thevluesleft(v) nd right(v) re esily otined from the counts in the counting sort phse, nd setting the lef-it nd the rightmost-child it is strightforwrd. To prepre for the (possile) evlution of v, the vlues in the intervl [left(v),right(v)] of the rry suffixes re incremented y the length of the corresponding lcp. Finlly, fter ll successor nodes of u recreted, thevluesofleft(u) ndright(u) in T rereplcedy theintegerslp(u) := suffixes[left(u)] nd firstchild(u), nd the unevluted it for u is deleted. The nodes of the suffix tree cn e evluted in n ritrry order respecting the prent/child reltion. Two strtegies re relevnt in prctice: the eger strtegy evlutes nodes in depth-first nd left-to-right trversl, s long s there re unevluted nodes remining. The progrm implementing this strtegy is clled wotdeger in the sequel. The lzy strtegy evlutes node only when the corresponding sutree is trversed for the first time, for exmple y procedure serching for ptterns in the suffix tree. The progrm implementing this strtegy is clled wotdlzy in the sequel. Spce requirement The suffix tree representtion s descried ove requires 2q + n integers, where q is the numer of non-root rnching nodes. Since q = n 1 in the worst cse, this is n improvement of 2n integers over the est previous representtion, s descried in [10]. However, one hs to e creful when compring the 2q + n integers representtion ove with the results of [10]. The 2q + n integers representtion Copyright c 2003 John Wiley & Sons, Ltd. Softw. Prct. Exper. 2003; 33:

9 LAZY SUFFIX TREES 1043 is tilored for the wotd-lgorithm nd requires extr working spce of 2.5n integers in the worst cse : The rry suffixes contins n integers, nd the counting sort requires uffer of the width of the intervl tht is to e sorted. In the worst cse, the width of this intervl is n 1. Moreover, wotdeger needs stck of size up to n/2, to hold references to unevluted nodes. Creful memory mngement, however, llows spce to e sved in prctice. Note tht during eger evlution, the rry suffixes is processed from left to right, i.e., it contins completely processed prefix. Simultneously, the spce requirement for the suffix tree grows. By recliming the completely processed prefix of the rry suffixes for the tle T, the extr working spce required y wotdeger is only little more thn one yte per input chrcter, see Tle I.Forwotdlzy, it is not possile to reclim unused spce of the rry suffixes, since this is processed in n ritrry order. As consequence, wotdlzy needs more working spce. EXPERIMENTAL RESULTS For our experiments, we collected set of nine files of different sizes nd types. We restricted ourselves to 7-it ASCII files, since the suffix tree ppliction we consider (serching for string ptterns) is not common for inry files. Our collection consists of the following files. We used three files from the Clgry Corpus: i, ook1, ook2. The first one contins forml text (iliogrphic items), nd the ltter two contin English text. We dded three files (contining English text) from the Cnterury Corpus: lice29, lcet10,ndplrn12. Finlly, we dded three lrge files : E.coli (the complete genome of the cteri Escherichi coli), ile (the Bile), nd world (the Project Gutenerg Edition of The World Fctook 1992). All progrmswe considerre written in C. We used the gcc compiler, version 2.96, with optimizing option O3. The progrms were run on computer with MHz AMD Athlon XP processor, 512 MB RAM, under Linux. On this computer ech integer nd ech pointer occupies 4 ytes. As consequence, we hve 30 its to store left(u) or lp(u) for rnching node or lef u. Moreover, we hve 31 its to store right(u) or firstchild(u) for rnching node u. left(u), right(u), nd lp(u) re in the rnge [0,n]. firstchild(u) is in the rnge [0, 3n]. Hence 3n muste stisfied, i.e. the mximl length of the input string is In first experiment we rn three different progrms constructing suffix trees: wotdeger, mccl, nd mcch. The ltter two implement McCreight s suffix tree construction [5]. mccl computes the improved linked list representtion, nd mcch computes the improved hsh tle representtion of the suffix tree, s descried in [10]. Tle I shows the running times nd the spce requirements. We normlized w.r.t. the length of the files. Tht is, we show the reltive time (in seconds) to process 10 6 chrcters (i.e., rtime = (10 6 time)/n), nd the reltive spce requirement in ytes per input chrcter. For wotdeger we show the spce requirement for the suffix tree representtion (stspce), Moreover, the wotd-lgorithm does not run in liner worst cse time, in contrst to e.g. McCreight s lgorithm [5] which cn e used to construct the 5n integers representtions of [10] in constnt working spce. It is not cler to us whether it is possile to construct the 2q + n representtion of this pper within constnt working spce, or in liner time. In prticulr, it is not possile to construct it with McCreight s [5] or with Ukkonen s lgorithm [6], see [10]. These files, like the two corpor, cn e otined from Copyright c 2003 John Wiley & Sons, Ltd. Softw. Prct. Exper. 2003; 33:

10 1044 R. GIEGERICH, S. KURTZ AND J. STOYE Tle I. Time nd spce requirement for different progrms constructing suffix trees. wotdeger mccl mcch File n k rtime stspce spce rtime spce rtime spce i ook ook lice lcet plrn E.coli ile world Averge s well s the totl spce requirement including the working spce. mccl nd mcch only require constnt extr working spce. The lst row of Tle I shows the verges of the vlues of the corresponding columns. In ech row, entries in old fce mrk the smllest reltive time nd the smllest reltive spce requirement, respectively. All three progrms hve similr verge running times. wotdeger nd mcch showmorestle running time thn mccl. This my e explined y the fct tht the running time of wotdeger nd mcch is independent of the lphet size. For thorough explntion of the ehvior of mccl nd mcch we refer to [10]. wotdeger gives us runtime dvntge on dt sets smller thn 10 6 chrcters, while for lrger dt, the extr logrithmic fctor in its symptotic construction time ecomes visile grdully. mccl, plgued y its extr lphet fctor k, wins the rce only for n = nd k = 4. mcch is est for lrge dt sizes in connection with lrge lphets. In ny cse, wotdeger is more spce efficient thn the other progrms, using 0.84 nd 5.54 ytes per input chrcter less thn mccl nd mcch, respectively. Note tht the dditionl working spce required for wotdeger is on verge only 1.06 ytes per input chrcter. As lredy oserved in [10], suffix trees for DNA sequences re lrger thn for other types of input strings. This is due to the smll lphet, which leds to denser suffix tree with more rnching nodes. This effect cn e confirmed here for ll lgorithms, compring spce requirements for inputs E.coli nd ile. In second experiment we studied the ehvior of different progrms serching for mny exct ptterns in n input string, scenrio tht occurs for exmple in genome-scle sequencing projects, see [2] (Section7.15). For the progrmsof the previousexperiment, ndforwotdlzy, we implemented serch functions. wotdeger nd mccl require O(km) time to serch for pttern string of length m. mcch requires O(m) time. Since the pttern serch for wotdlzy is merged with the evlution of suffix tree nodes, one cnnot mke generl sttement out the running time of the serch. We lso considered suffix rrys, using the originl progrm code developed y Mner nd Myers [12] (p. 946). The suffix rry progrm, referred to y mmy, constructs suffix rry in O(nlog n) Copyright c 2003 John Wiley & Sons, Ltd. Softw. Prct. Exper. 2003; 33:

11 LAZY SUFFIX TREES 1045 Tle II. Time nd spce requirement for serching 0.01n exct ptterns. wotdlzy mmy wotdeger mccl mcch mh File n k rtime stspce spce rtime rtime rtime rtime spce rtime i ook ook lice lcet plrn E.coli ile world Averge Construction time Serch time time. Serching is performed in O(m + log n) time. The suffix rry requires 5n ytes of spce. For the construction, dditionlly 4n ytes of working spce re required. Finlly, we lso considered the iterted ppliction of n on-line string serching lgorithm, our own implementtion of the Boyer Moore Horspool lgorithm [9], referred to y mh. The lgorithm tkes O(n + m) expected time per serch, nd uses O(m) working spce. We generted ptterns ccording to the following strtegy: for ech input string t of length n we rndomly smpled ρn sustrings s 1,s 2,...,s ρn of different lengths from t nd tested if they occurred in t or not. The proportionlity fctor ρ ws etween nd 1. The lengths of the sustrings were evenly distriuted over the intervl [10, 20]. For i [1,ρn], the progrms were clled to serch for pttern p i,wherep i = s i,ifi is even, nd p i is the reverse of s i, otherwise. The reson for reversing the pttern string s i in hlf of the cses is to simulte the fct tht pttern serch is often unsuccessful. Tle II shows the reltive running times for ρ = For wotdlzy we show the spce requirement for the suffix tree fter ll ρn pttern serches hve een performed (stspce), nd the totl spce requirement. For mmy, mh, nd the other three progrms the spce requirement is independent of ρ. Thus for the spce requirement of wotdeger, mccl, nd mcch see Tle I. The spce requirement of mh is mrginl, so it is omitted in Tle II. Except for the DNA sequence, wotdlzy is the fstest nd most spce efficient progrm for ρ = This is due to the fct tht the pttern serches only evlute prt of the suffix tree. Compring the stspce columns of Tles I nd II we cn estimte tht for ρ = 0.01 only out 10% of the suffix tree is evluted. Of course, for the other progrms, their tree construction time domintes the serch time. We cn lso deduce tht wotdeger performs pttern serches fster thn mccl. This cn e explined s follows: serching for ptterns mens tht for ech rnching node the list of successors is trversed, to find prticulr edge. However, in our new suffix tree representtion, the successors Copyright c 2003 John Wiley & Sons, Ltd. Softw. Prct. Exper. 2003; 33:

12 1046 R. GIEGERICH, S. KURTZ AND J. STOYE 10 mmy mccl mcch wotdeger wotdlzy mh Figure 5. Averge reltive running time (in seconds) for different vlues of ρ [0, 0.002]. re found in consecutive positions of tle T. This mens smll numer of cche misses, nd hence the good performnce. It is remrkle tht wotdlzy is more spce efficient nd ten times fster thn mmy. Of course, the spce dvntge of wotdlzy is lost with lrger numer of pttern serches. In prticulr, for ρ 0.3 mmy is the most spce efficient progrm (dt not shown). Figures 5 nd 6 give generl overview, showing how ρ influences the running times. Figure 5 shows the verge reltive running time for ll progrms nd different choices of ρ for ρ Figure 6 shows the vergereltiverunningtime for ll progrmsexceptmh for ll vlues of ρ. We oservetht wotdlzy is the fstest progrm for ρ 0.05, nd wotdeger is the fstest progrm for ρ mh is fster thn wotdlzy only for ρ< Thus the index construction performed y wotdlzy lredy mortizes for very smll numer of pttern serches. We lso performed pttern serches on two very lrge iologicl sequence files sprot39 (relese 39 of the SWISSPROT protein dtse) nd yest (the completegenomeof the ker s yest Scchromyces cerevisie). The results re shown in Tle III. We left out mh, which would tke extremely long running times, nd we could lso not test mcch since this progrm cn only process files of length up to chrcters. Our min oservtions re: Copyright c 2003 John Wiley & Sons, Ltd. Softw. Prct. Exper. 2003; 33:

13 LAZY SUFFIX TREES mmy mccl mcch wotdeger wotdlzy Figure 6. Averge reltive running time (in seconds) for different vlues of ρ [0, 1]. Tle III. Time nd spce requirements when serching 0.01n ptterns in very lrge files. wotdlzy wotdeger mccl mmy File n k rtime stspce spce rtime stspce spce rtime spce rtime spce sprot yest The reltive running time for ll progrms clerly increses. With ρ pproching 1, the slower suffix tree construction of wotdeger nd wotdlzy is compensted for y fster pttern serch procedure, so tht there is running time dvntge over mccl (dt not shown). The execution time for mmy rises shrply on our lrgest dt set. Here, for serching 0.01n ptterns for n = , the extr logrithmic fctor in the serch procedure of mmy ecomes visile. Copyright c 2003 John Wiley & Sons, Ltd. Softw. Prct. Exper. 2003; 33:

14 1048 R. GIEGERICH, S. KURTZ AND J. STOYE CONCLUSION We hve developed efficient lzy nd eger implementtions of the write-only top-down suffix tree construction. These construct representtion of the suffix tree tht requires only 12n ytes of spce in the worst cse, plus 10n ytes of working spce. The totl spce requirement in prctice, though, is only 9.38n ytes on verge for collection of files of different types. The time nd spce overhed of the lzy implementtion is very smll. Our experiments show tht for serching mny exct ptterns in n input string, the lzy lgorithm is the most spce nd time efficient lgorithm for wide rnge of input vlues. A specil construction for repetitive strings We end this exposition y discussing specil cse of oth prcticl relevnce nd theoreticl interest. DNA, nd in prticulr humn DNA, lrgely consists of repetitive sequences. Estimtes of repetitive contents go up to 50% nd more [30] (p. 860). In the context of iosequence nlysis, it is unfortunte tht the wotd-lgorithm performs most poorly on repetitive strings. The worst cse running time of O(n 2 ) is chieved on strings of form n. The wotd-lgorithm performs similrly on Fioncci strings (see e.g. [31], exercise ), which re known for their highly repetitive structure. This is prticulrly unfortunte since Fioncci strings, with their itertive pttern tht grows repetitive strings vi recomintion of previously generted strings, my e seen s n idelized model of the effects of DNA cutting, repir, nd recomintion. Let us nlyze why the wotd-lgorithm performs poorly on repetitive strings. Long repets often led to suffix trees with long interior edges, nd significnt effort is spent on determining the longest common prefix of the suffixes contriuting to sutree elow such n edge. To e more precise, the lels of edges leving the nodes w nd w for nd w re often identicl, or the former is prefix of the ltter. This is ecuse the suffixes in R(w) re (prt from the constnt prefix ) suset of the suffixes in R(w). This suggests the following improvement of the wotd-lgorithm: when evluting node w, we cn immeditely skip the first m chrcter comprisons if m is the length of the corresponding lcp computed for w, nd only then continue with the norml computtion of the lcp. In more dvnced lgorithm, one cn pply this ide even recursively, ut it is not the im of this exposition to elorte on such detils. A prototype implementtion of this ide hs een tested, nd in fct, it seems to outperform ll other suffix tree constructions on Fioncci strings. While still eing top-down construction, the lgorithm sketched here is no longer write-only, s it ccesses node w when evluting w. Hence mny of the rguments given in fvor of the wotd-lgorithm, in prticulr loclity of memory reference, must e re-evluted for this vrint. But there re further spects to this ide, interesting in their own right. The dependence of w on w induces prtil ordering on the construction of nodes. In the extreme, the suffix tree is constructed in order of incresing suffix length, nd we hve reverse online construction in perfect nlogy to Weiner s lndmrk lgorithm of 1973 [4]. This surprising oservtion my provide new insights into the reltionships etween different suffix tree constructions. ACKNOWLEDGEMENTS We thnk Gene Myers for providing copy of his suffix rry code. S. Kurtz ws supported y grnt Ku 1257/1-1 from the Deutsche Forschungsgemeinschft. Copyright c 2003 John Wiley & Sons, Ltd. Softw. Prct. Exper. 2003; 33:

15 LAZY SUFFIX TREES 1049 REFERENCES 1. Apostolico A. The myrid virtues of suword trees. Comintoril Algorithms on Words. Springer: Berlin, 1985; Gusfield D. Algorithms on Strings, Trees, nd Sequences. Cmridge University Press: Cmridge, Skien SS. Who is interested in lgorithms nd why? Lessons from the Stony Brook Algorithms Repository. Proceedings 2nd Workshop on Algorithm Engineering. IEEE Press: New York, 1998; we98/proceedings/. 4. Weiner P. Liner pttern mtching lgorithms. Proceedings 14th Annul IEEE Symposium on Switching nd Automt Theory, 1973; McCreight EM. A spce-economicl suffix tree construction lgorithm. Journl of the ACM 1976; 23(2): Ukkonen E. On-line construction of suffix-trees. Algorithmic 1995; 14(3): Giegerich R, Kurtz S. From Ukkonen to McCreight nd Weiner: unifying view of liner-time suffix tree constructions. Algorithmic 1997; 19(3): Giegerich R, Kurtz S. A comprison of impertive nd purely functionl suffix tree constructions. Science of Computer Progrmming 1995; 25(2 3): Horspool RN. Prcticl fst serching in strings. Softwre Prctice nd Experience 1980; 10(6): Kurtz S. Reducing the spce requirement of suffix trees. Softwre Prctice nd Experience 1999; 29(13): Kurtz S, Choudhuri JV, Ohleusch E, Schleiermcher C, Stoye J, Giegerich R. REPuter: the mnifold pplictions of repet nlysis on genomic scle. Nucleic Acids Reserch 2001; 29(22): Mner U, Myers EW. Suffix rrys: A new method for on-line string serches. SIAM Journl of Computing 1993; 22(5): Frch M. Optiml suffix tree construction with lrge lphets. Proceedings 38th Annul Symposium on the Foundtions of Computer Science. IEEE Press: New York, 1997; Dorohoncenu B, Nevill-Mnning CG. Protein clssifiction using suffix trees. Proceedings of the Eighth Interntionl Conference on Intelligent Systems for Moleculr Biology. AAAI Press: Menlo Prk, CA, 2000; Hunt E, Atkinson MP, Irving RW. A dtse index to lrge iologicl sequences. Proceedings of the 27th Interntionl Conference on Very Lrge Dtses. Morgn Kufmnn: Sn Frncisco, 2001; Clrk DR, Munro JI. Efficient suffix trees on secondry storge. Proceedings of the Seventh Annul ACM SIAM Symposium on Discrete Algorithms. ACM Press: New York, 1996; Munro JI, Rmn V, Ro SS. Spce efficient suffix trees. Journl of Algorithms 2001; 39(2): Grossi R, Vitter JS. Compressed suffix rrys nd suffix trees with pplictions to text indexing nd string mtching. Proceedings of the 32th Annul ACM Symposium on the Theory of Computing. ACM Press: New York, 2000; Andersson A, Nilsson S. Efficient implementtion of suffix trees. Softwre Prctice nd Experience 1995; 25(2): Irving RW, Love L. Suffix inry serch trees nd suffix rrys. Reserch Report TR , Deprtment of Computer Science, University of Glsgow, Kärkkäinen J. Suffix cctus: A cross etween suffix tree nd suffix rry. Proceedings of the 6th Annul Symposium on Comintoril Pttern Mtching (Lecture Notes in Computer Science, vol. 937). Springer: Berlin, 1995; Alluzen C, Crochemore M, Rffinot M. Fctor orcle: new structure for pttern mtching. Proceedings of the 26th Annul Conference on Current Trends in Theory nd Prctice of Informtics (Lecture Notes in Computer Science, vol. 1725). Springer: Berlin, 1999; Colussi L, De Col A. A time nd spce efficient dt structure for string serching on lrge texts. Informtion Processing Letters 1996; 58(5): Giegerich R, Kurtz S, Stoye J. Efficient implementtion of lzy suffix trees. Proceedings of the Third Workshop on Algorithmic Engineering (Lecture Notes in Computer Science, vol. 1668). Springer: Berlin, 1999; Mrtinez HM. An efficient method for finding repets in moleculr sequences. Nucleic Acids Reserch 1983; 11(13): Gusfield D. An increment-y-one pproch to suffix rrys nd trees. Report CSE-90-39, Computer Science Division, University of Cliforni, Dvis, Apostolico A, Iliopoulos C, Lndu GM, Schieer B, Vishkin U. Prllel construction of suffix tree with pplictions. Algorithmic 1988; 3: Hopcroft J. An O(nlog n) lgorithm for minimizing sttes in finite utomton. The Theory of Mchines nd Computtions, Kohvi Z, Pz A (eds.). Acdemic Press: New York, 1971; Cormen TH, Leiserson CE, Rivest RL. Introduction to Algorithms (2nd edn). MIT Press: Cmridge, MA, Interntionl Humn Genome Sequencing Consortium. Initil sequencing nd nlysis of the humn genome. Nture 2001; 409: Knuth DE. The Art of Computer Progrmming. Fundmentl Algorithms (3rd edn), vol. 1. Addison-Wesley: Reding, MA, Copyright c 2003 John Wiley & Sons, Ltd. Softw. Prct. Exper. 2003; 33:

What are suffix trees?

What are suffix trees? Suffix Trees 1 Wht re suffix trees? Allow lgorithm designers to store very lrge mount of informtion out strings while still keeping within liner spce Allow users to serch for new strings in the originl

More information

Information Retrieval and Organisation

Information Retrieval and Organisation Informtion Retrievl nd Orgnistion Suffix Trees dpted from http://www.mth.tu.c.il/~himk/seminr02/suffixtrees.ppt Dell Zhng Birkeck, University of London Trie A tree representing set of strings { } eef d

More information

COMP 423 lecture 11 Jan. 28, 2008

COMP 423 lecture 11 Jan. 28, 2008 COMP 423 lecture 11 Jn. 28, 2008 Up to now, we hve looked t how some symols in n lphet occur more frequently thn others nd how we cn sve its y using code such tht the codewords for more frequently occuring

More information

Algorithm Design (5) Text Search

Algorithm Design (5) Text Search Algorithm Design (5) Text Serch Tkshi Chikym School of Engineering The University of Tokyo Text Serch Find sustring tht mtches the given key string in text dt of lrge mount Key string: chr x[m] Text Dt:

More information

Suffix trees, suffix arrays, BWT

Suffix trees, suffix arrays, BWT ALGORITHMES POUR LA BIO-INFORMATIQUE ET LA VISUALISATION COURS 3 Rluc Uricru Suffix trees, suffix rrys, BWT Bsed on: Suffix trees nd suffix rrys presenttion y Him Kpln Suffix trees course y Pco Gomez Liner-Time

More information

Tries. Yufei Tao KAIST. April 9, Y. Tao, April 9, 2013 Tries

Tries. Yufei Tao KAIST. April 9, Y. Tao, April 9, 2013 Tries Tries Yufei To KAIST April 9, 2013 Y. To, April 9, 2013 Tries In this lecture, we will discuss the following exct mtching prolem on strings. Prolem Let S e set of strings, ech of which hs unique integer

More information

Fig.25: the Role of LEX

Fig.25: the Role of LEX The Lnguge for Specifying Lexicl Anlyzer We shll now study how to uild lexicl nlyzer from specifiction of tokens in the form of list of regulr expressions The discussion centers round the design of n existing

More information

Outline. Introduction Suffix Trees (ST) Building STs in linear time: Ukkonen s algorithm Applications of ST

Outline. Introduction Suffix Trees (ST) Building STs in linear time: Ukkonen s algorithm Applications of ST Suffi Trees Outline Introduction Suffi Trees (ST) Building STs in liner time: Ukkonen s lgorithm Applictions of ST 2 3 Introduction Sustrings String is ny sequence of chrcters. Sustring of string S is

More information

COMBINATORIAL PATTERN MATCHING

COMBINATORIAL PATTERN MATCHING COMBINATORIAL PATTERN MATCHING Genomic Repets Exmple of repets: ATGGTCTAGGTCCTAGTGGTC Motivtion to find them: Genomic rerrngements re often ssocited with repets Trce evolutionry secrets Mny tumors re chrcterized

More information

In the last lecture, we discussed how valid tokens may be specified by regular expressions.

In the last lecture, we discussed how valid tokens may be specified by regular expressions. LECTURE 5 Scnning SYNTAX ANALYSIS We know from our previous lectures tht the process of verifying the syntx of the progrm is performed in two stges: Scnning: Identifying nd verifying tokens in progrm.

More information

CS481: Bioinformatics Algorithms

CS481: Bioinformatics Algorithms CS481: Bioinformtics Algorithms Cn Alkn EA509 clkn@cs.ilkent.edu.tr http://www.cs.ilkent.edu.tr/~clkn/teching/cs481/ EXACT STRING MATCHING Fingerprint ide Assume: We cn compute fingerprint f(p) of P in

More information

Intermediate Information Structures

Intermediate Information Structures CPSC 335 Intermedite Informtion Structures LECTURE 13 Suffix Trees Jon Rokne Computer Science University of Clgry Cnd Modified from CMSC 423 - Todd Trengen UMD upd Preprocessing Strings We will look t

More information

Definition of Regular Expression

Definition of Regular Expression Definition of Regulr Expression After the definition of the string nd lnguges, we re redy to descrie regulr expressions, the nottion we shll use to define the clss of lnguges known s regulr sets. Recll

More information

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5 CS321 Lnguges nd Compiler Design I Winter 2012 Lecture 5 1 FINITE AUTOMATA A non-deterministic finite utomton (NFA) consists of: An input lphet Σ, e.g. Σ =,. A set of sttes S, e.g. S = {1, 3, 5, 7, 11,

More information

Dr. D.M. Akbar Hussain

Dr. D.M. Akbar Hussain Dr. D.M. Akr Hussin Lexicl Anlysis. Bsic Ide: Red the source code nd generte tokens, it is similr wht humns will do to red in; just tking on the input nd reking it down in pieces. Ech token is sequence

More information

2 Computing all Intersections of a Set of Segments Line Segment Intersection

2 Computing all Intersections of a Set of Segments Line Segment Intersection 15-451/651: Design & Anlysis of Algorithms Novemer 14, 2016 Lecture #21 Sweep-Line nd Segment Intersection lst chnged: Novemer 8, 2017 1 Preliminries The sweep-line prdigm is very powerful lgorithmic design

More information

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs.

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs. Lecture 5 Wlks, Trils, Pths nd Connectedness Reding: Some of the mteril in this lecture comes from Section 1.2 of Dieter Jungnickel (2008), Grphs, Networks nd Algorithms, 3rd edition, which is ville online

More information

Applied Databases. Sebastian Maneth. Lecture 13 Online Pattern Matching on Strings. University of Edinburgh - February 29th, 2016

Applied Databases. Sebastian Maneth. Lecture 13 Online Pattern Matching on Strings. University of Edinburgh - February 29th, 2016 Applied Dtses Lecture 13 Online Pttern Mtching on Strings Sestin Mneth University of Edinurgh - Ferury 29th, 2016 2 Outline 1. Nive Method 2. Automton Method 3. Knuth-Morris-Prtt Algorithm 4. Boyer-Moore

More information

Slides for Data Mining by I. H. Witten and E. Frank

Slides for Data Mining by I. H. Witten and E. Frank Slides for Dt Mining y I. H. Witten nd E. Frnk Simplicity first Simple lgorithms often work very well! There re mny kinds of simple structure, eg: One ttriute does ll the work All ttriutes contriute eqully

More information

Position Heaps: A Simple and Dynamic Text Indexing Data Structure

Position Heaps: A Simple and Dynamic Text Indexing Data Structure Position Heps: A Simple nd Dynmic Text Indexing Dt Structure Andrzej Ehrenfeucht, Ross M. McConnell, Niss Osheim, Sung-Whn Woo Dept. of Computer Science, 40 UCB, University of Colordo t Boulder, Boulder,

More information

Presentation Martin Randers

Presentation Martin Randers Presenttion Mrtin Rnders Outline Introduction Algorithms Implementtion nd experiments Memory consumption Summry Introduction Introduction Evolution of species cn e modelled in trees Trees consist of nodes

More information

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis CS143 Hndout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexicl Anlysis In this first written ssignment, you'll get the chnce to ply round with the vrious constructions tht come up when doing lexicl

More information

Lecture 10 Evolutionary Computation: Evolution strategies and genetic programming

Lecture 10 Evolutionary Computation: Evolution strategies and genetic programming Lecture 10 Evolutionry Computtion: Evolution strtegies nd genetic progrmming Evolution strtegies Genetic progrmming Summry Negnevitsky, Person Eduction, 2011 1 Evolution Strtegies Another pproch to simulting

More information

CSCI 104. Rafael Ferreira da Silva. Slides adapted from: Mark Redekopp and David Kempe

CSCI 104. Rafael Ferreira da Silva. Slides adapted from: Mark Redekopp and David Kempe CSCI 0 fel Ferreir d Silv rfsilv@isi.edu Slides dpted from: Mrk edekopp nd Dvid Kempe LOG STUCTUED MEGE TEES Series Summtion eview Let n = + + + + k $ = #%& #. Wht is n? n = k+ - Wht is log () + log ()

More information

PARALLEL AND DISTRIBUTED COMPUTING

PARALLEL AND DISTRIBUTED COMPUTING PARALLEL AND DISTRIBUTED COMPUTING 2009/2010 1 st Semester Teste Jnury 9, 2010 Durtion: 2h00 - No extr mteril llowed. This includes notes, scrtch pper, clcultor, etc. - Give your nswers in the ville spce

More information

Lecture 10: Suffix Trees

Lecture 10: Suffix Trees Computtionl Genomics Prof. Ron Shmir, Prof. Him Wolfson, Dr. Irit Gt-Viks School of Computer Science, Tel Aviv University גנומיקה חישובית פרופ' רון שמיר, פרופ' חיים וולפסון, דר' עירית גת-ויקס ביה"ס למדעי

More information

From Dependencies to Evaluation Strategies

From Dependencies to Evaluation Strategies From Dependencies to Evlution Strtegies Possile strtegies: 1 let the user define the evlution order 2 utomtic strtegy sed on the dependencies: use locl dependencies to determine which ttriutes to compute

More information

CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona

CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona CSc 453 Compilers nd Systems Softwre 4 : Lexicl Anlysis II Deprtment of Computer Science University of Arizon collerg@gmil.com Copyright c 2009 Christin Collerg Implementing Automt NFAs nd DFAs cn e hrd-coded

More information

Alignment of Long Sequences. BMI/CS Spring 2012 Colin Dewey

Alignment of Long Sequences. BMI/CS Spring 2012 Colin Dewey Alignment of Long Sequences BMI/CS 776 www.biostt.wisc.edu/bmi776/ Spring 2012 Colin Dewey cdewey@biostt.wisc.edu Gols for Lecture the key concepts to understnd re the following how lrge-scle lignment

More information

Ma/CS 6b Class 1: Graph Recap

Ma/CS 6b Class 1: Graph Recap M/CS 6 Clss 1: Grph Recp By Adm Sheffer Course Detils Adm Sheffer. Office hour: Tuesdys 4pm. dmsh@cltech.edu TA: Victor Kstkin. Office hour: Tuesdys 7pm. 1:00 Mondy, Wednesdy, nd Fridy. http://www.mth.cltech.edu/~2014-15/2term/m006/

More information

CSE 549: Suffix Tries & Suffix Trees. All slides in this lecture not marked with * of Ben Langmead.

CSE 549: Suffix Tries & Suffix Trees. All slides in this lecture not marked with * of Ben Langmead. CSE 549: Suffix Tries & Suffix Trees All slides in this lecture not mrked with * of Ben Lngmed. KMP is gret, ut T = m P = n (note: m,n re opposite from previous lecture) Without preprocessing (KMP) Given

More information

An Algorithm for Enumerating All Maximal Tree Patterns Without Duplication Using Succinct Data Structure

An Algorithm for Enumerating All Maximal Tree Patterns Without Duplication Using Succinct Data Structure , Mrch 12-14, 2014, Hong Kong An Algorithm for Enumerting All Mximl Tree Ptterns Without Dupliction Using Succinct Dt Structure Yuko ITOKAWA, Tomoyuki UCHIDA nd Motoki SANO Astrct In order to extrct structured

More information

Suffix trees. December Computational Genomics

Suffix trees. December Computational Genomics Computtionl Genomics Prof Irit Gt-Viks, Prof. Ron Shmir, Prof. Roded Shrn School of Computer Science, Tel Aviv University גנומיקה חישובית פרופ' עירית גת-ויקס, פרופ' רון שמיר, פרופ' רודד שרן ביה"ס למדעי

More information

Ma/CS 6b Class 1: Graph Recap

Ma/CS 6b Class 1: Graph Recap M/CS 6 Clss 1: Grph Recp By Adm Sheffer Course Detils Instructor: Adm Sheffer. TA: Cosmin Pohot. 1pm Mondys, Wednesdys, nd Fridys. http://mth.cltech.edu/~2015-16/2term/m006/ Min ook: Introduction to Grph

More information

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards A Tutology Checker loosely relted to Stålmrck s Algorithm y Mrtin Richrds mr@cl.cm.c.uk http://www.cl.cm.c.uk/users/mr/ University Computer Lortory New Museum Site Pemroke Street Cmridge, CB2 3QG Mrtin

More information

Suffix Tries. Slides adapted from the course by Ben Langmead

Suffix Tries. Slides adapted from the course by Ben Langmead Suffix Tries Slides dpted from the course y Ben Lngmed en.lngmed@gmil.com Indexing with suffixes Until now, our indexes hve een sed on extrcting sustrings from T A very different pproch is to extrct suffixes

More information

CSCI 3130: Formal Languages and Automata Theory Lecture 12 The Chinese University of Hong Kong, Fall 2011

CSCI 3130: Formal Languages and Automata Theory Lecture 12 The Chinese University of Hong Kong, Fall 2011 CSCI 3130: Forml Lnguges nd utomt Theory Lecture 12 The Chinese University of Hong Kong, Fll 2011 ndrej Bogdnov In progrmming lnguges, uilding prse trees is significnt tsk ecuse prse trees tell us the

More information

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Winter 2016

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Winter 2016 Solving Prolems y Serching CS 486/686: Introduction to Artificil Intelligence Winter 2016 1 Introduction Serch ws one of the first topics studied in AI - Newell nd Simon (1961) Generl Prolem Solver Centrl

More information

Implementing Automata. CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona

Implementing Automata. CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona Implementing utomt Sc 5 ompilers nd Systems Softwre : Lexicl nlysis II Deprtment of omputer Science University of rizon collerg@gmil.com opyright c 009 hristin ollerg NFs nd DFs cn e hrd-coded using this

More information

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have Rndom Numers nd Monte Crlo Methods Rndom Numer Methods The integrtion methods discussed so fr ll re sed upon mking polynomil pproximtions to the integrnd. Another clss of numericl methods relies upon using

More information

Compression Outline :Algorithms in the Real World. Lempel-Ziv Algorithms. LZ77: Sliding Window Lempel-Ziv

Compression Outline :Algorithms in the Real World. Lempel-Ziv Algorithms. LZ77: Sliding Window Lempel-Ziv Compression Outline 15-853:Algorithms in the Rel World Dt Compression III Introduction: Lossy vs. Lossless, Benchmrks, Informtion Theory: Entropy, etc. Proility Coding: Huffmn + Arithmetic Coding Applictions

More information

LR Parsing, Part 2. Constructing Parse Tables. Need to Automatically Construct LR Parse Tables: Action and GOTO Table

LR Parsing, Part 2. Constructing Parse Tables. Need to Automatically Construct LR Parse Tables: Action and GOTO Table TDDD55 Compilers nd Interpreters TDDB44 Compiler Construction LR Prsing, Prt 2 Constructing Prse Tles Prse tle construction Grmmr conflict hndling Ctegories of LR Grmmrs nd Prsers Peter Fritzson, Christoph

More information

On String Matching in Chunked Texts

On String Matching in Chunked Texts On String Mtching in Chunked Texts Hnnu Peltol nd Jorm Trhio {hpeltol, trhio}@cs.hut.fi Deprtment of Computer Science nd Engineering Helsinki University of Technology P.O. Box 5400, FI-02015 HUT, Finlnd

More information

Lexical Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Lexical Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay Lexicl Anlysis Amith Snyl (www.cse.iit.c.in/ s) Deprtment of Computer Science nd Engineering, Indin Institute of Technology, Bomy Septemer 27 College of Engineering, Pune Lexicl Anlysis: 2/6 Recp The input

More information

The Greedy Method. The Greedy Method

The Greedy Method. The Greedy Method Lists nd Itertors /8/26 Presenttion for use with the textook, Algorithm Design nd Applictions, y M. T. Goodrich nd R. Tmssi, Wiley, 25 The Greedy Method The Greedy Method The greedy method is generl lgorithm

More information

I/O Efficient Dynamic Data Structures for Longest Prefix Queries

I/O Efficient Dynamic Data Structures for Longest Prefix Queries I/O Efficient Dynmic Dt Structures for Longest Prefix Queries Moshe Hershcovitch 1 nd Him Kpln 2 1 Fculty of Electricl Engineering, moshik1@gmil.com 2 School of Computer Science, himk@cs.tu.c.il, Tel Aviv

More information

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών ΕΠΛ323 - Θωρία και Πρακτική Μταγλωττιστών Lecture 3 Lexicl Anlysis Elis Athnsopoulos elisthn@cs.ucy.c.cy Recognition of Tokens if expressions nd reltionl opertors if è if then è then else è else relop

More information

Before We Begin. Introduction to Spatial Domain Filtering. Introduction to Digital Image Processing. Overview (1): Administrative Details (1):

Before We Begin. Introduction to Spatial Domain Filtering. Introduction to Digital Image Processing. Overview (1): Administrative Details (1): Overview (): Before We Begin Administrtive detils Review some questions to consider Winter 2006 Imge Enhncement in the Sptil Domin: Bsics of Sptil Filtering, Smoothing Sptil Filters, Order Sttistics Filters

More information

Graphs with at most two trees in a forest building process

Graphs with at most two trees in a forest building process Grphs with t most two trees in forest uilding process rxiv:802.0533v [mth.co] 4 Fe 208 Steve Butler Mis Hmnk Mrie Hrdt Astrct Given grph, we cn form spnning forest y first sorting the edges in some order,

More information

CS201 Discussion 10 DRAWTREE + TRIES

CS201 Discussion 10 DRAWTREE + TRIES CS201 Discussion 10 DRAWTREE + TRIES DrwTree First instinct: recursion As very generic structure, we could tckle this problem s follows: drw(): Find the root drw(root) drw(root): Write the line for the

More information

Topic 2: Lexing and Flexing

Topic 2: Lexing and Flexing Topic 2: Lexing nd Flexing COS 320 Compiling Techniques Princeton University Spring 2016 Lennrt Beringer 1 2 The Compiler Lexicl Anlysis Gol: rek strem of ASCII chrcters (source/input) into sequence of

More information

2-3 search trees red-black BSTs B-trees

2-3 search trees red-black BSTs B-trees 2-3 serch trees red-lck BTs B-trees 3 2-3 tree llow 1 or 2 keys per node. 2-node: one key, two children. 3-node: two keys, three children. ymmetric order. Inorder trversl yields keys in scending order.

More information

Compilers Spring 2013 PRACTICE Midterm Exam

Compilers Spring 2013 PRACTICE Midterm Exam Compilers Spring 2013 PRACTICE Midterm Exm This is full length prctice midterm exm. If you wnt to tke it t exm pce, give yourself 7 minutes to tke the entire test. Just like the rel exm, ech question hs

More information

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Solving Prolems y Serching CS 486/686: Introduction to Artificil Intelligence 1 Introduction Serch ws one of the first topics studied in AI - Newell nd Simon (1961) Generl Prolem Solver Centrl component

More information

CS 241. Fall 2017 Midterm Review Solutions. October 24, Bits and Bytes 1. 3 MIPS Assembler 6. 4 Regular Languages 7.

CS 241. Fall 2017 Midterm Review Solutions. October 24, Bits and Bytes 1. 3 MIPS Assembler 6. 4 Regular Languages 7. CS 241 Fll 2017 Midterm Review Solutions Octoer 24, 2017 Contents 1 Bits nd Bytes 1 2 MIPS Assemly Lnguge Progrmming 2 3 MIPS Assemler 6 4 Regulr Lnguges 7 5 Scnning 9 1 Bits nd Bytes 1. Give two s complement

More information

AI Adjacent Fields. This slide deck courtesy of Dan Klein at UC Berkeley

AI Adjacent Fields. This slide deck courtesy of Dan Klein at UC Berkeley AI Adjcent Fields Philosophy: Logic, methods of resoning Mind s physicl system Foundtions of lerning, lnguge, rtionlity Mthemtics Forml representtion nd proof Algorithms, computtion, (un)decidility, (in)trctility

More information

A dual of the rectangle-segmentation problem for binary matrices

A dual of the rectangle-segmentation problem for binary matrices A dul of the rectngle-segmenttion prolem for inry mtrices Thoms Klinowski Astrct We consider the prolem to decompose inry mtrix into smll numer of inry mtrices whose -entries form rectngle. We show tht

More information

Efficient K-NN Search in Polyphonic Music Databases Using a Lower Bounding Mechanism

Efficient K-NN Search in Polyphonic Music Databases Using a Lower Bounding Mechanism Efficient K-NN Serch in Polyphonic Music Dtses Using Lower Bounding Mechnism Ning-Hn Liu Deprtment of Computer Science Ntionl Tsing Hu University Hsinchu,Tiwn 300, R.O.C 886-3-575679 nhliou@yhoo.com.tw

More information

Video-rate Image Segmentation by means of Region Splitting and Merging

Video-rate Image Segmentation by means of Region Splitting and Merging Video-rte Imge Segmenttion y mens of Region Splitting nd Merging Knur Anej, Florence Lguzet, Lionel Lcssgne, Alin Merigot Institute for Fundmentl Electronics, University of Pris South Orsy, Frnce knur.nej@gmil.com,

More information

The dictionary model allows several consecutive symbols, called phrases

The dictionary model allows several consecutive symbols, called phrases A dptive Huffmn nd rithmetic methods re universl in the sense tht the encoder cn dpt to the sttistics of the source. But, dpttion is computtionlly expensive, prticulrly when k-th order Mrkov pproximtion

More information

Lexical analysis, scanners. Construction of a scanner

Lexical analysis, scanners. Construction of a scanner Lexicl nlysis scnners (NB. Pges 4-5 re for those who need to refresh their knowledge of DFAs nd NFAs. These re not presented during the lectures) Construction of scnner Tools: stte utomt nd trnsition digrms.

More information

Reducing a DFA to a Minimal DFA

Reducing a DFA to a Minimal DFA Lexicl Anlysis - Prt 4 Reducing DFA to Miniml DFA Input: DFA IN Assume DFA IN never gets stuck (dd ded stte if necessry) Output: DFA MIN An equivlent DFA with the minimum numer of sttes. Hrry H. Porter,

More information

MATH 25 CLASS 5 NOTES, SEP

MATH 25 CLASS 5 NOTES, SEP MATH 25 CLASS 5 NOTES, SEP 30 2011 Contents 1. A brief diversion: reltively prime numbers 1 2. Lest common multiples 3 3. Finding ll solutions to x + by = c 4 Quick links to definitions/theorems Euclid

More information

Languages. L((a (b)(c))*) = { ε,a,bc,aa,abc,bca,... } εw = wε = w. εabba = abbaε = abba. (a (b)(c)) *

Languages. L((a (b)(c))*) = { ε,a,bc,aa,abc,bca,... } εw = wε = w. εabba = abbaε = abba. (a (b)(c)) * Pln for Tody nd Beginning Next week Interpreter nd Compiler Structure, or Softwre Architecture Overview of Progrmming Assignments The MeggyJv compiler we will e uilding. Regulr Expressions Finite Stte

More information

CSCE 531, Spring 2017, Midterm Exam Answer Key

CSCE 531, Spring 2017, Midterm Exam Answer Key CCE 531, pring 2017, Midterm Exm Answer Key 1. (15 points) Using the method descried in the ook or in clss, convert the following regulr expression into n equivlent (nondeterministic) finite utomton: (

More information

Suffix Arrays on Words

Suffix Arrays on Words Suffix Arrys on Words Polo Ferrgin nd Johnnes Fischer Diprtimento di Informtic, University of Pis ferrgin@di.unipi.it Institut für Informtik, Ludwig-Mximilins-Universität München Johnnes.Fischer@io.ifi.lmu.de

More information

Lily Yen and Mogens Hansen

Lily Yen and Mogens Hansen SKOLID / SKOLID No. 8 Lily Yen nd Mogens Hnsen Skolid hs joined Mthemticl Myhem which is eing reformtted s stnd-lone mthemtics journl for high school students. Solutions to prolems tht ppered in the lst

More information

Qubit allocation for quantum circuit compilers

Qubit allocation for quantum circuit compilers Quit lloction for quntum circuit compilers Nov. 10, 2017 JIQ 2017 Mrcos Yukio Sirichi Sylvin Collnge Vinícius Fernndes dos Sntos Fernndo Mgno Quintão Pereir Compilers for quntum computing The first genertion

More information

A Sparse Grid Representation for Dynamic Three-Dimensional Worlds

A Sparse Grid Representation for Dynamic Three-Dimensional Worlds A Sprse Grid Representtion for Dynmic Three-Dimensionl Worlds Nthn R. Sturtevnt Deprtment of Computer Science University of Denver Denver, CO, 80208 sturtevnt@cs.du.edu Astrct Grid representtions offer

More information

MTH 146 Conics Supplement

MTH 146 Conics Supplement 105- Review of Conics MTH 146 Conics Supplement In this section we review conics If ou ne more detils thn re present in the notes, r through section 105 of the ook Definition: A prol is the set of points

More information

MA1008. Calculus and Linear Algebra for Engineers. Course Notes for Section B. Stephen Wills. Department of Mathematics. University College Cork

MA1008. Calculus and Linear Algebra for Engineers. Course Notes for Section B. Stephen Wills. Department of Mathematics. University College Cork MA1008 Clculus nd Liner Algebr for Engineers Course Notes for Section B Stephen Wills Deprtment of Mthemtics University College Cork s.wills@ucc.ie http://euclid.ucc.ie/pges/stff/wills/teching/m1008/ma1008.html

More information

Complete Coverage Path Planning of Mobile Robot Based on Dynamic Programming Algorithm Peng Zhou, Zhong-min Wang, Zhen-nan Li, Yang Li

Complete Coverage Path Planning of Mobile Robot Based on Dynamic Programming Algorithm Peng Zhou, Zhong-min Wang, Zhen-nan Li, Yang Li 2nd Interntionl Conference on Electronic & Mechnicl Engineering nd Informtion Technology (EMEIT-212) Complete Coverge Pth Plnning of Mobile Robot Bsed on Dynmic Progrmming Algorithm Peng Zhou, Zhong-min

More information

Finite Automata. Lecture 4 Sections Robb T. Koether. Hampden-Sydney College. Wed, Jan 21, 2015

Finite Automata. Lecture 4 Sections Robb T. Koether. Hampden-Sydney College. Wed, Jan 21, 2015 Finite Automt Lecture 4 Sections 3.6-3.7 Ro T. Koether Hmpden-Sydney College Wed, Jn 21, 2015 Ro T. Koether (Hmpden-Sydney College) Finite Automt Wed, Jn 21, 2015 1 / 23 1 Nondeterministic Finite Automt

More information

A New Learning Algorithm for the MAXQ Hierarchical Reinforcement Learning Method

A New Learning Algorithm for the MAXQ Hierarchical Reinforcement Learning Method A New Lerning Algorithm for the MAXQ Hierrchicl Reinforcement Lerning Method Frzneh Mirzzdeh 1, Bbk Behsz 2, nd Hmid Beigy 1 1 Deprtment of Computer Engineering, Shrif University of Technology, Tehrn,

More information

Lecture 7: Integration Techniques

Lecture 7: Integration Techniques Lecture 7: Integrtion Techniques Antiderivtives nd Indefinite Integrls. In differentil clculus, we were interested in the derivtive of given rel-vlued function, whether it ws lgeric, eponentil or logrithmic.

More information

9 Graph Cutting Procedures

9 Graph Cutting Procedures 9 Grph Cutting Procedures Lst clss we begn looking t how to embed rbitrry metrics into distributions of trees, nd proved the following theorem due to Brtl (1996): Theorem 9.1 (Brtl (1996)) Given metric

More information

F. R. K. Chung y. University ofpennsylvania. Philadelphia, Pennsylvania R. L. Graham. AT&T Labs - Research. March 2,1997.

F. R. K. Chung y. University ofpennsylvania. Philadelphia, Pennsylvania R. L. Graham. AT&T Labs - Research. March 2,1997. Forced convex n-gons in the plne F. R. K. Chung y University ofpennsylvni Phildelphi, Pennsylvni 19104 R. L. Grhm AT&T Ls - Reserch Murry Hill, New Jersey 07974 Mrch 2,1997 Astrct In seminl pper from 1935,

More information

Unit #9 : Definite Integral Properties, Fundamental Theorem of Calculus

Unit #9 : Definite Integral Properties, Fundamental Theorem of Calculus Unit #9 : Definite Integrl Properties, Fundmentl Theorem of Clculus Gols: Identify properties of definite integrls Define odd nd even functions, nd reltionship to integrl vlues Introduce the Fundmentl

More information

1. SEQUENCES INVOLVING EXPONENTIAL GROWTH (GEOMETRIC SEQUENCES)

1. SEQUENCES INVOLVING EXPONENTIAL GROWTH (GEOMETRIC SEQUENCES) Numbers nd Opertions, Algebr, nd Functions 45. SEQUENCES INVOLVING EXPONENTIAL GROWTH (GEOMETRIC SEQUENCES) In sequence of terms involving eponentil growth, which the testing service lso clls geometric

More information

UNIT 11. Query Optimization

UNIT 11. Query Optimization UNIT Query Optimiztion Contents Introduction to Query Optimiztion 2 The Optimiztion Process: An Overview 3 Optimiztion in System R 4 Optimiztion in INGRES 5 Implementing the Join Opertors Wei-Png Yng,

More information

Approximation of Two-Dimensional Rectangle Packing

Approximation of Two-Dimensional Rectangle Packing pproximtion of Two-imensionl Rectngle Pcking Pinhong hen, Yn hen, Mudit Goel, Freddy Mng S70 Project Report, Spring 1999. My 18, 1999 1 Introduction 1-d in pcking nd -d in pcking re clssic NP-complete

More information

PLWAP Sequential Mining: Open Source Code

PLWAP Sequential Mining: Open Source Code PL Sequentil Mining: Open Source Code C.I. Ezeife School of Computer Science University of Windsor Windsor, Ontrio N9B 3P4 cezeife@uwindsor.c Yi Lu Deprtment of Computer Science Wyne Stte University Detroit,

More information

Topological Queries on Graph-structured XML Data: Models and Implementations

Topological Queries on Graph-structured XML Data: Models and Implementations Topologicl Queries on Grph-structured XML Dt: Models nd Implementtions Hongzhi Wng, Jinzhong Li, nd Jizhou Luo Astrct In mny pplictions, dt is in grph structure, which cn e nturlly represented s grph-structured

More information

Stack Manipulation. Other Issues. How about larger constants? Frame Pointer. PowerPC. Alternative Architectures

Stack Manipulation. Other Issues. How about larger constants? Frame Pointer. PowerPC. Alternative Architectures Other Issues Stck Mnipultion support for procedures (Refer to section 3.6), stcks, frmes, recursion mnipulting strings nd pointers linkers, loders, memory lyout Interrupts, exceptions, system clls nd conventions

More information

Today. CS 188: Artificial Intelligence Fall Recap: Search. Example: Pancake Problem. Example: Pancake Problem. General Tree Search.

Today. CS 188: Artificial Intelligence Fall Recap: Search. Example: Pancake Problem. Example: Pancake Problem. General Tree Search. CS 88: Artificil Intelligence Fll 00 Lecture : A* Serch 9//00 A* Serch rph Serch Tody Heuristic Design Dn Klein UC Berkeley Multiple slides from Sturt Russell or Andrew Moore Recp: Serch Exmple: Pncke

More information

4/29/18 FIBONACCI NUMBERS GOLDEN RATIO, RECURRENCES. Fibonacci function. Fibonacci (Leonardo Pisano) ? Statue in Pisa Italy

4/29/18 FIBONACCI NUMBERS GOLDEN RATIO, RECURRENCES. Fibonacci function. Fibonacci (Leonardo Pisano) ? Statue in Pisa Italy /9/8 Fioncci (Leonrdo Pisno) -? Sttue in Pis Itly FIBONACCI NUERS GOLDEN RATIO, RECURRENCES Lecture CS Spring 8 Fioncci function fi() fi() fi(n) fi(n-) + fi(n-) for n,,,,,, 8,,, In his ook in titled Lier

More information

CS311H: Discrete Mathematics. Graph Theory IV. A Non-planar Graph. Regions of a Planar Graph. Euler s Formula. Instructor: Işıl Dillig

CS311H: Discrete Mathematics. Graph Theory IV. A Non-planar Graph. Regions of a Planar Graph. Euler s Formula. Instructor: Işıl Dillig CS311H: Discrete Mthemtics Grph Theory IV Instructor: Işıl Dillig Instructor: Işıl Dillig, CS311H: Discrete Mthemtics Grph Theory IV 1/25 A Non-plnr Grph Regions of Plnr Grph The plnr representtion of

More information

Pointwise convergence need not behave well with respect to standard properties such as continuity.

Pointwise convergence need not behave well with respect to standard properties such as continuity. Chpter 3 Uniform Convergence Lecture 9 Sequences of functions re of gret importnce in mny res of pure nd pplied mthemtics, nd their properties cn often be studied in the context of metric spces, s in Exmples

More information

Union-Find Problem. Using Arrays And Chains. A Set As A Tree. Result Of A Find Operation

Union-Find Problem. Using Arrays And Chains. A Set As A Tree. Result Of A Find Operation Union-Find Problem Given set {,,, n} of n elements. Initilly ech element is in different set. ƒ {}, {},, {n} An intermixed sequence of union nd find opertions is performed. A union opertion combines two

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology

More information

Section 3.1: Sequences and Series

Section 3.1: Sequences and Series Section.: Sequences d Series Sequences Let s strt out with the definition of sequence: sequence: ordered list of numbers, often with definite pttern Recll tht in set, order doesn t mtter so this is one

More information

Stack. A list whose end points are pointed by top and bottom

Stack. A list whose end points are pointed by top and bottom 4. Stck Stck A list whose end points re pointed by top nd bottom Insertion nd deletion tke plce t the top (cf: Wht is the difference between Stck nd Arry?) Bottom is constnt, but top grows nd shrinks!

More information

The Math Learning Center PO Box 12929, Salem, Oregon Math Learning Center

The Math Learning Center PO Box 12929, Salem, Oregon Math Learning Center Resource Overview Quntile Mesure: Skill or Concept: 80Q Multiply two frctions or frction nd whole numer. (QT N ) Excerpted from: The Mth Lerning Center PO Box 99, Slem, Oregon 9709 099 www.mthlerningcenter.org

More information

Lexical Analysis: Constructing a Scanner from Regular Expressions

Lexical Analysis: Constructing a Scanner from Regular Expressions Lexicl Anlysis: Constructing Scnner from Regulr Expressions Gol Show how to construct FA to recognize ny RE This Lecture Convert RE to n nondeterministic finite utomton (NFA) Use Thompson s construction

More information

ITEC2620 Introduction to Data Structures

ITEC2620 Introduction to Data Structures ITEC0 Introduction to Dt Structures Lecture 7 Queues, Priority Queues Queues I A queue is First-In, First-Out = FIFO uffer e.g. line-ups People enter from the ck of the line People re served (exit) from

More information

CS 241 Week 4 Tutorial Solutions

CS 241 Week 4 Tutorial Solutions CS 4 Week 4 Tutoril Solutions Writing n Assemler, Prt & Regulr Lnguges Prt Winter 8 Assemling instrutions utomtilly. slt $d, $s, $t. Solution: $d, $s, nd $t ll fit in -it signed integers sine they re 5-it

More information

Representation of Numbers. Number Representation. Representation of Numbers. 32-bit Unsigned Integers 3/24/2014. Fixed point Integer Representation

Representation of Numbers. Number Representation. Representation of Numbers. 32-bit Unsigned Integers 3/24/2014. Fixed point Integer Representation Representtion of Numbers Number Representtion Computer represent ll numbers, other thn integers nd some frctions with imprecision. Numbers re stored in some pproximtion which cn be represented by fixed

More information

Systems I. Logic Design I. Topics Digital logic Logic gates Simple combinational logic circuits

Systems I. Logic Design I. Topics Digital logic Logic gates Simple combinational logic circuits Systems I Logic Design I Topics Digitl logic Logic gtes Simple comintionl logic circuits Simple C sttement.. C = + ; Wht pieces of hrdwre do you think you might need? Storge - for vlues,, C Computtion

More information

Announcements. CS 188: Artificial Intelligence Fall Recap: Search. Today. General Tree Search. Uniform Cost. Lecture 3: A* Search 9/4/2007

Announcements. CS 188: Artificial Intelligence Fall Recap: Search. Today. General Tree Search. Uniform Cost. Lecture 3: A* Search 9/4/2007 CS 88: Artificil Intelligence Fll 2007 Lecture : A* Serch 9/4/2007 Dn Klein UC Berkeley Mny slides over the course dpted from either Sturt Russell or Andrew Moore Announcements Sections: New section 06:

More information

UT1553B BCRT True Dual-port Memory Interface

UT1553B BCRT True Dual-port Memory Interface UTMC APPICATION NOTE UT553B BCRT True Dul-port Memory Interfce INTRODUCTION The UTMC UT553B BCRT is monolithic CMOS integrted circuit tht provides comprehensive MI-STD- 553B Bus Controller nd Remote Terminl

More information