A survey of practical algorithms for suffix tree construction in external memory

M. Barsky, U. Stege and A. Thomo
University of Victoria, PO Box …, STN CSC, Victoria, BC, Canada

SUMMARY
The construction of suffix trees in secondary storage was considered impractical due to its excessive I/O cost. Algorithms developed in the last decade show that a suffix tree can efficiently be built in secondary storage for inputs which fit the main memory. In this paper, we analyze the details of algorithmic approaches to the external memory suffix tree construction and compare the performance and scalability of existing state-of-the-art software based on these algorithms.

KEY WORDS: suffix tree; external memory algorithms; string index

1. Introduction

Suffix trees […] are digital trees which index all the distinct non-empty substrings of a given set of strings. An early, implicit form of suffix trees can be found in Morrison's […] Patricia tree. But it was Weiner […] who initially proposed to use the suffix tree as an explicit index. Once the suffix tree for a set of strings is built, we can solve multiple combinatorial problems on strings in optimal time, that is, in time linear in the length of the input. Finding common patterns, each pattern being a substring of every string in the input set, is one example of such a problem […]. Counting the total number of different substrings with the same linear time complexity is another example […]. Suffix trees can be used to find all the locations of a pattern in a set of strings, to compute matching statistics, to locate all repetitive substrings, or to extract palindromes […].

Correspondence to: Marina Barsky, PO Box …, STN CSC, Victoria, BC, Canada. E-mail: mgbarsky@cs.uvic.ca

Footnote: PATRICIA stands for Practical Algorithm To Retrieve Information Coded In Alphanumeric.

Such marvelous facilities do not come without a price: the suffix tree occupies at least … times more space than the input it is built upon. For example, when we build the suffix tree for an input of size … GB, we will require at least … GB of space. As of February …, the total size of the publicly available GenBank sequence databases has reached … Gbp, and the size of data in the Whole Genome Shotgun (WGS) sequencing project stands at about … Gbp […]. Notably, the size of GenBank is doubling approximately every … months […]. If we aim to build the suffix tree for the entire database of publicly available DNA sequences, the space required for such a tree (not less than … GB) is too big for the main memory of a modern computer. To construct such a tree, we can use larger and cheaper disk space instead. In order to use this larger disk space we need to design an external memory (EM) algorithm for the construction of the suffix tree.

EM algorithms differ from algorithms for main memory. Access to data on disk is … to … times slower than access to data in main memory […]. In order to compensate for these speed differences in the design of EM algorithms, the external memory computational model, or disk access model (DAM), was proposed […]. DAM represents the computer memory in the form of two layers with different access characteristics: a fast main memory of limited size M, and a slow and arbitrarily large secondary storage memory (disk). In addition, for disks, it takes about as long to fetch a consecutive block of data as it does to fetch a single byte. That is why in the DAM computational model the asymptotic performance is evaluated as the total number of block transfers between disk and main memory.

Although the DAM computational model is a workable approximation, it does not always accurately predict the performance of EM algorithms. This is because it does not take into account the following important disk access property. The cost of a random disk access is the sum of seek time, rotational delay and transfer time. The first two dominate this cost in the average case and, as such, are the bottleneck of random disk access.
However, if the disk head is positioned exactly over the piece of data we want, then there is no seek time and rotational delay component, but only transfer time. Hence, if we access data sequentially on disk, then we only pay seek time and rotational delay for locating the first block of the data, but not for the subsequent blocks. The difference in cost between sequential and random access becomes even more prominent if we also consider read-ahead buffering optimizations, which are common in current disks and operating systems […]. Thus, the number of random disk accesses is an important measure to predict the efficiency of EM algorithms. Before describing the strategies of EM algorithms for suffix tree construction, let us take a closer look at the suffix tree data structure and its computer representation.

1.1. The suffix tree data structure

First, we equip ourselves with some useful definitions.

Footnote: Note that we focus in our discussion on the traditional rotating disks. We believe that research on the use of SSD disks, which have different access behavior, is certainly a promising future direction, but to the best of our knowledge, SSDs have not yet been explored as a memory extension for suffix tree construction.

Copyright John Wiley & Sons, Ltd. Softw. Pract. Exper. Prepared using speauth.cls

We consider a string X = x_1 x_2 … x_N to be a sequence of N symbols, where x_1 … x_{N−1} are drawn from an alphabet Σ and the last symbol, x_N = $, is unique and not in Σ (a so-called sentinel). By S_i = X[i, N] we denote the suffix of X beginning at position i, 1 ≤ i ≤ N. Thus S_1 = X and S_N = $. Note that we can uniquely identify each suffix by its starting position. The prefix P_i is the substring X[1, i] of X. The longest common prefix LCP_ij of two suffixes S_i and S_j is the substring X[i, i+k] such that X[i, i+k] = X[j, j+k] and X[i, i+k+1] ≠ X[j, j+k+1]. For example, if X = …, then LCP_… = … and LCP_… = ….

If we sort all the suffixes of string X in lexicographical order and record this order in an array SA of integers, then we obtain the suffix array of X. SA holds all integers i in the range [1, N], where i represents S_i. In more practical terms, the array SA is an array of positions sorted according to the lexicographic order of the suffixes. Note that the suffixes themselves are not stored in this array but are rather represented by their start positions. For example, for X = …$, SA = [ …, …, …, …, …, … ]. The suffix array can be augmented with the information about the longest common prefixes for each pair of suffixes represented as consecutive numbers in SA.

A trie is a type of digital search tree […]. In a trie, each edge represents a character from the alphabet Σ. The maximum number of children for each trie node is |Σ|, and sibling edges must represent distinct symbols. A suffix trie is a trie for all the suffixes of X. As an example, the suffix trie for X = … is shown in Figure 1 [Left]. Beginning at the root node, each of the suffixes of X can be found in the trie, from the longest one, X itself, down to the shortest one. Because of this organization, the occurrence of any query substring of X can be found by starting at the root and following matches down the trie edges until the query is exhausted. In the worst case, the total number of nodes in the trie is quadratic in N. This situation arises, for example, if all the paths in the trie are disjoint, as for the input string abcde. The number of edges in the suffix trie can be reduced by collapsing paths containing unary nodes into a single edge.
This process yields the structure called a suffix tree. Figure 1 [Right] shows what the suffix trie for X looks like when converted to a suffix tree. The tree still has the same general shape, just far fewer nodes. The leaves are labeled with the start positions in X of the corresponding suffixes, and each suffix can be found in the tree by concatenating the substrings associated with the edge labels. In practice, these substrings are not stored explicitly; instead, each one is represented as an ordered pair of integers indexing its start and end positions in X. The total number of nodes in the suffix tree is constrained by two facts: (1) there are exactly N leaves and (2) every internal node has at least 2 children. There are therefore at most N − 1 internal nodes in the tree. Hence, the maximum number of nodes (and edges) is linear in N. The tree's total space is linear in N provided that each edge label can be stored in constant space. Fortunately, this is the case for an implicit representation of substrings by their positions. More formally, a suffix tree is a digital tree of symbols for the suffixes of X, where edges are labeled with the start and end positions in X of the substrings they represent. Note also that each internal node in the suffix tree represents the end of the longest common prefix for some pair of suffixes.
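The definitions above can be checked on small inputs with a few lines of Python. The sketch below is illustrative only and is not taken from the surveyed software: it builds the suffix array by plain sorting (far from a practical construction), computes LCP lengths by direct comparison, and counts the nodes of the uncompacted suffix trie against the nodes that survive collapsing unary paths, that is, the suffix tree. Positions are 1-based to match the text; the function names and the example strings (banana$ and abcde) are our own choices.

```python
def suffix_array(x: str) -> list[int]:
    """1-based start positions of the suffixes of x in lexicographic order."""
    return sorted(range(1, len(x) + 1), key=lambda i: x[i - 1:])


def lcp_length(x: str, i: int, j: int) -> int:
    """Length of the longest common prefix of suffixes S_i and S_j (1-based)."""
    k = 0
    while i - 1 + k < len(x) and j - 1 + k < len(x) and x[i - 1 + k] == x[j - 1 + k]:
        k += 1
    return k


def build_suffix_trie(x: str) -> dict:
    """Uncompacted suffix trie: one nested dict per node, one key per edge symbol."""
    root: dict = {}
    for start in range(len(x)):
        node = root
        for ch in x[start:]:
            node = node.setdefault(ch, {})
    return root


def trie_node_count(node: dict) -> int:
    """All trie nodes, including the root."""
    return 1 + sum(trie_node_count(child) for child in node.values())


def tree_node_count(node: dict, is_root: bool = True) -> int:
    """Nodes kept after collapsing unary paths: the root, leaves and branching nodes."""
    keep = is_root or len(node) != 1
    return int(keep) + sum(tree_node_count(child, False) for child in node.values())


if __name__ == "__main__":
    x = "banana$"
    print(suffix_array(x))      # [7, 6, 4, 2, 1, 5, 3]
    print(lcp_length(x, 2, 4))  # 3, the common prefix "ana"
    for s in ("banana$", "abcde"):
        trie = build_suffix_trie(s)
        # prints "banana$ 23 11 13", then "abcde 16 6 9"
        print(s, trie_node_count(trie), tree_node_count(trie), 2 * len(s) - 1)
```

For abcde the trie has 1 + 5 + 4 + 3 + 2 + 1 = 16 nodes, quadratic in N, while the suffix tree keeps only the root and 5 leaves; in both examples the tree stays within the 2N − 1 bound derived above.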

Figure 1. [Left] The suffix trie for X = …. Since … occurs only at the end of X, it can serve as a unique sentinel symbol. Note that each suffix of X can be found in the trie by concatenating the character labels on the path from the root to the corresponding leaf node. [Right] The suffix tree for X = …. For clarity, the explicit edge labels are shown; in the actual suffix tree they are represented as ordered pairs of positions. Each suffix S_i can be found by concatenating the substrings of X on the path from the root to the leaf node L_i.

Figure 2. An array representation of the suffix tree for X = …. Each node contains an array of child pointers. Note that not all the cells of this array are in use. The sequences in the nodes are the labels of the incoming edges. They are shown for clarity only and are not stored explicitly.

1.2. Suffix tree storage optimizations

We discuss next the problem of suffix tree representation in memory, in order to estimate the disk space requirements for the suffix tree. It is common to represent a node of the suffix tree together with the information about its incoming edge label. Each node, therefore, contains two integers representing the start and end positions of the corresponding substring of X. In fact, it is enough to store only the start position of this substring, as its end position can be deduced from the start position of the child node, or is simply N if the current node is a leaf. In a straightforward implementation, each node has pointers to all its child nodes. These child pointers can be represented as an array, as a linked list or as a hash table […]. If the size of Σ is small, the child node pointers can be represented in the form of an array of size |Σ|. The i-th entry in this array represents the child node whose incoming label starts with the i-th character in the ranked alphabet. This is very useful for tree traversals, since the corresponding child can be located in constant time.

Let us first consider the tree space for inputs where N is less than the largest 4-byte integer, i.e. log N < 32. In this case, each node structure consists of |Σ| integers for child node pointers plus one integer to represent the start position of the edge-label substring. Since there are at most 2N nodes in the tree, the total space required is 2N(|Σ| + 1) integers, which, for example, for |Σ| = 4 (DNA alphabet) yields 40N bytes of storage per N bytes of input. Such a representation is depicted in Figure 2.

Figure 3. [A] Left-child right-sibling representation of the suffix tree for X = …. Each node contains a pointer to its first child and a pointer to its next sibling. [B] Giegerich et al.'s representation of the suffix tree, where all siblings are represented as consecutive elements in the array of nodes. A special symbol marks the bit representing the last sibling. Each node contains only a pointer to its first child and the start position of its incoming edge label.

For larger alphabets, an array representation of children is impractical and can be replaced by a linked list representation […]. However, this requires an additional O(|Σ|) search time at each internal node during tree traversal, in order to locate the corresponding child. In addition, since the position of a child in a list does not reflect the first symbol of its incoming edge label, we may need to store an additional byte representing this first character. Another possibility is to represent child pointers as a hash table […]. This preserves constant-time access to each child node and is more space-efficient than the array representation.

The linked-list based representation known as left-child right-sibling was proposed by McCreight in […].
In this implementation, the suffix tree is represented as a set of node structures, each consisting of the start position of the substring labeling the incoming edge, together with two pointers: one pointing to the node's first child and the other to its next sibling. Recall that the end position of the edge-label substring is not stored explicitly, since for an internal node it can be deduced from the start position of its first child, and for a leaf node this end position is simply N. This representation of the node's children is a linked list, with all its space advantages and search drawbacks. The McCreight suffix tree representation is illustrated in Figure 3 [A]. Each suffix tree node consists of 3 integers, and since there are up to 2N nodes in the tree, the size of such a tree is at most 24N bytes. Again, for better traversal efficiency, we may store the first symbol along each edge label. Then the total size of the suffix tree will be at most 26N bytes for N bytes of input.

An even more space-efficient storage scheme was proposed by Giegerich et al. […]. In this optimization, the pointers to sibling nodes are not stored; instead, the sibling nodes are placed consecutively in memory. The last sibling is marked by a special bit. Now, each node stores only the start position of the corresponding edge label plus the pointer to its leftmost child. As before, for efficiency of the traversal, each node may store an additional byte representing the start symbol of its edge label. The size of such a tree node is 9 bytes. For a maximum of 2N nodes this yields a maximum of 18N bytes of storage. Giegerich et al.'s […] representation is depicted in Figure 3 [B].

An additional possibility to optimize the storage of the suffix tree is to consider each suffix as a sequence of bits. The problem of renaming, which is a generic reduction from strings over an unbounded alphabet to binary strings, was studied in […]. It was shown that such a reduction can be done in linear time. Note that a string over any alphabet Σ can always be reduced to the binary alphabet by representing each character as a sequence of b = ⌈log |Σ|⌉ bits and then concatenating these binary sequences. For a binary alphabet, any internal node in the suffix tree has exactly two children. This is because such a node cannot have more than two children, but also cannot have fewer than two for it to be a suffix tree internal node. This allows using only two child pointers per node and representing the entire suffix tree as an array of constant-size nodes. If the entire input string is considered as a sequence of bits, only the valid suffixes are added to the tree.
These are the suffixes starting at bit positions i such that i mod b = 0 (numbering bit positions from 0), where b is the number of bits used to represent each character of Σ. As such, we have the same number of tree nodes as before: the tree has one leaf node and at most one internal node per inserted suffix. Figure 4 shows the equivalent suffix trees over the original and the binary alphabets for input string X = …. Each node has exactly two child pointers plus one integer representing the start position of the incoming edge label. Since there are fewer than 2N nodes in such a tree, the total size is at most 24N bytes. Note that this is independent of the size of the alphabet.

This binary representation of the suffix tree supports many common string queries. For example, in order to find the occurrences of a pattern in string X, we can treat the pattern as a sequence of bits and match these bits along the path starting at the root. Also, if we are looking for the longest repeating substring (LRS) of X, and the alphabet contains characters represented by b bits each, we find the internal node of the greatest depth, say d (in bits), from the root. Then we calculate the length of the LRS (with respect to the original alphabet) as ⌊d/b⌋.

Note that in all representations the leaf nodes do not contain child pointers; thus, at the end of the construction we can output the leaf nodes in a separate array. Each element in the array of leaf nodes stores only the start position of the corresponding substring, since the end position is implied to be N. In this case, the array representation occupies …N bytes (for |Σ| = 4), the McCreight suffix tree occupies …N bytes, Giegerich et al.'s representation occupies …N bytes and the suffix tree for the binary alphabet occupies …N bytes. These representations are in general not well suited for use during the process of tree construction, when we update the tree nodes, but they can be used when outputting the complete tree to disk.

Figure 4. [Left] Suffix tree for X = … given for comparison. [Right] Suffix tree for the same input string where each suffix is converted to a sequence of bits. Each character is encoded using … bits.

This short survey of storage requirements clearly demonstrates that the suffix tree is very space-demanding, even if we are using practically unlimited disk space. For example, for an input of … GB, the tree occupies at least … GB of disk space. Further, for inputs whose size exceeds the largest 4-byte integer, the start positions and the child pointers need more than 4 bytes for their representation, namely ⌈log N⌉ bits for each number. In practice, for inputs of size in the tens of gigabytes the tree can easily reach …N bytes. This is important to remember while designing algorithms for efficient traversals of such large trees (see Section …).

Until …, the data structure by Giegerich et al. […] was known as the most space-efficient representation. Then Sadakane […] fully developed the compressed suffix tree and its balanced parenthesis representation. More about compressed suffix trees can be found in recent papers […, …]. The compressed representation allows one to store the entire suffix tree in only …N bits. An example of the parenthesis representation of the suffix tree nodes for string X = … is shown in Figure 5. The parentheses describe the tree topology. In order to store the information about the start position and the depth of each tree node, a special array and its unary encoding are used to bring the total memory requirements for the tree to …N bits […]. The compressed suffix tree supports all regular suffix tree queries with poly-logarithmic slowdown […]. The algorithm for compressed suffix tree construction was implemented (see […]) and is available for indexing genomic sequences […].
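The byte counts quoted in this section follow from simple per-node arithmetic, and the compressed representation replaces most of that arithmetic with a string of parentheses. The Python sketch below puts the two side by side. It recomputes the worst-case sizes of the pointer-based layouts under the same assumptions as the text (4-byte integers and at most 2N nodes), and it produces the balanced-parenthesis string for a small tree: two bits per node, so at most 4N bits for the topology alone, to which the additional arrays mentioned above must be added. The layout names and the hand-written example tree for banana$ are our own; this is an illustration of the counting, not Sadakane's full encoding.

```python
INT_BYTES = 4  # as in the text: N < 2**32, so every integer fits in 4 bytes


def max_tree_bytes(n: int, sigma: int) -> dict[str, int]:
    """Worst-case size in bytes of pointer-based suffix tree layouts for n symbols."""
    nodes = 2 * n  # N leaves plus at most N - 1 internal nodes
    return {
        # |Σ| child pointers plus the start position of the incoming edge label
        "child array": nodes * (sigma + 1) * INT_BYTES,
        # start position, first-child and next-sibling pointers (McCreight)
        "left-child right-sibling": nodes * 3 * INT_BYTES,
        # the same plus one byte for the first symbol of the edge label
        "left-child right-sibling + symbol": nodes * (3 * INT_BYTES + 1),
        # start position, leftmost-child pointer, first-symbol byte (Giegerich et al.)
        "consecutive siblings + symbol": nodes * (2 * INT_BYTES + 1),
        # two child pointers plus a start position, whatever the alphabet
        "binary alphabet": nodes * 3 * INT_BYTES,
    }


def balanced_parentheses(tree: dict) -> str:
    """Topology only: '(' on entering a node, ')' on leaving it, depth-first."""
    return "(" + "".join(balanced_parentheses(child) for child in tree.values()) + ")"


if __name__ == "__main__":
    for name, size in max_tree_bytes(1, 4).items():
        print(f"{name}: {size}N bytes")  # 40N, 24N, 26N, 18N and 24N respectively
    # Suffix tree of "banana$" as nested dicts keyed by edge label (our own example).
    banana = {"banana$": {}, "a": {"$": {}, "na": {"$": {}, "na$": {}}},
              "na": {"$": {}, "na$": {}}, "$": {}}
    bp = balanced_parentheses(banana)
    print(bp, len(bp))  # (()(()(()()))(()())()) 22
```

The 11-node tree for banana$ needs 22 bits of topology, against 11 × 12 = 132 bytes for the same nodes in the McCreight layout without the first-symbol byte.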
The research on compressed suffix trees aims to compress the input string and the output tree into a smaller self-indexing structure which can fit into main memory. Hence, we do not consider the construction of compressed suffix trees as part of this survey. This brief outline of the compressed suffix tree representation is given here only to show that the entire fully-functional suffix tree can be stored using much less space than previously believed. We underline that the following discussion is about the classical (non-compressed) suffix trees which are built using secondary storage.

Footnote: An exception is the Top Down Disk-based suffix tree construction (TDD) […], where the nodes are created from the top and at each step it is known how many children each node will contain at the end of the computation. Thus, leaf nodes occupy only four bytes during the construction itself.

Figure 5. The parenthesis tree representation is the main high-level idea for suffix tree compression […].

The remainder of the paper is organized as follows. In Section 2 we introduce recent practical methods for EM suffix tree construction and evaluate their performance and scalability using the number of sequential passes over disk data, the number of random disk I/Os and the in-memory running time. Then, in Section 3 we point out the still unsolved challenges in the construction of suffix trees in secondary storage. Finally, in Section 4 we outline theoretical results which may serve as a basis for further practical research.

2. Practical methods for the suffix tree construction in external memory

In this section, we present the main ideas which gave birth to the state-of-the-art software for suffix tree construction in secondary storage. These methods are steps toward completely scalable suffix tree construction for inputs of any kind and size. When this problem is solved, a wide range of queries on massive string data will be possible to execute in optimal time.

The suffix tree for the input string X of length N can be built in O(N) time. Linear-time algorithms were developed in […, …, …]. In […] it was shown that all three of them are based on similar algorithmic ideas. It might be tempting to use these asymptotically optimal algorithms for an external memory implementation. However, looking closely at these algorithms, we observe that they assume that random access to the input string and to the tree takes constant time. Unfortunately, in practice, when some of these data structures outgrow the main memory and are accessed directly on disk, the access time to disk-based arrays varies significantly depending on the relative location of the data on disk.
The total number of random disk accesses for these linear-time algorithms is, in fact, O(N). This is extremely inefficient and causes the so-called disk thrashing problem, which led the authors in […] to conclude that the suffix tree in secondary storage is inviable. The random access behavior of these algorithms in external memory settings can be improved, as was shown in […] for Ukkonen's algorithm […]. We describe next the Ukkonen algorithm and show how it was extended for external memory.

Figure 6. The three first steps of the Ukkonen algorithm. An arrow indicates the active point at the end of each iteration. Note that the extension of the edges ending at leaf nodes with the next character is performed implicitly: the edge length is just extended by 1.

2.1. The Ukkonen algorithm and its on-disk version

For a given string X, Ukkonen's algorithm starts with the empty tree (that is, a tree consisting just of a root node) and then progressively builds an intermediate suffix tree ST_i for each prefix X[1, i], 1 ≤ i ≤ N. In order to convert suffix tree ST_{i−1} into ST_i, each suffix of ST_{i−1} is extended with the next character x_i. We do this by visiting each suffix in order, starting with the longest suffix and ending with the shortest one (the empty string). The suffixes inserted into ST_{i−1} may end in three types of nodes: leaf nodes, internal nodes, or in the middle of an edge (at a so-called implicit internal node). Note that if a suffix of ST_{i−1} ends in a leaf node, we do not need to extend it with the next character. Instead, we consider each leaf node as an open node: at each step of the algorithm every leaf node runs till the end of the current prefix, meaning that the end position of each leaf node will eventually become N. Consider the example in Figure 6. It shows the three first iterations of the suffix tree construction for X = …. In the second iteration, we implicitly extend the leaf child of the root with the second character, and we add a new edge for this character from the root (extending the empty suffix). Thus, in each iteration, we need to update only those suffixes of ST_{i−1} which end at explicit or implicit internal nodes of ST_{i−1}. We find the end of the longest among such suffixes at the active point. The active point is the (explicit or implicit) internal node where the previous iteration ended.
If the node at the active point already has a child starting with x_i, the active point advances one position down the corresponding edge. This means that all the remaining suffixes, extended with x_i, already exist in the tree as prefixes of some other suffixes. In case there is no outgoing edge starting with the new character, we add a new leaf node as a child of our explicit or implicit internal node (the active point). Here an implicit internal node becomes explicit. In order to move to the extension of the next suffix, which is shorter by one character, we follow the chain of suffix links. A suffix link is a directed edge from an internal node of the suffix tree (the source) to another internal node whose incoming path is one character (the first one) shorter than the incoming path of the source node. The suffix links are added when a sequence of internal nodes is created during edge splits.

To illustrate, consider the last iteration of the Ukkonen algorithm, which extends the intermediate tree for X[1, N−1], where X = …, with the last character. We extend all the suffixes of ST_{N−1} (Figure 7 [A]) with this last character. The active point is originally two characters below the node labeled by … in Figure 7 [A], and the implicit internal node is indicated by a black triangle. The active point is converted to an explicit internal node with two children: one of them is the existing leaf with incoming edge label …, and the other one is a new leaf for suffix S_… (Figure 7 [B]). Then, we follow the suffix link from the …-node to the …-node, and we add a new leaf by splitting an implicit node two characters below the …-node. This results in the tree of Figure 7 [C] with a leaf for suffix S_…. Next, the suffix link from the …-node leads us to the root node, and two characters along the corresponding edge we find the …-node and add to it a new edge starting with … and leading to a leaf node for suffix S_… (Figure 7 [D]). We continue in a similar manner and add the corresponding child starting with … both to the …-node (Figure 7 [E]) and to the root (Figure 7 [F]). This illustrates how suffix links help to find all the insertion points for the new leaf nodes. There is an amortized constant number of steps per leaf creation; therefore, the total running time of the Ukkonen algorithm is O(N). The pseudocode in Figure 8 shows the procedure Update for converting ST_{i−1} into ST_i […]. Each call of next_smaller_suffix() finds the next suffix by following a suffix link.
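The procedure of Figure 8 can be turned into a runnable in-memory program. The Python sketch below follows a common textbook formulation of Ukkonen's algorithm rather than translating the figure line by line: the active point is kept as the triple (active_node, active_edge, active_len), a counter we call remainder records how many suffixes still await explicit insertion, and all leaves share one mutable end position, so that they grow implicitly exactly like the open leaves described above. Following a suffix link plays the role of next_smaller_suffix(). It assumes the input ends with a unique sentinel such as $; all names are ours.

```python
class Node:
    """Suffix tree node; its incoming edge is labelled text[start:end[0]]."""

    def __init__(self, start: int, end: list[int]) -> None:
        self.start = start
        self.end = end        # one-element list, shared by all open leaves
        self.children = {}    # first character of edge label -> child Node
        self.link = None      # suffix link; None is treated as the root


def build_suffix_tree(text: str) -> Node:
    """Ukkonen-style online construction; text must end with a unique sentinel."""
    root = Node(0, [0])
    leaf_end = [0]            # moving this end extends every leaf at once
    active_node, active_edge, active_len = root, 0, 0
    remainder = 0             # suffixes still waiting for explicit insertion
    for i, ch in enumerate(text):
        leaf_end[0] = i + 1   # implicit extension of all open leaves
        remainder += 1
        last_internal = None  # newest internal node still lacking a suffix link
        while remainder > 0:
            if active_len == 0:
                active_edge = i
            child = active_node.children.get(text[active_edge])
            if child is None:
                # no edge starts with this character: hang a new leaf here
                active_node.children[text[active_edge]] = Node(i, leaf_end)
                if last_internal is not None:
                    last_internal.link = active_node
                    last_internal = None
            else:
                length = child.end[0] - child.start
                if active_len >= length:
                    # the active point lies below this edge: walk down
                    active_node = child
                    active_edge += length
                    active_len -= length
                    continue
                if text[child.start + active_len] == ch:
                    # ch is already there: advance the active point, end the phase
                    if last_internal is not None and active_node is not root:
                        last_internal.link = active_node
                        last_internal = None
                    active_len += 1
                    break
                # make the implicit internal node explicit and add the new leaf
                split = Node(child.start, [child.start + active_len])
                active_node.children[text[active_edge]] = split
                split.children[ch] = Node(i, leaf_end)
                child.start += active_len
                split.children[text[child.start]] = child
                if last_internal is not None:
                    last_internal.link = split
                last_internal = split
            remainder -= 1
            if active_node is root and active_len > 0:
                active_len -= 1
                active_edge = i - remainder + 1
            elif active_node is not root:
                # move to the next shorter suffix by following the suffix link
                active_node = active_node.link if active_node.link is not None else root
    return root


def leaf_strings(text: str, node: Node, prefix: str = "") -> list[str]:
    """Concatenate edge labels from node down to every leaf below it."""
    if not node.children:
        return [prefix]
    return [s for c in node.children.values()
            for s in leaf_strings(text, c, prefix + text[c.start:c.end[0]])]


def count_nodes(node: Node) -> int:
    return 1 + sum(count_nodes(c) for c in node.children.values())


if __name__ == "__main__":
    t = "banana$"
    tree = build_suffix_tree(t)
    print(sorted(leaf_strings(t, tree)) == sorted(t[i:] for i in range(len(t))))  # True
    print(count_nodes(tree))  # 11: the root, 3 internal nodes and 7 leaves
```

Every step in this sketch touches whichever node the active point or a suffix link happens to reach, which is harmless in RAM but is exactly the random access pattern that becomes expensive once the nodes live on disk.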
If we look at the pseudocode of Figure 8 from the disk access point of view, we see that locating the next suffix requires a random tree traversal, one per leaf created. Hence, when the tree ST_i is stored on disk, each node access requires an entire random disk I/O. This access time depends on the disk placement of the next access point. Moreover, since the edges of the tree are not labeled with actual characters, we must also randomly access the input string in order to compare test_char with the characters of X encoded as positions in the suffix tree edges. Unfortunately, this leads to very impractical performance, since the algorithm spends all its time moving the disk head from one random disk location to another.

In […], Bedathur and Haritsa studied the patterns of node accesses during suffix tree construction based on Ukkonen's algorithm. They found that the higher tree nodes are accessed much more frequently than the deeper ones. This gave rise to the buffer management method known as TOP-Q. In this on-disk version of Ukkonen's algorithm, the nodes which are accessed often have priority for staying in the memory buffer, and the other nodes are eventually read from disk. This significantly improves the hit rate for accessed nodes when compared to straightforward implementations. However, in practical terms, in order to build the suffix tree for the sequence of Human chromosome I (approximately … MB), TOP-Q runs for … hours, as was recently evaluated using a modern machine […], and cannot be considered a practical method for indexing large inputs.

Figure 7. The last steps of the Ukkonen algorithm applied to X = …. In this case of leaf additions, ST_{N−1} is updated to ST_N. The place for the next insertion is found by following the suffix links (dotted arrows).

Next we describe a brute-force approach to suffix tree construction, which runs in O(N²) time in the worst case. Amazingly enough, several fast practical methods for external memory were developed using this approach, due to the much better locality of tree accesses.

Footnote: We refer to the average machine currently available (Pentium … with … GHz clock speed and … GB of main memory) as the modern machine.

    Ukkonen's algorithm
        active_point = root
        for i from 1 to N
            Update(Prefix[1,i])

    Update(Prefix[1,i])
        curr_suffix_end = active_point
        test_char = X[i]
        done = false
        while not done
            if curr_suffix_end is located at an explicit node
                if the node has no descendant starting with test_char
                    create new leaf
                else
                    advance active_point down the corresponding edge
                    done = true
            else
                if the implicit node's next char is not equal to test_char
                    create explicit node
                    create new leaf
                else
                    advance active_point down the corresponding edge
                    done = true
            if not done
                if curr_suffix_end is located at the root node
                    active_point = root
                    done = true
                else
                    curr_suffix_end = next_smaller_suffix()   // follow the suffix link
                    active_point = curr_suffix_end

Figure 8. Pseudocode of Ukkonen's algorithm for suffix-tree construction.

2.2. The brute-force approach and the Hunt algorithm

An intuitive method of constructing the suffix tree ST is the following: for a given string X we start with a tree consisting of only a root node. We then successively add paths corresponding to each suffix of X, from the longest to the shortest. This results in the algorithm depicted in Figure 9 [Top]. Here, ST_i represents the suffix tree after the insertion of all suffixes S_1, …, S_i. The Update operation inserts a path corresponding to the next suffix S_i, yielding ST_i. In order to insert suffix S_i into the tree, we first locate the implicit or explicit node corresponding to the longest common prefix of S_i with some other suffix S_j already in the tree. To locate this node, we perform |LCP_ij| character comparisons. After this, if the path for LCP_ij ends in an implicit internal node, it is transformed into an explicit internal node. In any case, we add to this internal node a new leaf corresponding to suffix S_i. Once the end of LCP_ij is found, we add the new child in constant time. Finding the end of LCP_ij in the tree therefore defines the overall time complexity of the algorithm. The end of the LCP can be found in one step in the best case, but in N steps in the worst case, for each of the N inserted suffixes. This can, in the worst case, lead to O(N²) total character comparisons. However, Apostolico and Szpankowski have shown in […] that on average the brute-force construction requires O(N log N) time. Their analysis was based on the assumption that the symbols of X are independent and randomly selected from an alphabet according to a given probability distribution.

    Brute-force algorithm
        for i from 1 to N
            Update(Suffix[i,N])

    Update(Suffix[i,N])
        find LCP of Suffix[i,N] matching characters from the root
        if LCP ends in an explicit node
            add child leaf labeled by X[i+LCP+1, N]
        else
            create explicit node at depth LCP from the root
            add child leaf labeled by X[i+LCP+1, N]

    Hunt et al.'s algorithm
        for each prefix P of length prefix_len
            for i from 1 to N
                Update(Suffix[i,N], P)
            write sub-tree for prefix P to disk

    Update(Suffix[i,N], P)
        if X[i, i+prefix_len−1] equals P
            find LCP of Suffix[i,N] matching characters from the root of the sub-tree
            if LCP ends in an explicit node
                add child leaf labeled by X[i+LCP+1, N]
            else
                create explicit node at depth LCP from the root
                add child leaf labeled by X[i+LCP+1, N]

Figure 9. [Top] The pseudocode of the brute-force algorithm for suffix-tree construction. [Bottom] The pseudocode of Hunt et al.'s algorithm […] for suffix-tree construction, based on the brute-force algorithm shown at the [Top].

Based on this brute-force approach, the first practical external memory suffix tree construction algorithm was developed in […]. Hunt et al.'s incremental construction trades an ideal O(N) performance for locality of access to the tree during its construction. The output tree is in fact represented as a forest of several suffix trees. The suffixes in each such tree share a common prefix. Each tree is built independently and requires scanning the entire input string for each such prefix. The idea is that the suffixes that have a prefix, say, …, fall into a different subtree than those starting with …, … and …. Hence, once the tree for all suffixes starting with … is built, it is never accessed again. The tree for each prefix is constructed independently in main memory, and then is written to disk. The number of partitions p is computed as the ratio of the space required for the tree of the entire input string, ST_total, to the size of the available main memory M, i.e. p = ST_total/M. Then, the length of the prefix for each partition can be computed as log_|Σ| p, where |Σ| is the size of the alphabet. This works well for non-skewed input data but fails if a particular prefix has a significantly larger number of suffixes. This is often the case in DNA sequences with a large amount of repetitive substrings. In order to fit the tree for each possible prefix into main memory, we can increase the length of the prefix. This, in turn, exponentially increases the total number of partitions and, therefore, the total number of input string scans.

Figure 10. The steps of building the sub-tree for prefix … and input string X = … with the algorithm by Hunt et al. […].

The construction of the sub-tree for prefix … and input string X = … is shown in Figure 10. Note that the sub-tree is significantly smaller than the suffix tree for the entire input string. The pseudocode is given in Figure 9 [Bottom]. We remark that we iterate through the input string as many times as the total number of partitions. The construction of the tree for each partition is performed in main memory. At the end, the suffix tree for each partition is written to disk. Note also that in order to perform the brute-force insertion of each suffix into the tree we need to randomly access the input string X, which therefore has to reside in memory. Since the input string is at least an order of magnitude smaller than the tree, this method efficiently addresses the problem of random accesses to the tree in secondary storage, but it cannot be extended to inputs which are larger than the main memory available for holding X. The algorithm performs much faster than the TOP-Q algorithm, despite the fact that its internal running time is quadratic in the length of the input string.
This is because, for p partitions, all p passes over the input string are performed in main memory, and the tree is traversed in main memory as well. Thus, the algorithm performs only O(p) random accesses: namely, when writing the tree for each partition. For Human DNA input of size up to … MB (Human chromosome I), the suffix tree can be constructed with Hunt et al.'s algorithm in … minutes […], compared to … hours for TOP-Q on the same input and the same machine.
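The two procedures of Figure 9 can be sketched in Python as follows. This is an illustrative in-memory model, not Hunt et al.'s implementation: insert_suffix is the brute-force Update, which walks down from the root comparing characters and either hangs a leaf on an explicit node or first splits an edge, and hunt_forest builds one sub-tree per prefix with one full scan of X per partition, at the point where a disk-based version would write the finished sub-tree out. Two simplifications are our own assumptions: the partition keys are taken from the prefixes that actually occur in X (which also covers the short suffixes next to the sentinel), and the prefix length is chosen as the smallest L with |Σ|^L ≥ p, with p = ST_total/M rounded up as in the text. The input must end with a unique sentinel.

```python
class Node:
    """Incoming edge label is x[start:end] (0-based, end exclusive)."""

    def __init__(self, start: int, end: int) -> None:
        self.start, self.end = start, end
        self.children = {}  # first character of edge label -> child Node


def insert_suffix(x: str, root: Node, i: int) -> None:
    """Brute-force Update: match x[i:] from the root, then add a leaf for it."""
    node, pos = root, i
    while True:
        child = node.children.get(x[pos])
        if child is None:
            # the LCP ends at an explicit node: add the leaf directly
            node.children[x[pos]] = Node(pos, len(x))
            return
        length = child.end - child.start
        k = 0
        while k < length and x[child.start + k] == x[pos + k]:
            k += 1
        if k == length:
            # the whole edge matched: continue from the child
            node, pos = child, pos + k
            continue
        # the LCP ends inside the edge: make that implicit node explicit
        mid = Node(child.start, child.start + k)
        node.children[x[child.start]] = mid
        child.start += k
        mid.children[x[child.start]] = child
        mid.children[x[pos + k]] = Node(pos + k, len(x))
        return


def partition_params(tree_bytes: int, mem_bytes: int, sigma_size: int) -> tuple[int, int]:
    """p = ceil(ST_total / M) and the shortest prefix length giving at least p keys."""
    p = -(-tree_bytes // mem_bytes)
    prefix_len = 0
    while sigma_size ** prefix_len < p:
        prefix_len += 1
    return p, prefix_len


def hunt_forest(x: str, prefix_len: int) -> dict[str, Node]:
    """One in-memory sub-tree per prefix; each partition rescans the whole input."""
    forest = {}
    for prefix in sorted({x[i:i + prefix_len] for i in range(len(x))}):
        root = Node(0, 0)
        for i in range(len(x)):
            if x[i:i + prefix_len] == prefix:
                insert_suffix(x, root, i)
        forest[prefix] = root  # a disk-based version would write the sub-tree out here
    return forest


def leaf_strings(x: str, node: Node, acc: str = "") -> list[str]:
    """Concatenate edge labels from node down to every leaf below it."""
    if not node.children:
        return [acc]
    return [s for c in node.children.values()
            for s in leaf_strings(x, c, acc + x[c.start:c.end])]


if __name__ == "__main__":
    print(partition_params(40 * 10**9, 4 * 10**9, 4))  # (10, 2)
    forest = hunt_forest("banana$", 2)
    print(sorted(forest))  # ['$', 'a$', 'an', 'ba', 'na']
```

With a 40 GB tree and 4 GB of memory over the DNA alphabet, ten partitions are needed and a prefix length of 2 yields up to 16 keys, so the input is scanned once per key; skewed inputs push this prefix length, and hence the number of scans, up quickly.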

The performance of Hunt et al.'s algorithm degrades drastically if the input string does not fit in main memory and has to be kept on disk. In this case we have O(pN) random accesses, this time to the input string.

Distributed and paged suffix trees

A similar idea of processing the suffixes of X separately for each prefix was developed in [, ]. The distributed and paged suffix tree (DPST) by Clifford and Sergot [], which was first proposed in the context of distributed computation, has all the properties needed for an efficient external memory implementation. As before, the suffixes of X are grouped by a common prefix whose length depends on the size N of X and on the amount of available main memory. The number of suffixes in each sub-tree is small enough for the tree to be built entirely in main memory. Therefore, random disk accesses to the sub-tree during its construction are avoided. The main difference from Hunt et al.'s algorithm of the previous section is that the sub-tree for each particular prefix is built in asymptotic time linear in N rather than quadratic. To achieve this, the DPST algorithm uses the idea developed in [] to build the suffix tree on words. The main ideas in [] are similar to the Ukkonen algorithm [] described in Section. However, the Ukkonen algorithm relies heavily on the fact that all suffixes of X are inserted, whereas the suffix tree on words is built only for some suffixes of X, namely the ones starting at positions marked by delimiters. DPST applies this idea by treating the particular prefix as the delimiter when producing the sub-tree for this prefix. It introduces sparse suffix links (SSL) instead of regular suffix links. An SSL in a particular sub-tree leads from each internal node $v_i$ with incoming path label $w$ to another internal node $v_j$ in the same sub-tree whose incoming path label is the longest suffix of $w$ found in that sub-tree (or to the root if the longest such suffix is the empty string).

Figure. Difference between the sparse suffix links [Left] and the traditional suffix links [Right].
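To make the definition concrete, here is a small Python sketch that computes the internal-node labels of the sub-tree for one prefix and then the target of a sparse suffix link, next to the regular one. It works on explicit strings rather than on DPST's node structures; the function names and the unique terminator "$" are assumptions, and it relies on the fact that the internal nodes of a compacted trie over a prefix-free set of strings are exactly the longest common prefixes of lexicographically adjacent strings.

```python
def internal_labels(X, prefix):
    """Path labels of the internal nodes of the sub-tree for `prefix`.

    For a prefix-free set of strings (guaranteed here by the unique terminator
    of X), the internal nodes of the compacted trie are exactly the longest
    common prefixes of lexicographically adjacent strings.
    """
    suffixes = sorted(X[i:] for i in range(len(X)) if X.startswith(prefix, i))
    labels = set()
    for s, t in zip(suffixes, suffixes[1:]):
        k = 0
        while s[k] == t[k]:  # distinct suffixes differ before either one ends
            k += 1
        labels.add(s[:k])
    return labels


def regular_suffix_link(w):
    """Target label of an ordinary suffix link: drop the first character."""
    return w[1:]


def sparse_suffix_link(w, labels):
    """Longest proper suffix of w that labels an internal node of the same
    sub-tree; the empty string stands for the root."""
    for k in range(1, len(w)):
        if w[k:] in labels:
            return w[k:]
    return ""
```

For X = "ababab$" and prefix "a", the internal labels are "ab" and "abab". The regular suffix link of "abab" would point to "bab", which lies outside the a-sub-tree, whereas the sparse suffix link points to "ab".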
We explain the difference between the sparse suffix link and the regular suffix link with the following example. Suppose we have a sub-tree for prefix for X = e (see Figure). In the regular suffix tree, the suffix link from the internal node with incoming path label leads to the node with incoming path label. However, in the sub-tree for prefix, there is no suffix starting with. So the longest suffix of that can be found in this sub-tree is, and the SSL leads to the internal node with incoming path label.

Figure. Steps of the construction of the sub-tree for prefix by the distributed and paged suffix tree construction algorithm of Clifford and Sergot. Input string X = e.

Let us follow an example of the sub-tree construction for X = e and prefix in Figure. This sub-tree will contain only the suffixes of X starting at positions,,,,,. Thus, we need to insert only these six suffixes into the tree. First, we insert suffix S by creating leaf L. Next, we add S by finding that X[] = X[] and X[] ≠ X[]. We split an edge and add leaf L. Now it is the turn of suffix S. The first four characters of S correspond to some path in the tree, but X[] = does not. Therefore, we add leaf L and create an internal node with incoming path label. We see that the longest suffix of in this sub-tree is. We create a sparse suffix link from the internal node for (marked by in Figure) to the one for (marked by). When we create a new leaf out of the -node for suffix S, we follow the SSL and create the same edge-child from the -node (Figure). Using these sparse suffix links to add new leaves to the sub-tree allows each sub-tree to be constructed in time linear in N. DPST runs in time O(NP), where P is the total number of different prefixes. Despite its superior asymptotic internal running time with respect to the previous algorithm, the practical performance and scalability of DPST as implemented in [] were inferior to those of the program by Hunt et al. [] for the real DNA data used in the experiments.

Figure. Sample output of the DPST algorithm by Clifford and Sergot.

Top Down Disk-based suffix tree construction (TDD)

Quadratic in the worst case but more elaborate, the Top Down Disk-based suffix tree construction algorithm (TDD) [] takes the performance of on-disk suffix tree construction to the next level. The basis of the method is the combination of the wotdeager algorithm of Giegerich et al. [] and Hunt et al.'s prefix partitioning described above. Although it is still an $O(N^2)$ brute-force approach, TDD manages the memory buffers more efficiently and is a cache-conscious method which performs very well for many practical inputs.

The first step of TDD is the partitioning of the input string in a way similar to that of the algorithm by Hunt et al. The tree for each partition is then built as follows. The suffixes of each partition are first collected into an array, where they are represented by their start positions. Next, the suffixes are grouped by their first character into character groups. The number of different character groups gives the number of children of the current tree node. If the group for some character consists of only one suffix, then this suffix is a leaf node and is immediately written to the tree. If there is more than one suffix in the group, the LCP of all the suffixes in the group is computed by sequential scans of X from different random positions, and an internal node at the corresponding depth is written to the tree. After advancing the position of each suffix by the LCP, the same procedure is repeated recursively. The pseudocode of the TDD algorithm is given in Figure.

TDD algorithm
    for each prefix P of length prefix_len
        collect suffixes starting with P into array
        sort suffixes by the first character
        output groups with one suffix as leaf nodes of the tree
        push groups with more than one suffix onto the stack
        while stack is not empty
            pop suffixes of the same group from the stack
            find LCP of all suffixes in the group by sequential character comparisons
            output internal node at the depth LCP
            advance position of each suffix by LCP
            sort suffixes by the first character
            output groups with one suffix as leaf nodes of the tree
            push groups with more than one suffix onto the stack
Figure. Pseudocode of the TDD algorithm [].

To illustrate the algorithm, let us observe several steps of the TDD suffix tree construction, which are depicted in Figure. Suppose that we have partitioned all the suffixes of X by prefix of length. This gives four partitions:,, and. We show how TDD builds the suffix tree for partition. The start positions of suffixes starting with are {,,, }. Since the prefix length is, the characters at positions {,,, } are sorted lexicographically. This produces three groups of suffixes: -group: {, }, -group: {} and -group: {}. Since the -group and the -group contain one suffix each, the suffixes in these groups produce leaf nodes and are immediately added to the tree. The -group contains two suffixes and is therefore a branching node. LCP(, ) =, and therefore the length of the child starting with equals. At this depth, the internal node branches at positions {, }, which after sorting result in two leaf nodes: the children starting with and respectively.

Figure. The steps of the TDD algorithm [] for building the sub-tree for prefix and input string X =.

The main distinctive feature of the TDD construction is the order in which the tree nodes are added to the output tree. Observe that the tree is written in top-down fashion, and the nodes which were expanded in the current iteration are not accessed anymore. This reduces the number of random accesses to the partially built tree, and the new nodes can be written directly to disk. The number of random disk accesses is O(P), as in Hunt et al.'s algorithm. However, the size of each partition may be much bigger than before, since the main memory buffer for the suffix tree data structure no longer has to hold an entire sub-tree. This pattern of accessing the tree was shown to be very efficient for the cache architectures of modern computers. It was even shown that the TDD algorithm outperforms the linear-time algorithm by Ukkonen for some inputs when all the data structures fit in main memory. For the same input of millions of symbols (Human chromosome I) [], which took about minutes with the suffix-by-suffix insertion of Hunt et al., TDD builds the tree in minutes [].

As before, the algorithm performs massive random accesses to the input string when it does the character-by-character comparisons starting at different random positions, so the input string for the TDD algorithm cannot be larger than the main memory. Another problem of TDD is the on-disk layout of the suffix tree. The trees for different partitions are of different sizes, and some of them can be significantly bigger than the main memory. This poses problems when loading a subtree into main memory for querying. If the entire subtree cannot be loaded into and traversed in main memory, a depth-first traversal of such a tree requires multiple random accesses to different levels of on-disk nodes.

The partition-and-merge strategy of Trellis

The oversized subtrees caused by data skew can be eliminated by using a set of prefixes of different lengths, as shown in []. In practice, the initial prefix size is chosen so that the total number of prefixes P allows each of the P sub-trees to be processed in main memory. For example, suppose we can hold a total of $T_{max}$ suffix tree nodes in main memory. The counts for each group of suffixes sharing the same prefix are computed by a sequential scan of input string X. If a count exceeds $T_{max}$, then we re-scan the input string from the beginning, collecting counters for an increased prefix length.
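The counting rounds, together with the grouping of final counts described next, can be sketched in Python as follows. This is a hypothetical in-memory model, not the code of [] or of Trellis: variable_length_prefixes performs one sequential scan of X per round and extends by one symbol every prefix whose count exceeds t_max, and group_prefixes packs the final prefixes greedily, in lexicographic order, into groups of at most t_max suffixes. The greedy packing rule is an assumption, since the text does not say exactly how the groups are formed; the alphabet passed in must include the unique terminator of X.

```python
def variable_length_prefixes(X, alphabet, t_max):
    """Return {prefix: count} with every count <= t_max (requires t_max >= 1).

    alphabet must contain every symbol of X, including its unique terminator.
    Each round is one sequential scan of X; all pending prefixes have the same
    length, and prefixes whose count exceeds t_max are extended by one symbol.
    """
    final, pending, length = {}, list(alphabet), 1
    while pending:
        counts = dict.fromkeys(pending, 0)
        for i in range(len(X) - length + 1):     # one sequential scan of X
            key = X[i:i + length]
            if key in counts:
                counts[key] += 1
        pending = []
        for prefix, c in counts.items():
            if c > t_max:                        # too many suffixes: re-scan longer
                pending.extend(prefix + a for a in alphabet)
            elif c > 0:
                final[prefix] = c
        length += 1
    return final


def group_prefixes(counts, t_max):
    """Greedily pack prefixes, in lexicographic order, into groups of at most
    t_max suffixes (an assumed packing rule)."""
    groups, current, size = [], [], 0
    for prefix in sorted(counts):
        if current and size + counts[prefix] > t_max:
            groups.append(current)
            current, size = [], 0
        current.append(prefix)
        size += counts[prefix]
    if current:
        groups.append(current)
    return groups
```

Because every suffix ends with the unique terminator, each suffix is eventually counted under exactly one final prefix, so the final counts always sum to the length of X.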
Based on the final counts, none of which exceeds $T_{max}$, the suffixes are combined into groups of approximately even size. As an example, consider the case when suffixes starting with prefix occur twice as often as the suffixes starting with and. We can combine the suffixes in partitions and into a single partition with approximately the same number of suffixes as contained in partition. The maximum number of suffixes in each prefix partition is chosen to ensure that the size of the tree for the suffixes which share the same prefix never exceeds the main memory. This ensures that each such subtree can be built and queried in main memory.

Based on this new partitioning scheme, Phoophakdee and Zaki [] proposed another method for creating suffix trees on disk: the Trellis algorithm. The main innovative idea of this method is the combination of prefix partitioning with a horizontal partitioning of the input into consecutive substrings, or chunks. In theory, substring partitioning does not work for arbitrary inputs, since the suffixes in each substring partition do not run till the end of the entire input string. However, this horizontal partitioning works for most practical inputs. Consider for example the Human genome sequence of about GB in length. In fact, there is not a single string representing the Human genome, but rather the sequences of DNA in the different Human chromosomes, with the largest sequence being only about MB in size. These chromosome sequences represent natural partitions of the entire genome. If the size of a natural chunk of the input does not allow us to build its suffix tree entirely in main memory, the chunk can be split into several slightly overlapping substrings. We append to the end of each such substring, except the last one, a small tail: the prefix of the next partition. The tail of a partition must never occur as a substring of this partition. It serves as a sentinel for the suffixes of the partition, and its positions are not included in the suffix tree of the partition. In practice, for real-life DNA sequences, the length of such a tail is negligibly small compared to the size of the partition itself.

Trellis stands for External Suffix TRee with Suffix Links for Indexing Genome-ScaLe Sequences.

After partitioning the input into chunks of appropriate size, Trellis builds an independent suffix tree for each chunk. It does not output the entire suffix tree to disk, but rather writes to disk the different sub-trees of the in-memory tree. These sub-trees correspond to the different variable-length prefixes. Once the trees for all chunks are built and written to disk, Trellis loads into memory the subtrees of all the chunks which share the same prefix. It then merges these subtrees into the shared-prefix subtree for the entire input string. The pseudocode of the Trellis algorithm is shown in Figure.

TRELLIS algorithm
    partition X into k substrings
    for each substring X_i
        build suffix tree ST_X_i
        for each prefix P in collection of variable-length prefixes
            find sub-tree starting with P
            write this sub-tree into a separate file on disk
    for each prefix P in collection of variable-length prefixes
        load from disk all sub-trees starting with P
        merge sub-trees into sub-tree for prefix P
        write this sub-tree back to disk
Figure. Pseudocode of the Trellis algorithm.
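The following Python sketch models the partition-and-merge idea on a string that fits in memory. It is a simplification under stated assumptions, not the Trellis implementation: chunks have equal length, the prefix length is fixed instead of variable, the names chunk_bounds and trellis_like are hypothetical, and each per-chunk sub-tree is represented only by the sorted list of its suffix start positions (the tree shape follows from that order and the LCPs). The sentinel tail of each chunk is the shortest extension that occurs nowhere else in the chunk plus tail, which is what makes the chunk-local order of suffixes agree with their order in the whole string X.

```python
import heapq


def chunk_bounds(X, k):
    """Split X (ending with a unique terminator) into at most k equal chunks.

    Returns (start, end, t): suffixes starting in [start, end) belong to the
    chunk, and X[end:end + t] is its sentinel tail, the shortest string that
    does not occur in X[start:end + t] except at its very end.
    """
    n = len(X)
    size = -(-n // k)                            # ceiling division
    bounds = []
    for start in range(0, n, size):
        end = min(start + size, n)
        t = 0
        if end < n:                              # the last chunk needs no tail
            t = 1
            while X[end:end + t] in X[start:end + t - 1]:
                t += 1
        bounds.append((start, end, t))
    return bounds


def trellis_like(X, k, prefix_len):
    """Return {prefix: suffix starts in lexicographic order} for the whole X."""
    per_chunk = []
    for start, end, t in chunk_bounds(X, k):
        local = X[start:end + t]                 # chunk plus its tail
        order = sorted(range(start, end), key=lambda i: local[i - start:])
        groups = {}
        for i in order:                          # split the chunk "tree" by prefix
            groups.setdefault(X[i:i + prefix_len], []).append(i)
        per_chunk.append(groups)
    forest = {}
    for prefix in sorted({p for groups in per_chunk for p in groups}):
        runs = [groups[prefix] for groups in per_chunk if prefix in groups]
        # merge step: brute-force comparisons of full suffixes, so X stays in memory
        forest[prefix] = list(heapq.merge(*runs, key=lambda i: X[i:]))
    return forest
```

The merge compares full suffixes of X, which mirrors the limitation that the input string must reside in main memory during this phase.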
As an example, let us apply the Trellis method to our input string X =. Let the collection of prefixes for prefix-based partitioning be {,,, }. Next, we partition X into two substrings, X_1 = with tail, and X_2 =. Note the overlapping symbol, which is used as a sentinel for the suffixes of X_1. We build in memory the suffix tree for X_1, which is shown in Figure [A], and we output it to disk in the form of two different subtrees: one for prefix and the second for prefix. The same procedure is performed for X_2 (Figure [B]). Then, we load into main memory the subtrees for, say, prefix and we merge those sub-trees into the common -subtree for the entire X.

Figure. The steps of the Trellis algorithm applied to input string X =. A. Building the suffix tree for substring X_1 = (). B. Building the suffix tree for substring X_2 =. C. Merging the sub-trees for prefix. The total size of the tree structures at each step allows each step to be performed in main memory.

The merge of subtrees for different chunks is performed by straightforward character-by-character comparison, which leads to the same $O(N^2)$ worst-case internal time as the brute-force algorithms described above. Trellis was shown to perform at a speed comparable to TDD. Further, Trellis does not fail due to insufficient main memory (for holding the trees for each chunk or the subtrees with a common prefix). If we have K chunks and P prefixes in the variable-length prefix collection, the number of random disk accesses is O(KP). Since both K and P depend on the length N of the input string, the execution time of Trellis grows quadratically with N, and Trellis is therefore not scalable to larger inputs. During the character-by-character comparison in the merge step, the input string is randomly accessed at positions all over the input string. Therefore, the scalability of Trellis does not go beyond the size of the main memory designated for the input.

DiGeST and an external memory multi-way merge sort

A simple recent approach to constructing suffix trees is based on an external memory multi-way merge sort []. The DiGeST algorithm proposed in [] performs at a speed comparable with TDD and Trellis, and scales to larger inputs since it does not use prefix-based partitioning, but rather outputs a collection of small suffix trees for different sorted lexicographic intervals.

As in Trellis, DiGeST first partitions the input string into k chunks. The suffixes in each chunk are sorted using any in-memory suffix sorting algorithm (for example []). The suffix array for each chunk is written to disk. A short prefix of the suffix is attached to each position in this suffix array. These prefixes significantly improve the performance of the merging phase.

After sorting the suffixes in each chunk, consecutive pieces of each of the k suffix arrays are read from disk into input buffers. As in a regular multi-way merge sort, a competition is run among the top elements of the buffers, and the winning suffix migrates to an output buffer organized as a suffix tree. When the output buffer is full, it is emptied to disk. In order to determine the order of suffixes from different input chunks, we first compare the prefixes attached to each suffix start position. Only if these prefixes are equal do we compare the rest of the suffixes character by character. This comparison requires that the input string be kept in main memory. Due to the character-by-character comparison of the suffixes, DiGeST runs in $O(N^2)$ internal time in the worst case. Recall that on average the performance is O(N log N). The same comparison is performed in order to calculate the longest common prefix of the current suffix with the last suffix previously inserted into the tree. The calculated LCP determines the place where an internal node is created, and a new leaf for each suffix is added as a child of this internal node. In this way we build the suffix tree in the output buffer. Before the output buffer is written to disk, the lexicographically largest suffix in this tree is added to a collection of dividers, which serves for locating the multiple trees on disk.
Since the output buffer has a pre-calculated size, all trees are of equal size, and thus the problem of data skew is completely avoided. Further, each tree is small enough to be quickly loaded into main memory to perform search or comparative analysis. The pseudocode of DiGeST is given in Figure. While DiGeST still requires the input string to be in main memory, from an external memory point of view it is very efficient: the algorithm performs only two scans over the disk data, and furthermore accesses the disk mainly sequentially. From an internal running time point of view, this algorithm still belongs to the group of brute-force algorithms with quadratic running time.

DiGeST stands for Disk-based Generalized Suffix Tree.
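The merge phase can be sketched in Python as below, for a string small enough to fit in memory. The names Entry, lcp and digest_merge, the chunking into equal-length pieces, and the use of Python's sort for the per-chunk suffix arrays are assumptions. Each heap entry carries a short attached prefix, and full suffixes are compared only when these prefixes are equal, as described above. Instead of materializing tree nodes, each output buffer stores (suffix start, LCP with the previously inserted suffix) pairs, which is the information that fixes where each new leaf and internal node go; a buffer is flushed when it reaches a fixed capacity, and its largest suffix is recorded as a divider.

```python
import heapq


class Entry:
    """Heap entry: a suffix start plus a short prefix of the suffix attached to it."""
    __slots__ = ("X", "i", "head")

    def __init__(self, X, i, plen):
        self.X, self.i, self.head = X, i, X[i:i + plen]

    def __lt__(self, other):
        if self.head != other.head:                  # cheap test on attached prefixes
            return self.head < other.head
        return self.X[self.i:] < other.X[other.i:]   # full character comparison


def lcp(X, i, j):
    """Longest common prefix length of two distinct suffixes of X."""
    k = 0
    while X[i + k] == X[j + k]:      # stops at the unique terminator at the latest
        k += 1
    return k


def digest_merge(X, k, plen, capacity):
    """Return (buffers, dividers); each buffer is a list of (suffix start, LCP)."""
    n = len(X)
    size = -(-n // k)                                # chunk length, rounded up
    # Phase 1: one suffix array per chunk (any in-memory suffix sorter would do).
    arrays = [sorted(range(s, min(s + size, n)), key=lambda i: X[i:])
              for s in range(0, n, size)]
    # Phase 2: k-way merge driven by a heap holding one entry per chunk.
    heap = [(Entry(X, a[0], plen), j, 0) for j, a in enumerate(arrays)]
    heapq.heapify(heap)
    buffers, dividers, out, last = [], [], [], None
    while heap:
        entry, j, pos = heapq.heappop(heap)
        depth = lcp(X, last, entry.i) if out else 0  # where the new leaf is attached
        out.append((entry.i, depth))
        last = entry.i
        if len(out) == capacity:                     # output buffer full: flush a tree
            buffers.append(out)
            dividers.append(X[last:])                # largest suffix of that tree
            out = []
        if pos + 1 < len(arrays[j]):                 # refill from the same chunk
            heapq.heappush(heap, (Entry(X, arrays[j][pos + 1], plen), j, pos + 1))
    if out:
        buffers.append(out)
        dividers.append(X[last:])
    return buffers, dividers
```

Because the buffer capacity is fixed, every flushed tree holds the same number of leaves, regardless of how skewed the input is.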

DiGeST algorithm
    partition X into k substrings
    for each substring X_i
        build suffix array SA_X_i
        write SA_X_i to disk
    Merge()

Merge()
    allocate k input buffers and an output buffer
    for each input buffer i
        load part of SA_X_i into input buffer i
    create heap of size k
    read first element of each input buffer into heap
    while heap is not empty
        transfer the smallest suffix (of substring j) from the top of heap into output buffer
        find LCP of this suffix with the last inserted suffix
        create new leaf in the output suffix tree
        if output buffer is full
            write it to disk
        if input buffer j is empty
            if not end of SA_X_j
                fill input buffer j with the next suffixes
        if input buffer j is not empty
            insert next suffix from input buffer j into heap
Figure. Pseudocode of the DiGeST algorithm.

Suffix tree on disk layouts

We remark that most of the algorithms described above [,,,, ] do not deliver a single suffix tree on disk, but rather a forest of suffix trees. This is useful from both the construction and the query points of view. Regarding query efficiency, if a single suffix tree is much larger than the available main memory, then searching for a pattern of length q may incur q random I/Os, plus one random I/O to collect each occurrence by reaching the corresponding leaves. The need to split the tree into meaningful partitions is even more prominent for algorithms which require a depth-first traversal (DFS) of the entire tree, such as finding the longest common substring or finding the total number of all different substrings. In these cases, the number of random I/Os will be O(N), and the performance of DFS-based algorithms will severely degrade. Thus, important practical requirements for the output trees are that each tree can be sequentially loaded and traversed entirely in main memory, and that each tree has some unique identifier so that it can be located quickly. The most widely accepted scheme for tree partitioning is partitioning by prefix. For each prefix there is a separate tree which contains all the suffixes sharing this prefix. The collection of all possible prefixes of length p is of size $P = O(|\Sigma|^p)$.
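Locating the sub-tree or sub-trees that must be loaded for a query can be sketched in Python as follows. Both helpers are hypothetical: subtrees_for_pattern assumes fixed-length prefix partitioning with prefix length p (and ignores the few suffixes shorter than p at the end of the input), while candidate_trees assumes a DiGeST-style layout in which tree t holds the suffixes between dividers[t-1] (exclusive) and dividers[t] (inclusive), and that the character U+FFFF never occurs in the text.

```python
from bisect import bisect_left
from itertools import product


def subtrees_for_pattern(pattern, alphabet, p):
    """Keys of the fixed-length prefix sub-trees that may contain `pattern`."""
    if len(pattern) >= p:                    # exactly one sub-tree to load
        return [pattern[:p]]
    # a short pattern may occur in every sub-tree whose key it prefixes
    return [pattern + "".join(t) for t in product(alphabet, repeat=p - len(pattern))]


def candidate_trees(dividers, pattern, top="\uffff"):
    """Indices of trees that may hold suffixes starting with `pattern`.

    Tree t holds the suffixes in (dividers[t-1], dividers[t]]; `top` must sort
    after every character of the text, so every suffix starting with
    `pattern` is smaller than pattern + top.
    """
    lo = bisect_left(dividers, pattern)      # first tree whose largest suffix >= pattern
    if lo == len(dividers):
        return []
    hi = bisect_left(dividers, pattern + top)
    return list(range(lo, min(hi, len(dividers) - 1) + 1))
```

With prefix partitioning, a pattern shorter than p fans out to up to $|\Sigma|^{p - |pattern|}$ sub-trees, so short queries may require loading several sub-trees; the divider-based lookup returns a contiguous range of candidate trees instead.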
Note that, in order to search for a pattern, we need to find the corresponding prefix, load the corresponding sub-tree with one sequential read, and then find all the occurrences of this pattern in this sub-tree. For the DFS, we are


More information

Robust internal multiple prediction algorithm Zhiming James Wu, Sonika, Bill Dragoset*, WesternGeco

Robust internal multiple prediction algorithm Zhiming James Wu, Sonika, Bill Dragoset*, WesternGeco Roust internl multiple preition lgorithm Zhiming Jmes Wu, Sonik, Bill Drgoset*, WesternGeo Summry Multiple ttenution is n importnt t proessing step for oth mrine n ln t. Tehniques for surfe- rpily in the

More information

Distributed Systems Principles and Paradigms. Chapter 11: Distributed File Systems

Distributed Systems Principles and Paradigms. Chapter 11: Distributed File Systems Distriuted Systems Priniples nd Prdigms Mrten vn Steen VU Amsterdm, Dept. Computer Siene steen@s.vu.nl Chpter 11: Distriuted File Systems Version: Deemer 10, 2012 2 / 14 Distriuted File Systems Distriuted

More information

Introduction to Algebra

Introduction to Algebra INTRODUCTORY ALGEBRA Mini-Leture 1.1 Introdution to Alger Evlute lgeri expressions y sustitution. Trnslte phrses to lgeri expressions. 1. Evlute the expressions when =, =, nd = 6. ) d) 5 10. Trnslte eh

More information

Duality in linear interval equations

Duality in linear interval equations Aville online t http://ijim.sriu..ir Int. J. Industril Mthemtis Vol. 1, No. 1 (2009) 41-45 Dulity in liner intervl equtions M. Movhedin, S. Slhshour, S. Hji Ghsemi, S. Khezerloo, M. Khezerloo, S. M. Khorsny

More information

2 Computing all Intersections of a Set of Segments Line Segment Intersection

2 Computing all Intersections of a Set of Segments Line Segment Intersection 15-451/651: Design & Anlysis of Algorithms Novemer 14, 2016 Lecture #21 Sweep-Line nd Segment Intersection lst chnged: Novemer 8, 2017 1 Preliminries The sweep-line prdigm is very powerful lgorithmic design

More information

Intermediate Information Structures

Intermediate Information Structures CPSC 335 Intermedite Informtion Structures LECTURE 13 Suffix Trees Jon Rokne Computer Science University of Clgry Cnd Modified from CMSC 423 - Todd Trengen UMD upd Preprocessing Strings We will look t

More information

Outline. Introduction Suffix Trees (ST) Building STs in linear time: Ukkonen s algorithm Applications of ST

Outline. Introduction Suffix Trees (ST) Building STs in linear time: Ukkonen s algorithm Applications of ST Suffi Trees Outline Introduction Suffi Trees (ST) Building STs in liner time: Ukkonen s lgorithm Applictions of ST 2 3 Introduction Sustrings String is ny sequence of chrcters. Sustring of string S is

More information

Parallelization Optimization of System-Level Specification

Parallelization Optimization of System-Level Specification Prlleliztion Optimiztion of System-Level Speifition Luki i niel. Gjski enter for Emedded omputer Systems University of liforni Irvine, 92697, US {li, gjski} @es.ui.edu strt This pper introdues the prlleliztion

More information

Solids. Solids. Curriculum Ready.

Solids. Solids. Curriculum Ready. Curriulum Rey www.mthletis.om This ooklet is ll out ientifying, rwing n mesuring solis n prisms. SOM CUES The Som Cue ws invente y Dnish sientist who went y the nme of Piet Hein. It is simple 3 # 3 #

More information

Problem Final Exam Set 2 Solutions

Problem Final Exam Set 2 Solutions CSE 5 5 Algoritms nd nd Progrms Prolem Finl Exm Set Solutions Jontn Turner Exm - //05 0/8/0. (5 points) Suppose you re implementing grp lgoritm tt uses ep s one of its primry dt strutures. Te lgoritm does

More information

5 ANGLES AND POLYGONS

5 ANGLES AND POLYGONS 5 GLES POLYGOS urling rige looks like onventionl rige when it is extene. However, it urls up to form n otgon to llow ots through. This Rolling rige is in Pington sin in Lonon, n urls up every Friy t miy.

More information

Distributed Systems Principles and Paradigms

Distributed Systems Principles and Paradigms Distriuted Systems Priniples nd Prdigms Christoph Dorn Distriuted Systems Group, Vienn University of Tehnology.dorn@infosys.tuwien..t http://www.infosys.tuwien..t/stff/dorn Slides dpted from Mrten vn Steen,

More information

Graph Contraction and Connectivity

Graph Contraction and Connectivity Chpter 14 Grph Contrtion n Connetivity So fr we hve mostly overe tehniques for solving problems on grphs tht were evelope in the ontext of sequentil lgorithms. Some of them re esy to prllelize while others

More information

COMPUTER EDUCATION TECHNIQUES, INC. (WEBLOGIC_SVR_ADM ) SA:

COMPUTER EDUCATION TECHNIQUES, INC. (WEBLOGIC_SVR_ADM ) SA: In orer to lern whih questions hve een nswere orretly: 1. Print these pges. 2. Answer the questions. 3. Sen this ssessment with the nswers vi:. FAX to (212) 967-3498. Or. Mil the nswers to the following

More information

CS 340, Fall 2016 Sep 29th Exam 1 Note: in all questions, the special symbol ɛ (epsilon) is used to indicate the empty string.

CS 340, Fall 2016 Sep 29th Exam 1 Note: in all questions, the special symbol ɛ (epsilon) is used to indicate the empty string. CS 340, Fll 2016 Sep 29th Exm 1 Nme: Note: in ll questions, the speil symol ɛ (epsilon) is used to indite the empty string. Question 1. [10 points] Speify regulr expression tht genertes the lnguge over

More information

Definition of Regular Expression

Definition of Regular Expression Definition of Regulr Expression After the definition of the string nd lnguges, we re redy to descrie regulr expressions, the nottion we shll use to define the clss of lnguges known s regulr sets. Recll

More information

String comparison by transposition networks

String comparison by transposition networks String omprison y trnsposition networks Alexnder Tiskin (Joint work with Peter Krushe) Deprtment of Computer Siene University of Wrwik http://www.ds.wrwik..uk/~tiskin (inludes n extended version of this

More information

COSC 6374 Parallel Computation. Non-blocking Collective Operations. Edgar Gabriel Fall Overview

COSC 6374 Parallel Computation. Non-blocking Collective Operations. Edgar Gabriel Fall Overview COSC 6374 Prllel Computtion Non-loking Colletive Opertions Edgr Griel Fll 2014 Overview Impt of olletive ommunition opertions Impt of ommunition osts on Speedup Crtesin stenil ommunition All-to-ll ommunition

More information

CS481: Bioinformatics Algorithms

CS481: Bioinformatics Algorithms CS481: Bioinformtics Algorithms Cn Alkn EA509 clkn@cs.ilkent.edu.tr http://www.cs.ilkent.edu.tr/~clkn/teching/cs481/ EXACT STRING MATCHING Fingerprint ide Assume: We cn compute fingerprint f(p) of P in

More information

Lecture 8: Graph-theoretic problems (again)

Lecture 8: Graph-theoretic problems (again) COMP36111: Advned Algorithms I Leture 8: Grph-theoreti prolems (gin) In Prtt-Hrtmnn Room KB2.38: emil: iprtt@s.mn..uk 2017 18 Reding for this leture: Sipser: Chpter 7. A grph is pir G = (V, E), where V

More information

Fig.25: the Role of LEX

Fig.25: the Role of LEX The Lnguge for Specifying Lexicl Anlyzer We shll now study how to uild lexicl nlyzer from specifiction of tokens in the form of list of regulr expressions The discussion centers round the design of n existing

More information

Class Overview. Database Design. Database Design Process. Database Design. Introduction to Data Management CSE 414

Class Overview. Database Design. Database Design Process. Database Design. Introduction to Data Management CSE 414 Introution to Dt Mngement CSE 44 Unit 6: Coneptul Design E/R Digrms Integrity Constrints BCNF Introution to Dt Mngement CSE 44 E/R Digrms ( letures) CSE 44 Autumn 08 Clss Overview Dtse Design Unit : Intro

More information

Hash-based Subgraph Query Processing Method for Graph-structured XML Documents

Hash-based Subgraph Query Processing Method for Graph-structured XML Documents Hsh-bse Subgrph Query Proessing Metho for Grph-struture XML Douments Hongzhi Wng Hrbin Institute of Teh. wngzh@hit.eu.n Jinzhong Li Hrbin Institute of Teh. lijzh@hit.eu.n Jizhou Luo Hrbin Institute of

More information

Using Red-Eye to improve face detection in low quality video images

Using Red-Eye to improve face detection in low quality video images Using Re-Eye to improve fe etetion in low qulity vieo imges Rihr Youmrn Shool of Informtion Tehnology University of Ottw, Cn youmrn@site.uottw. Any Aler Shool of Informtion Tehnology University of Ottw,

More information

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards A Tutology Checker loosely relted to Stålmrck s Algorithm y Mrtin Richrds mr@cl.cm.c.uk http://www.cl.cm.c.uk/users/mr/ University Computer Lortory New Museum Site Pemroke Street Cmridge, CB2 3QG Mrtin

More information

Comparison-based Choices

Comparison-based Choices Comprison-se Choies John Ugner Mngement Siene & Engineering Stnfor University Joint work with: Jon Kleinerg (Cornell) Senhil Mullinthn (Hrvr) EC 17 Boston June 28, 2017 Preiting isrete hoies Clssi prolem:

More information

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs.

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs. Lecture 5 Wlks, Trils, Pths nd Connectedness Reding: Some of the mteril in this lecture comes from Section 1.2 of Dieter Jungnickel (2008), Grphs, Networks nd Algorithms, 3rd edition, which is ville online

More information

COSC 6374 Parallel Computation. Communication Performance Modeling (II) Edgar Gabriel Fall Overview. Impact of communication costs on Speedup

COSC 6374 Parallel Computation. Communication Performance Modeling (II) Edgar Gabriel Fall Overview. Impact of communication costs on Speedup COSC 6374 Prllel Computtion Communition Performne Modeling (II) Edgr Griel Fll 2015 Overview Impt of ommunition osts on Speedup Crtesin stenil ommunition All-to-ll ommunition Impt of olletive ommunition

More information

Bayesian Networks: Directed Markov Properties (Cont d) and Markov Equivalent DAGs

Bayesian Networks: Directed Markov Properties (Cont d) and Markov Equivalent DAGs Byesin Networks: Direte Mrkov Properties (Cont ) n Mrkov Equivlent DAGs Huizhen Yu jney.yu@s.helsinki.fi Dept. Computer Siene, Univ. of Helsinki Proilisti Moels, Spring, 2010 Huizhen Yu (U.H.) Byesin Networks:

More information

Suffix Tries. Slides adapted from the course by Ben Langmead

Suffix Tries. Slides adapted from the course by Ben Langmead Suffix Tries Slides dpted from the course y Ben Lngmed en.lngmed@gmil.com Indexing with suffixes Until now, our indexes hve een sed on extrcting sustrings from T A very different pproch is to extrct suffixes

More information

Section 2.3 Functions. Definition: Let A and B be sets. A function (mapping, map) f from A to B, denoted f :A B, is a subset of A B such that

Section 2.3 Functions. Definition: Let A and B be sets. A function (mapping, map) f from A to B, denoted f :A B, is a subset of A B such that Setion 2.3 Funtions Definition: Let n e sets. funtion (mpping, mp) f from to, enote f :, is suset of suh tht x[x y[y < x, y > f ]] n [< x, y 1 > f < x, y 2 > f ] y 1 = y 2 Note: f ssoites with eh x in

More information

The Greedy Method. The Greedy Method

The Greedy Method. The Greedy Method Lists nd Itertors /8/26 Presenttion for use with the textook, Algorithm Design nd Applictions, y M. T. Goodrich nd R. Tmssi, Wiley, 25 The Greedy Method The Greedy Method The greedy method is generl lgorithm

More information

CS453 INTRODUCTION TO DATAFLOW ANALYSIS

CS453 INTRODUCTION TO DATAFLOW ANALYSIS CS453 INTRODUCTION TO DATAFLOW ANALYSIS CS453 Leture Register llotion using liveness nlysis 1 Introdution to Dt-flow nlysis Lst Time Register llotion for expression trees nd lol nd prm vrs Tody Register

More information

In the last lecture, we discussed how valid tokens may be specified by regular expressions.

In the last lecture, we discussed how valid tokens may be specified by regular expressions. LECTURE 5 Scnning SYNTAX ANALYSIS We know from our previous lectures tht the process of verifying the syntx of the progrm is performed in two stges: Scnning: Identifying nd verifying tokens in progrm.

More information

WORKSHOP 9 HEX MESH USING SWEEP VECTOR

WORKSHOP 9 HEX MESH USING SWEEP VECTOR WORKSHOP 9 HEX MESH USING SWEEP VECTOR WS9-1 WS9-2 Prolem Desription This exerise involves importing urve geometry from n IGES file. The urves re use to rete other urves. From the urves trimme surfes re

More information

Scalable Spatio-temporal Continuous Query Processing for Location-aware Services

Scalable Spatio-temporal Continuous Query Processing for Location-aware Services Slle Sptio-temporl Continuous uery Proessing for Lotion-wre Servies iopeng iong Mohme F. Mokel Wli G. Aref Susnne E. Hmrush Sunil Prhkr Deprtment of Computer Sienes, Purue University, West Lfyette, IN

More information

Augmenting Sux Trees, with Applications Yossi Matias 1?, S. Muthukrishnan 2??,Suleyman Cenk Ṣahinalp 3???, and Jacob Ziv 4 y 1 Tel-Aviv University, an

Augmenting Sux Trees, with Applications Yossi Matias 1?, S. Muthukrishnan 2??,Suleyman Cenk Ṣahinalp 3???, and Jacob Ziv 4 y 1 Tel-Aviv University, an Augmenting Sux Trees, with Applitions Yossi Mtis 1?, S. Mhukrishnn??,Suleymn Cenk Ṣhinlp 3???, nd Jo Ziv 4 y 1 Tel-Aviv University, nd Bell Ls, Murry Hill Bell Ls, Murry Hill 3 University ofwrwik nd University

More information

Graph theory Route problems

Graph theory Route problems Bhelors thesis Grph theory Route prolems Author: Aolphe Nikwigize Dte: 986 - -5 Sujet: Mthemtis Level: First level (Bhelor) Course oe: MAE Astrt In this thesis we will review some route prolems whih re

More information

A decision support system prototype for fuzzy multiple objective optimization

A decision support system prototype for fuzzy multiple objective optimization EUSFLAT - LFA A eision support system prototype for fuzzy multiple ojetive optimiztion Fengjie Wu Jie Lu n Gungqun Zhng Fulty of Informtion Tehnology University of Tehnology Syney Austrli E-mil: {fengjiewjieluzhngg}@it.uts.eu.u

More information

Lecture 13: Graphs I: Breadth First Search

Lecture 13: Graphs I: Breadth First Search Leture 13 Grphs I: BFS 6.006 Fll 2011 Leture 13: Grphs I: Bredth First Serh Leture Overview Applitions of Grph Serh Grph Representtions Bredth-First Serh Rell: Grph G = (V, E) V = set of verties (ritrry

More information

Cooperative Routing in Multi-Source Multi-Destination Multi-hop Wireless Networks

Cooperative Routing in Multi-Source Multi-Destination Multi-hop Wireless Networks oopertive Routing in Multi-Soure Multi-estintion Multi-hop Wireless Networks Jin Zhng Qin Zhng eprtment of omputer Siene n ngineering Hong Kong University of Siene n Tehnology, HongKong {zjzj, qinzh}@se.ust.hk

More information

Balanced Trees. 2-3 trees red-black trees B-trees. 2-3 trees red-black trees B-trees smaller than. 2-node. 3-node E J S X A C.

Balanced Trees. 2-3 trees red-black trees B-trees. 2-3 trees red-black trees B-trees smaller than. 2-node. 3-node E J S X A C. ymol tle review Blned Trees implementtion gurntee verge se serh insert delete serh hit insert delete ordered itertion? opertions on keys sequentil serh (linked list) N N N N/2 N N/2 no equls() 2-3 trees

More information

CS553 Lecture Introduction to Data-flow Analysis 1

CS553 Lecture Introduction to Data-flow Analysis 1 ! Ide Introdution to Dt-flow nlysis!lst Time! Implementing Mrk nd Sweep GC!Tody! Control flow grphs! Liveness nlysis! Register llotion CS553 Leture Introdution to Dt-flow Anlysis 1 Dt-flow Anlysis! Dt-flow

More information

CSCI 3130: Formal Languages and Automata Theory Lecture 12 The Chinese University of Hong Kong, Fall 2011

CSCI 3130: Formal Languages and Automata Theory Lecture 12 The Chinese University of Hong Kong, Fall 2011 CSCI 3130: Forml Lnguges nd utomt Theory Lecture 12 The Chinese University of Hong Kong, Fll 2011 ndrej Bogdnov In progrmming lnguges, uilding prse trees is significnt tsk ecuse prse trees tell us the

More information

Evaluating Regular Expression Matching Engines on Network and General Purpose Processors

Evaluating Regular Expression Matching Engines on Network and General Purpose Processors Evluting Regulr Expression Mthing Engines on Network n Generl Purpose Proessors Mihel Behi Wshington University Computer Siene n Engineering St. Louis, MO 63130-4899 mehi@se.wustl.eu Chrlie Wisemn Wshington

More information

Graphs with at most two trees in a forest building process

Graphs with at most two trees in a forest building process Grphs with t most two trees in forest uilding process rxiv:802.0533v [mth.co] 4 Fe 208 Steve Butler Mis Hmnk Mrie Hrdt Astrct Given grph, we cn form spnning forest y first sorting the edges in some order,

More information

Fault tree conversion to binary decision diagrams

Fault tree conversion to binary decision diagrams Loughorough University Institutionl Repository Fult tree onversion to inry deision digrms This item ws sumitted to Loughorough University's Institutionl Repository y the/n uthor. Cittion: ANDREWS, J.D.

More information

Premaster Course Algorithms 1 Chapter 6: Shortest Paths. Christian Scheideler SS 2018

Premaster Course Algorithms 1 Chapter 6: Shortest Paths. Christian Scheideler SS 2018 Premster Course Algorithms Chpter 6: Shortest Pths Christin Scheieler SS 8 Bsic Grph Algorithms Overview: Shortest pths in DAGs Dijkstr s lgorithm Bellmn-For lgorithm Johnson s metho SS 8 Chpter 6 Shortest

More information

CICS Application Design

CICS Application Design CICS Applition Design In orer to lern whih questions hve een nswere orretly: 1. Print these pges. 2. Answer the questions. 3. Sen this ssessment with the nswers vi:. FAX to (212) 967-3498. Or. Mil the

More information

Position Heaps: A Simple and Dynamic Text Indexing Data Structure

Position Heaps: A Simple and Dynamic Text Indexing Data Structure Position Heps: A Simple nd Dynmic Text Indexing Dt Structure Andrzej Ehrenfeucht, Ross M. McConnell, Niss Osheim, Sung-Whn Woo Dept. of Computer Science, 40 UCB, University of Colordo t Boulder, Boulder,

More information

3D convex hulls. Convex Hull in 3D. convex polyhedron. convex polyhedron. The problem: Given a set P of points in 3D, compute their convex hull

3D convex hulls. Convex Hull in 3D. convex polyhedron. convex polyhedron. The problem: Given a set P of points in 3D, compute their convex hull Convex Hull in The rolem: Given set P of oints in, omute their onvex hull onvex hulls Comuttionl Geometry [si 3250] Lur Tom Bowoin College onvex olyheron 1 2 3 olygon olyheron onvex olyheron 4 5 6 Polyheron

More information

4.3 Balanced Trees. let us assume that we can manipulate them conveniently and see how they can be put together to form trees.

4.3 Balanced Trees. let us assume that we can manipulate them conveniently and see how they can be put together to form trees. 428 T FOU 4.3 Blned Trees T BT GOIT IN T VIOU setion work well for wide vriety of pplitions, ut they hve poor worst-se performne. s we hve noted, files lredy in order, files in reverse order, files with

More information

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών ΕΠΛ323 - Θωρία και Πρακτική Μταγλωττιστών Lecture 3 Lexicl Anlysis Elis Athnsopoulos elisthn@cs.ucy.c.cy Recognition of Tokens if expressions nd reltionl opertors if è if then è then else è else relop

More information

Final Exam Review F 06 M 236 Be sure to look over all of your tests, as well as over the activities you did in the activity book

Final Exam Review F 06 M 236 Be sure to look over all of your tests, as well as over the activities you did in the activity book inl xm Review 06 M 236 e sure to loo over ll of your tests, s well s over the tivities you did in the tivity oo 1 1. ind the mesures of the numered ngles nd justify your wor. Line j is prllel to line.

More information

The dictionary model allows several consecutive symbols, called phrases

The dictionary model allows several consecutive symbols, called phrases A dptive Huffmn nd rithmetic methods re universl in the sense tht the encoder cn dpt to the sttistics of the source. But, dpttion is computtionlly expensive, prticulrly when k-th order Mrkov pproximtion

More information

COSC 6374 Parallel Computation. Dense Matrix Operations

COSC 6374 Parallel Computation. Dense Matrix Operations COSC 6374 Prllel Computtion Dense Mtrix Opertions Edgr Griel Fll Edgr Griel Prllel Computtion Edgr Griel erminology Dense Mtrix: ll elements of the mtrix ontin relevnt vlues ypilly stored s 2-D rry, (e.g.

More information

Reducing a DFA to a Minimal DFA

Reducing a DFA to a Minimal DFA Lexicl Anlysis - Prt 4 Reducing DFA to Miniml DFA Input: DFA IN Assume DFA IN never gets stuck (dd ded stte if necessry) Output: DFA MIN An equivlent DFA with the minimum numer of sttes. Hrry H. Porter,

More information

Using a User-Level Memory Thread for Correlation Prefetching

Using a User-Level Memory Thread for Correlation Prefetching Using User-Level Memory Thre for Correltion Prefething Yn Solihin Jejin Lee Josep Torrells University of Illinois t Urn-Chmpign Mihign Stte University http://iom.s.uiu.eu http://www.se.msu.eu/ jlee Astrt

More information

Lecture 12 : Topological Spaces

Lecture 12 : Topological Spaces Leture 12 : Topologil Spes 1 Topologil Spes Topology generlizes notion of distne nd loseness et. Definition 1.1. A topology on set X is olletion T of susets of X hving the following properties. 1. nd X

More information