CSE 549: Suffix Tries & Suffix Trees. All slides in this lecture not marked with * of Ben Langmead.

Size: px
Start display at page:

Download "CSE 549: Suffix Tries & Suffix Trees. All slides in this lecture not marked with * of Ben Langmead."

Transcription

1 CSE 549: Suffix Tries & Suffix Trees All slides in this lecture not mrked with * of Ben Lngmed.

2 KMP is gret, ut T = m P = n (note: m,n re opposite from previous lecture) Without preprocessing (KMP) Given preprocessing (KMP) Without preprocessing (ST) Given preprocessing (ST) Find n occurrence of P O(m+n) O(m) O(m+n) O(n) Find ll occurrences of P O(m+n) O(m) O(m + n + k) O(n+k) Find n occurrence of O(l(m+n)) O(l(m)) O( m + ln) O(ln) P1,,P l If the text is constnt over mny ptterns, pre-processing the text rther thn the pttern is etter (nd llows other efficient queries). *

3 Tries A trie (pronounced try ) is rooted tree representing tree representing collection collection of strings of with strings with one node per common prefix Smllest tree such tht: Ech edge is leled with chrcter c Σ A node hs t most one outgoing edge leled c, for c Σ Ech key is spelled out long some pth strting t the root Nturl wy to represent either set or mp where keys re strings This structure is lso known s Σ-tree

4 Tries: exmple Represent this mp with trie: Key Vlue instnt 1 internl 2 s i n t internet 3 t e The smllest tree such tht: Ech edge is leled with chrcter c Σ A node hs t most one outgoing edge leled c, for c Σ n t 1 l r n e t Ech key is spelled out long some pth strting t the root 2 3

5 Tries: exmple t s i n t e Checking for presence of key P, where n = P, is??? O(n) time If totl length of ll keys is N, trie hs O(N)??? nodes r Wht out Σ? n t 1 l n e t Depends how we represent outgoing edges. If we don t ssume Σ is smll constnt, it shows up in one or oth ounds. 2 3

6 Tries: nother exmple We cn index T with trie. The trie mps sustrings to offsets where they occur c 4 g 8 t 14 cc 12 cc 2 ct 6 gt 18 gt 0 t 10 tt 16 root: c g t c g t c t t t , ,

7 Indexing with suffixes Some Until indices now, (e.g. our indexes the inverted hve een index) sed re sed on extrcting on extrcting sustrings sustrings from T from T A very different pproch is to extrct suffixes from T. This will led us to some interesting nd prcticl index dt structures: A ANA ANANA BANANA NA NANA B A N A N A A B A N A N A N A B A N A N A N A B B A N A N A N A B A N A N A N A B A Suffix Trie Suffix Tree Suffix Arry FM Index

8 Trie Definitions A Σ-tree (trie) is rooted tree where ech edge is leled with single chrcter c ϵ Σ, such tht no node hs two outgoing edges leled with the sme chrcter. for node v in T, depth(v) or node-depth(v) is the distnce from v to the root. node-depth(r) = 0 string(v) = conctention of ll chrcters on the pth r v string-depth(v) = string(v) (note: string-depth(v) node-depth(v)) for string x, if node v with string(v) = x, we sy node(x) = v T displys string x if node v nd string y such tht xy = string(v) words(t) = { x T displys x} A suffix trie of string s is Σ-tree such tht words(t) = {s s is sustring of s} An internl/lef edge leds to n internl/lef node Defs. from: *

9 Suffix trie Build trie contining ll suffixes of text T T: GTTATAGCTGATCGCGGCGTAGCGG GTTATAGCTGATCGCGGCGTAGCGG TTATAGCTGATCGCGGCGTAGCGG TATAGCTGATCGCGGCGTAGCGG ATAGCTGATCGCGGCGTAGCGG TAGCTGATCGCGGCGTAGCGG AGCTGATCGCGGCGTAGCGG GCTGATCGCGGCGTAGCGG CTGATCGCGGCGTAGCGG TGATCGCGGCGTAGCGG GATCGCGGCGTAGCGG ATCGCGGCGTAGCGG TCGCGGCGTAGCGG CGCGGCGTAGCGG GCGGCGTAGCGG CGGCGTAGCGG GGCGTAGCGG GCGTAGCGG CGTAGCGG GTAGCGG TAGCGG AGCGG GCGG CGG GG G m(m+1)/2 chrs

10 Suffix trie First dd specil terminl chrcter to the end of T is chrcter tht does not pper elsewhere in T, nd we define it to e less thn other chrcters (for DNA: < A < C < G < T) enforces rule we re ll used to using: e.g. s comes efore sh in the dictionry. lso gurntees no suffix is prefix of ny other suffix. T: GTTATAGCTGATCGCGGCGTAGCGG GTTATAGCTGATCGCGGCGTAGCGG TTATAGCTGATCGCGGCGTAGCGG TATAGCTGATCGCGGCGTAGCGG ATAGCTGATCGCGGCGTAGCGG TAGCTGATCGCGGCGTAGCGG AGCTGATCGCGGCGTAGCGG GCTGATCGCGGCGTAGCGG CTGATCGCGGCGTAGCGG TGATCGCGGCGTAGCGG GATCGCGGCGTAGCGG ATCGCGGCGTAGCGG TCGCGGCGTAGCGG CGCGGCGTAGCGG GCGGCGTAGCGG CGGCGTAGCGG GGCGTAGCGG

11 Suffix trie T: T: Shortest (non-empty) suffix Ech pth from root to lef represents suffix; ech suffix is represented y some pth from root to lef Would this still e the cse if we hdn t dded? Longest suffix

12 Suffix trie T: Ech pth from root to lef represents suffix; ech suffix is represented y some pth from root to lef Would this still e the cse if we hdn t dded? No

13 Suffix trie We cn think of nodes s hving lels, where the lel spells out chrcters on the pth from the root to the node

14 Suffix trie How do we check whether string S is sustring of T? Note: Ech of T s sustrings is spelled out long pth from the root. I.e., every sustring is prefix of some suffix of T. Strt t the root nd follow the edges leled with the chrcters of S S = Yes, it s sustring If we fll off the trie -- i.e. there is no outgoing edge for next chrcter of S, then S is not sustring of T If we exhust S without flling off, S is sustring of T

15 Suffix trie How do we check whether string S is sustring of T? Note: Ech of T s sustrings is spelled out long pth from the root. I.e., every sustring is prefix of some suffix of T. Strt t the root nd follow the edges leled with the chrcters of S S = Yes, it s sustring If we fll off the trie -- i.e. there is no outgoing edge for next chrcter of S, then S is not sustring of T If we exhust S without flling off, S is sustring of T

16 Suffix trie How do we check whether string S is sustring of T? Note: Ech of T s sustrings is spelled out long pth from the root. I.e., every sustring is prefix of some suffix of T. Strt t the root nd follow the edges leled with the chrcters of S If we fll off the trie -- i.e. there is no outgoing edge for next chrcter of S, then S is not sustring of T If we exhust S without flling off, S is sustring of T S = Yes, it s sustring

17 Suffix trie How do we check whether string S is sustring of T? Note: Ech of T s sustrings is spelled out long pth from the root. I.e., every sustring is prefix of some suffix of T. Strt t the root nd follow the edges leled with the chrcters of S If we fll off the trie -- i.e. there is no outgoing edge for next chrcter of S, then S is not sustring of T x S = No, not sustring If we exhust S without flling off, S is sustring of T

18 Suffix trie How do we check whether string S is suffix of T? Sme procedure s for sustring, ut dditionlly check whether the finl node in the wlk hs n outgoing edge leled S = Not suffix

19 Suffix trie How do we check whether string S is suffix of T? Sme procedure s for sustring, ut dditionlly check whether the finl node in the wlk hs n outgoing edge leled S = Not suffix

20 Suffix trie How do we check whether string S is suffix of T? Sme procedure s for sustring, ut dditionlly check whether the finl node in the wlk hs n outgoing edge leled S = Is suffix

21 Suffix trie How do we count the numer of times string S occurs s sustring of T? Follow pth corresponding to S. Either we fll off, in which cse nswer is 0, or we end up t node n nd the nswer = # of lef nodes in the sutree rooted t n. n S = 2 occurrences Leves cn e counted with depth-first trversl.

22 Suffix trie How do we count the numer of times string S occurs s sustring of T? Follow pth corresponding to S. Either we fll off, in which cse nswer is 0, or we end up t node n nd the nswer = # of lef nodes in the sutree rooted t n. n S = 2 occurrences Leves cn e counted with depth-first trversl.

23 Suffix trie How do we find the longest repeted sustring of T? Find the deepest node with more thn one child

24 Suffix trie How do we find the longest repeted sustring of T? Find the deepest node with more thn one child

25 Suffix trie How mny nodes does the suffix trie hve? Is there clss of string where the numer of suffix trie nodes grows linerly with m? T = Yes: e.g. string of m s in row ( m ) 1 Root m nodes with incoming edge m + 1 nodes with incoming edge 2m + 2 nodes

26 Suffix trie How mny nodes does the suffix trie hve? Is there clss of string where the numer of suffix trie nodes grows linerly with m? T = Yes: e.g. string of m s in row ( m ) 1 Root m nodes with incoming edge m + 1 nodes with incoming edge 2m + 2 nodes

27 Suffix trie Is there clss of string where the numer of suffix trie nodes grows with m 2? Yes: n n 1 root n nodes long chin, right n nodes long chin, middle n chins of n nodes hnging off ech chin node 2n + 1 leves (not shown) Figure & exmple y Crl Kingsford n 2 + 4n + 2 nodes, where m = 2n

28 Suffix trie Is there clss of string where the numer of suffix trie nodes grows with m 2? Yes: n n 1 root n nodes long chin, right n nodes long chin, middle n chins of n nodes hnging off ech chin node 2n + 1 leves (not shown) Figure & exmple y Crl Kingsford n 2 + 4n + 2 nodes, where m = 2n

29 Suffix trie: upper ound on size Could worst-cse # nodes e worse thn O(m 2 )? Root Suffix trie Mx # nodes from top to ottom = length of longest suffix + 1 = m + 1 Deepest lef Mx # nodes from left to right = mx # distinct sustrings of ny length m O(m 2 ) is worst cse

30 Suffix trie: ctul growth k Built suffix tries for the first 500 prefixes of the lmd phge virus genome Blck curve shows how # nodes increses with prefix length # suffix trie nodes m^2 ctul m Length prefix over which suffix trie ws uilt

31 Suffix tries -> Suffix trees

32 Suffix Tree Definitions A Σ + -tree is rooted tree, T, where ech edge is leled with non-empty strings, where no node hs two outgoing edges leled with strings hving the sme first chrcter. T is compct if ll internl nodes hve 2 children. for node v in T, depth(v) or node-depth(v) is the distnce from v to the root. node-depth(r) = 0 string(v) = conctention of ll chrcters on the pth r v string-depth(v) = string(v) (note: string-depth(v) node-depth(v)) for string x, if node v with string(v) = x, we sy node(x) = v T displys string x if node v nd string y such tht xy = string(v) words(t) = { x T displys x} A suffix tree of string s is compct Σ + -tree such tht words(t) = {s s is sustring of s} Defs. from: *

33 Suffix trie: mking it smller T = Ide 1: Colesce non-rnching pths into single edge with string lel Reduces # nodes, edges, gurntees internl nodes hve >1 child

34 Suffix tree T = L leves, I internl nodes, E edges E = L + I -1 E 2I (ech internl node rnches) L + I -1 2I I L - 1 With respect to m: ut How mny leves? m L m (t most m suffixes) How mny non-lef nodes? m - 1 2m -1 nodes totl, or O(m) nodes I m -1 E = L + I - 1 2m - 2 E + L + I 4m - 3 O(m) No: totl length of edge Is the totl totl size O(m) size now? O(m) now? lels is qudrtic in m

35 Suffix tree T = L leves, I internl nodes, E edges E = L + I -1 E 2I (ech internl node rnches) L + I -1 2I I L - 1 With respect to m: ut How mny leves? m L m (t most m suffixes) How mny non-lef nodes? m - 1 2m -1 nodes totl, or O(m) nodes I m -1 E = L + I - 1 2m - 2 E + L + I 4m - 3 O(m) Is the totl size O(m) now? Is the totl size O(m) now? No: totl length of edge lels is qudrtic in m NO: The totl length of edge lels is qudrtic in m.

36 Suffix tree T = Ide 2: Store T itself in ddition to the tree. Convert tree s edge lels to (offset, length) pirs with respect to T. T = (0, 1) (6, 1) (1, 2) (1, 2) (6, 1) (6, 1) (3, 4) (6, 1) (3, 4) (3, 4) Spce required for suffix tree is now O(m)

37 Suffix tree: leves hold offsets where suffixes egin T = T = (0, 1) (6, 1) (1, 2) (0, 1) (6, 1) (1, 2) 6 (3, 4) (1, 2) (6, 1) (6, 1) (3, 4) (6, 1) (3, 4) (3, 4) 0 (1, 2) (6, 1) 3 (6, 1) (3, 4) (6, 1) (3, 4) 1

38 Suffix tree: lels T = (0, 1) (6, 1) (1, 2) 6 Agin, ech node s lel equls the conctented edge lels from the root to the node. These ren t stored explicitly. (3, 4) 0 (1, 2) (6, 1) 3 (6, 1) (3, 4) 5 4 (6, 1) (3, 4) 1 Lel = 2 Lel =

39 Suffix tree: lels T = (0, 1) (6, 1) (1, 2) 6 Becuse edges cn hve string lels, we must distinguish two notions of depth Node depth: how mny edges we must follow from the root to rech the node (3, 4) 0 (1, 2) (6, 1) 3 (6, 1) (3, 4) (6, 1) (3, 4) 1 Lel depth: totl length of edge lels for edges on pth from root to node

40 Suffix tree: spce cvet T = (0, 1) (6, 1) (1, 2) 6 Minor point: We sy the spce tken y the edge lels is O(m), ecuse we keep 2 integers per edge nd there re O(m) edges (3, 4) 0 (1, 2) (6, 1) 3 (6, 1) (3, 4) (6, 1) (3, 4) 1 To store one such integer, we need enough its to distinguish m positions in T, i.e. ceil(log 2 m) its. We usully ignore this fctor, since 64 its is plenty for ll prcticl purposes. Similr rgument for the pointers / references used to distinguish tree nodes.

41 Suffix tree: uilding Nive method 1: uild suffix trie, then colesce non-rnching pths nd relel edges Nive method 2: uild single-edge tree representing only the longest suffix, then ugment to include the 2 nd -longest, then ugment to include 3 rd -longest, etc Both re O(m 2 ) time, ut first uses O(m 2 ) spce while second uses O(m) (3, 4) 0 (1, 2) (6, 1) 3 (6, 1) (0, 1) (1, 2) (6, 1) (3, 4) (6, 1) (3, 4) 1 6 Nive method 2 is descried in Gusfield 5.4 Python implementtion t:

42 WOTD (Write-Only Top-Down) Construction Giegerich, Roert, nd Stefn Kurtz. "A comprison of impertive nd purely functionl suffix tree constructions." Science of Computer Progrmming 25.2 (1995): Build suffix tree for string s Recursive construction: For every rnching node node(u), sutree of node(u) is determined y ll suffixes of s where u is prefix. Recursively construct sutree for ll suffixes where u is prefix. Definition: remining suffixes of u R(node(u)) = { v uv is suffix of s } *

43 WOTD (Write-Only Top-Down) Construction Build suffix tree for string s Recursive construction: For every rnching node node(u), sutree of node(u) is determined y ll suffixes of s where u is prefix. Recursively construct sutree for ll suffixes where u is prefix. Definition: remining suffixes of u R(node(u)) = { v uv is suffix of s } Definition: c-group of node(u) group(node(u), c) = { w Σ * cw R(node(u))} *

44 WOTD (Write-Only Top-Down) Construction def WOTD(T : tree, node(u): node): for ech c Σ {}: G = group(node(u), c) non-rnching suffix ucv = lcp(g) if G == 1: dd lef node(ucv) s child of node(u) else: dd inner node(ucv) s child of node(u) WOTD(T, node(ucv)) rnching suffix Strt the lgorithm y clling WOTD(T, node(ε)) root node *

45 s = tttctctt WOTD Exmple suffixes re red top-to-ottom exmple from: *

46 s = tttctctt WOTD Exmple exmple from: *

47 s = tttctctt WOTD Exmple exmple from: *

48 s = tttctctt WOTD Exmple exmple from: *

49 s = tttctctt WOTD Exmple exmple from:

50 WOTD Properties Worst cse time still O( T 2 ) Expected cse time O( T log T ) Write-only property & recursive construction lends itself well to prllelism Good cching properties (loclity of reference for sustrings elonging to sutree) Top-down construction order llows lzy construction s discussed in: Giegerich, Roert, Stefn Kurtz, nd Jens Stoye. "Efficient implementtion of lzy suffix trees." Softwre: Prctice nd Experience (2003):

51 Suffix tree: uilding Other methods for construction: Method of choice: Ukkonen s lgorithm Ukkonen, Esko. "On-line construction of suffix trees." Algorithmic 14.3 (1995): O(m) time nd spce Hs online property: if T rrives one chrcter t time, lgorithm efficiently updtes suffix tree upon ech rrivl We won t cover it here; see Gusfield Ch. 6 for detils Or just Google Ukkonen s lgorithm

52 Suffix tree: ctul growth # suffix trie nodes Built suffix trees for the first 500 prefixes of the lmd phge virus genome Blck curve shows # nodes incresing with prefix length Compre with suffix trie: m^2 ctul m # suffix tree nodes 123 K nodes m ctul m Length prefix over which suffix tree ws uilt Length prefix over which suffix trie ws uilt

53 Suffix tree How do we check whether string S is sustring of T? Essentilly sme procedure s for suffix trie, except we hve to del with colesced edges S = Yes, it s sustring 4 0 3

54 Suffix tree How do we check whether string S is sustring of T? Essentilly sme procedure s for suffix trie, except we hve to del with colesced edges S = Yes, it s sustring 4 0 3

55 Suffix tree How do we check whether string S is suffix of T? Essentilly sme procedure s for suffix trie, except we hve to del with colesced edges 2 S = 5 1 Yes, it s suffix

56 Suffix tree How do we count the numer of times string S occurs s sustring of T? Sme procedure s for suffix trie 6 2 S = 5 1 Occurs twice 4 0 3

57 Suffix tree: pplictions With suffix tree of T, we cn find ll mtches of P to T. Let k = # mtches. E.g., P =, T = O(n) Step 1: wlk down pth If we fll off there re no mtches 6 O(k) Step 2: visit ll lef nodes elow Report ech lef offset s mtch offset O(n + k) time

58 Suffix tree ppliction: find long common sustrings Dots re mximl unique mtches (MUMs), kind of long sustring shred y two sequences Red = mtch ws etween like strnds, green = different strnds Axes show different strins of Helicocter pylori, cterium found in the stomch nd ssocited with gstric ulcers

59 Suffix tree ppliction: find longest common sustring To find the longest common sustring (LCS) of X nd Y, mke new string X#Y where # nd re oth terminl symols. Build suffix tree for X#Y. X = xx Y = x X#Y = xx#x x #x X Y x X Y x X Y 5 #x X Y x 12 Consider leves: offsets in [0, 4] re suffixes of X, offsets in [6, 11] re suffixes of Y 1 4 X Y #x 7 11 X #x x#x Y x 10 X Y 2 #x 8 Trverse the tree nd nnotte ech node ccording to whether leves elow it include suffixes of X, Y or oth The deepest node nnotted with oth X nd Y hs LCS s its lel. O( X + Y ) time nd spce.

60 Suffix tree ppliction: generlized suffix trees This is one exmple of mny pplictions where it is useful to uild suffix tree over mny strings t once Such tree is clled generlized suffix tree. These re introduced in Gusfield 6.4. X Y x #x x #x X Y x X Y 5 X Y x 12 4 X Y 11 X 9 Y X Y #x #x x#x x #x

61 Longest Common Extension Longest common extension: We re given strings S nd T. In the future, mny pirs (i,j) will e provided s queries, nd we wnt to quickly find: the longest sustring of S strting t i tht mtches sustring of T strting t j. S LCE(i,j) T LCE(i,j) i j Build generlized suffix tree for S nd T. Preprocess tree so tht lowest common ncestors (LCA) cn e found in constnt time. This cn e done using rnge-minimum queries (RMQ) Crete n rry mpping suffix numers to lef nodes. O( S + T ) O( S + T ) O( S + T ) LCA(i,j) Given query (i,j): Find the lef nodes for i nd j Return string of LCA for i nd j O(1) O(1) j i j i

62 Suffix trees in the rel world: MUMmer FASTA file contining reference ( text ) FASTA file contining ALU string Indexing phse: ~2 minutes Mtching phse: very fst Columns: 1. Mtch offset in T 2. Mtch offset in P 3. Length of exct mtch...

63 Suffix trees in the rel world: MUMmer MUMmer v3.32 time nd memory scling when indexing incresingly lrger frctions of humn chromosome 1 Pek memory usge (megytes) Time (seconds) Frction of humn chromosome 1 indexed Frction of humn chromosome 1 indexed For whole chromosome 1, took 2m:14s nd used 3.94 GB memory

64 Suffix trees in the rel world: MUMmer Attempt to uild index for whole humn genome reference: mummer:!suffix!tree!construction!filed:!textlen= ! lrger!thn!mximl!textlen= We cn predict it would hve tken out 47 GB of memory

65 Suffix trees in the rel world: the constnt fctor While O(m) is desirle, the constnt in front of the m limits wider use of suffix trees in prctice Constnt fctor vries depending on implementtion: Estimte of MUMmer s constnt fctor = 3.94 GB / 250 million nt ytes per node Literture reports implementtions chieving s little s 8.5 ytes per node, ut no implementtion used in prctice tht I know of is etter thn 12.5 ytes per node Kurtz, Stefn. "Reducing the spce requirement of suffix trees." Softwre Prctice nd Experience (1999):

66 Suffix tree: summry Orgnizes ll suffixes into n incredily useful, flexile dt structure, in O(m) time nd spce A nive method (e.g. suffix trie) could esily e qudrtic or worse Used in prctice for whole genome lignment, repet identifiction, etc 3 (5, 21) (12, 14) 10 (4, 1) (6, 2) 5 (8, 18) (23, 3) (8, 18) (13, 1) (14, 12) (19, 7) (16, 1) (3, 1) (7, 1) (1, 1) (0, 1) (17, 9) (25, 1) 22 (3, 1) (12, 14) (2, 24) 2 11 (4, 22) (6, 2) Actul memory footprint (ytes per node) is quite high, limiting usefulness 4 (8, 18) GTTATAGCTGATCGCGGCGTAGCGG GTTATAGCTGATCGCGGCGTAGCGG TTATAGCTGATCGCGGCGTAGCGG TATAGCTGATCGCGGCGTAGCGG ATAGCTGATCGCGGCGTAGCGG TAGCTGATCGCGGCGTAGCGG AGCTGATCGCGGCGTAGCGG 19 1 (23, 3) (9, 17) (25, 1) 25 (10, 16) (7, 1) (1, 1) (16, 1) (8, 18) (15, 1) 16 (19, 7) (16, 1) 13 (17, 9) GCTGATCGCGGCGTAGCGG CTGATCGCGGCGTAGCGG (20, 6) (2, 24) 18 (25, 1) 21 TGATCGCGGCGTAGCGG GATCGCGGCGTAGCGG ATCGCGGCGTAGCGG TCGCGGCGTAGCGG CGCGGCGTAGCGG GCGGCGTAGCGG 0 15 (25, 1) (17, 9) 24 (25, 1) 23 CGGCGTAGCGG GGCGTAGCGG GCGTAGCGG CGTAGCGG GTAGCGG TAGCGG AGCGG GCGG CGG GG G m chrs m(m+1)/2 chrs

Suffix Tries. Slides adapted from the course by Ben Langmead

Suffix Tries. Slides adapted from the course by Ben Langmead Suffix Tries Slides dpted from the course y Ben Lngmed en.lngmed@gmil.com Indexing with suffixes Until now, our indexes hve een sed on extrcting sustrings from T A very different pproch is to extrct suffixes

More information

Suffix trees, suffix arrays, BWT

Suffix trees, suffix arrays, BWT ALGORITHMES POUR LA BIO-INFORMATIQUE ET LA VISUALISATION COURS 3 Rluc Uricru Suffix trees, suffix rrys, BWT Bsed on: Suffix trees nd suffix rrys presenttion y Him Kpln Suffix trees course y Pco Gomez Liner-Time

More information

What are suffix trees?

What are suffix trees? Suffix Trees 1 Wht re suffix trees? Allow lgorithm designers to store very lrge mount of informtion out strings while still keeping within liner spce Allow users to serch for new strings in the originl

More information

Information Retrieval and Organisation

Information Retrieval and Organisation Informtion Retrievl nd Orgnistion Suffix Trees dpted from http://www.mth.tu.c.il/~himk/seminr02/suffixtrees.ppt Dell Zhng Birkeck, University of London Trie A tree representing set of strings { } eef d

More information

Intermediate Information Structures

Intermediate Information Structures CPSC 335 Intermedite Informtion Structures LECTURE 13 Suffix Trees Jon Rokne Computer Science University of Clgry Cnd Modified from CMSC 423 - Todd Trengen UMD upd Preprocessing Strings We will look t

More information

Tries. Yufei Tao KAIST. April 9, Y. Tao, April 9, 2013 Tries

Tries. Yufei Tao KAIST. April 9, Y. Tao, April 9, 2013 Tries Tries Yufei To KAIST April 9, 2013 Y. To, April 9, 2013 Tries In this lecture, we will discuss the following exct mtching prolem on strings. Prolem Let S e set of strings, ech of which hs unique integer

More information

COMBINATORIAL PATTERN MATCHING

COMBINATORIAL PATTERN MATCHING COMBINATORIAL PATTERN MATCHING Genomic Repets Exmple of repets: ATGGTCTAGGTCCTAGTGGTC Motivtion to find them: Genomic rerrngements re often ssocited with repets Trce evolutionry secrets Mny tumors re chrcterized

More information

COMP 423 lecture 11 Jan. 28, 2008

COMP 423 lecture 11 Jan. 28, 2008 COMP 423 lecture 11 Jn. 28, 2008 Up to now, we hve looked t how some symols in n lphet occur more frequently thn others nd how we cn sve its y using code such tht the codewords for more frequently occuring

More information

Outline. Introduction Suffix Trees (ST) Building STs in linear time: Ukkonen s algorithm Applications of ST

Outline. Introduction Suffix Trees (ST) Building STs in linear time: Ukkonen s algorithm Applications of ST Suffi Trees Outline Introduction Suffi Trees (ST) Building STs in liner time: Ukkonen s lgorithm Applictions of ST 2 3 Introduction Sustrings String is ny sequence of chrcters. Sustring of string S is

More information

CS201 Discussion 10 DRAWTREE + TRIES

CS201 Discussion 10 DRAWTREE + TRIES CS201 Discussion 10 DRAWTREE + TRIES DrwTree First instinct: recursion As very generic structure, we could tckle this problem s follows: drw(): Find the root drw(root) drw(root): Write the line for the

More information

CS481: Bioinformatics Algorithms

CS481: Bioinformatics Algorithms CS481: Bioinformtics Algorithms Cn Alkn EA509 clkn@cs.ilkent.edu.tr http://www.cs.ilkent.edu.tr/~clkn/teching/cs481/ EXACT STRING MATCHING Fingerprint ide Assume: We cn compute fingerprint f(p) of P in

More information

Alignment of Long Sequences. BMI/CS Spring 2012 Colin Dewey

Alignment of Long Sequences. BMI/CS Spring 2012 Colin Dewey Alignment of Long Sequences BMI/CS 776 www.biostt.wisc.edu/bmi776/ Spring 2012 Colin Dewey cdewey@biostt.wisc.edu Gols for Lecture the key concepts to understnd re the following how lrge-scle lignment

More information

Algorithm Design (5) Text Search

Algorithm Design (5) Text Search Algorithm Design (5) Text Serch Tkshi Chikym School of Engineering The University of Tokyo Text Serch Find sustring tht mtches the given key string in text dt of lrge mount Key string: chr x[m] Text Dt:

More information

Lecture 10: Suffix Trees

Lecture 10: Suffix Trees Computtionl Genomics Prof. Ron Shmir, Prof. Him Wolfson, Dr. Irit Gt-Viks School of Computer Science, Tel Aviv University גנומיקה חישובית פרופ' רון שמיר, פרופ' חיים וולפסון, דר' עירית גת-ויקס ביה"ס למדעי

More information

Applied Databases. Sebastian Maneth. Lecture 13 Online Pattern Matching on Strings. University of Edinburgh - February 29th, 2016

Applied Databases. Sebastian Maneth. Lecture 13 Online Pattern Matching on Strings. University of Edinburgh - February 29th, 2016 Applied Dtses Lecture 13 Online Pttern Mtching on Strings Sestin Mneth University of Edinurgh - Ferury 29th, 2016 2 Outline 1. Nive Method 2. Automton Method 3. Knuth-Morris-Prtt Algorithm 4. Boyer-Moore

More information

Suffix trees. December Computational Genomics

Suffix trees. December Computational Genomics Computtionl Genomics Prof Irit Gt-Viks, Prof. Ron Shmir, Prof. Roded Shrn School of Computer Science, Tel Aviv University גנומיקה חישובית פרופ' עירית גת-ויקס, פרופ' רון שמיר, פרופ' רודד שרן ביה"ס למדעי

More information

Position Heaps: A Simple and Dynamic Text Indexing Data Structure

Position Heaps: A Simple and Dynamic Text Indexing Data Structure Position Heps: A Simple nd Dynmic Text Indexing Dt Structure Andrzej Ehrenfeucht, Ross M. McConnell, Niss Osheim, Sung-Whn Woo Dept. of Computer Science, 40 UCB, University of Colordo t Boulder, Boulder,

More information

Dr. D.M. Akbar Hussain

Dr. D.M. Akbar Hussain Dr. D.M. Akr Hussin Lexicl Anlysis. Bsic Ide: Red the source code nd generte tokens, it is similr wht humns will do to red in; just tking on the input nd reking it down in pieces. Ech token is sequence

More information

In the last lecture, we discussed how valid tokens may be specified by regular expressions.

In the last lecture, we discussed how valid tokens may be specified by regular expressions. LECTURE 5 Scnning SYNTAX ANALYSIS We know from our previous lectures tht the process of verifying the syntx of the progrm is performed in two stges: Scnning: Identifying nd verifying tokens in progrm.

More information

Presentation Martin Randers

Presentation Martin Randers Presenttion Mrtin Rnders Outline Introduction Algorithms Implementtion nd experiments Memory consumption Summry Introduction Introduction Evolution of species cn e modelled in trees Trees consist of nodes

More information

Fig.25: the Role of LEX

Fig.25: the Role of LEX The Lnguge for Specifying Lexicl Anlyzer We shll now study how to uild lexicl nlyzer from specifiction of tokens in the form of list of regulr expressions The discussion centers round the design of n existing

More information

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5 CS321 Lnguges nd Compiler Design I Winter 2012 Lecture 5 1 FINITE AUTOMATA A non-deterministic finite utomton (NFA) consists of: An input lphet Σ, e.g. Σ =,. A set of sttes S, e.g. S = {1, 3, 5, 7, 11,

More information

Definition of Regular Expression

Definition of Regular Expression Definition of Regulr Expression After the definition of the string nd lnguges, we re redy to descrie regulr expressions, the nottion we shll use to define the clss of lnguges known s regulr sets. Recll

More information

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs.

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs. Lecture 5 Wlks, Trils, Pths nd Connectedness Reding: Some of the mteril in this lecture comes from Section 1.2 of Dieter Jungnickel (2008), Grphs, Networks nd Algorithms, 3rd edition, which is ville online

More information

2 Computing all Intersections of a Set of Segments Line Segment Intersection

2 Computing all Intersections of a Set of Segments Line Segment Intersection 15-451/651: Design & Anlysis of Algorithms Novemer 14, 2016 Lecture #21 Sweep-Line nd Segment Intersection lst chnged: Novemer 8, 2017 1 Preliminries The sweep-line prdigm is very powerful lgorithmic design

More information

CSCI 104. Rafael Ferreira da Silva. Slides adapted from: Mark Redekopp and David Kempe

CSCI 104. Rafael Ferreira da Silva. Slides adapted from: Mark Redekopp and David Kempe CSCI 0 fel Ferreir d Silv rfsilv@isi.edu Slides dpted from: Mrk edekopp nd Dvid Kempe LOG STUCTUED MEGE TEES Series Summtion eview Let n = + + + + k $ = #%& #. Wht is n? n = k+ - Wht is log () + log ()

More information

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis CS143 Hndout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexicl Anlysis In this first written ssignment, you'll get the chnce to ply round with the vrious constructions tht come up when doing lexicl

More information

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών ΕΠΛ323 - Θωρία και Πρακτική Μταγλωττιστών Lecture 3 Lexicl Anlysis Elis Athnsopoulos elisthn@cs.ucy.c.cy Recognition of Tokens if expressions nd reltionl opertors if è if then è then else è else relop

More information

CSCE 531, Spring 2017, Midterm Exam Answer Key

CSCE 531, Spring 2017, Midterm Exam Answer Key CCE 531, pring 2017, Midterm Exm Answer Key 1. (15 points) Using the method descried in the ook or in clss, convert the following regulr expression into n equivlent (nondeterministic) finite utomton: (

More information

Today. CS 188: Artificial Intelligence Fall Recap: Search. Example: Pancake Problem. Example: Pancake Problem. General Tree Search.

Today. CS 188: Artificial Intelligence Fall Recap: Search. Example: Pancake Problem. Example: Pancake Problem. General Tree Search. CS 88: Artificil Intelligence Fll 00 Lecture : A* Serch 9//00 A* Serch rph Serch Tody Heuristic Design Dn Klein UC Berkeley Multiple slides from Sturt Russell or Andrew Moore Recp: Serch Exmple: Pncke

More information

CSCI1950 Z Computa4onal Methods for Biology Lecture 2. Ben Raphael January 26, hhp://cs.brown.edu/courses/csci1950 z/ Outline

CSCI1950 Z Computa4onal Methods for Biology Lecture 2. Ben Raphael January 26, hhp://cs.brown.edu/courses/csci1950 z/ Outline CSCI1950 Z Comput4onl Methods for Biology Lecture 2 Ben Rphel Jnury 26, 2009 hhp://cs.brown.edu/courses/csci1950 z/ Outline Review of trees. Coun4ng fetures. Chrcter bsed phylogeny Mximum prsimony Mximum

More information

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have Rndom Numers nd Monte Crlo Methods Rndom Numer Methods The integrtion methods discussed so fr ll re sed upon mking polynomil pproximtions to the integrnd. Another clss of numericl methods relies upon using

More information

From Indexing Data Structures to de Bruijn Graphs

From Indexing Data Structures to de Bruijn Graphs From Indexing Dt Structures to de Bruijn Grphs Bstien Czux, Thierry Lecroq, Eric Rivls LIRMM & IBC, Montpellier - LITIS Rouen June 1, 201 Czux, Lecroq, Rivls (LIRMM) Generlized Suffix Tree & DBG June 1,

More information

Ma/CS 6b Class 1: Graph Recap

Ma/CS 6b Class 1: Graph Recap M/CS 6 Clss 1: Grph Recp By Adm Sheffer Course Detils Adm Sheffer. Office hour: Tuesdys 4pm. dmsh@cltech.edu TA: Victor Kstkin. Office hour: Tuesdys 7pm. 1:00 Mondy, Wednesdy, nd Fridy. http://www.mth.cltech.edu/~2014-15/2term/m006/

More information

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Winter 2016

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Winter 2016 Solving Prolems y Serching CS 486/686: Introduction to Artificil Intelligence Winter 2016 1 Introduction Serch ws one of the first topics studied in AI - Newell nd Simon (1961) Generl Prolem Solver Centrl

More information

Paradigm 5. Data Structure. Suffix trees. What is a suffix tree? Suffix tree. Simple applications. Simple applications. Algorithms

Paradigm 5. Data Structure. Suffix trees. What is a suffix tree? Suffix tree. Simple applications. Simple applications. Algorithms Prdigm. Dt Struture Known exmples: link tble, hep, Our leture: suffix tree Will involve mortize method tht will be stressed shortly in this ourse Suffix trees Wht is suffix tree? Simple pplitions History

More information

Before We Begin. Introduction to Spatial Domain Filtering. Introduction to Digital Image Processing. Overview (1): Administrative Details (1):

Before We Begin. Introduction to Spatial Domain Filtering. Introduction to Digital Image Processing. Overview (1): Administrative Details (1): Overview (): Before We Begin Administrtive detils Review some questions to consider Winter 2006 Imge Enhncement in the Sptil Domin: Bsics of Sptil Filtering, Smoothing Sptil Filters, Order Sttistics Filters

More information

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών. Lecture 3b Lexical Analysis Elias Athanasopoulos

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών. Lecture 3b Lexical Analysis Elias Athanasopoulos ΕΠΛ323 - Θωρία και Πρακτική Μταγλωττιστών Lecture 3 Lexicl Anlysis Elis Athnsopoulos elisthn@cs.ucy.c.cy RecogniNon of Tokens if expressions nd relnonl opertors if è if then è then else è else relop è

More information

CS 241. Fall 2017 Midterm Review Solutions. October 24, Bits and Bytes 1. 3 MIPS Assembler 6. 4 Regular Languages 7.

CS 241. Fall 2017 Midterm Review Solutions. October 24, Bits and Bytes 1. 3 MIPS Assembler 6. 4 Regular Languages 7. CS 241 Fll 2017 Midterm Review Solutions Octoer 24, 2017 Contents 1 Bits nd Bytes 1 2 MIPS Assemly Lnguge Progrmming 2 3 MIPS Assemler 6 4 Regulr Lnguges 7 5 Scnning 9 1 Bits nd Bytes 1. Give two s complement

More information

10.5 Graphing Quadratic Functions

10.5 Graphing Quadratic Functions 0.5 Grphing Qudrtic Functions Now tht we cn solve qudrtic equtions, we wnt to lern how to grph the function ssocited with the qudrtic eqution. We cll this the qudrtic function. Grphs of Qudrtic Functions

More information

LR Parsing, Part 2. Constructing Parse Tables. Need to Automatically Construct LR Parse Tables: Action and GOTO Table

LR Parsing, Part 2. Constructing Parse Tables. Need to Automatically Construct LR Parse Tables: Action and GOTO Table TDDD55 Compilers nd Interpreters TDDB44 Compiler Construction LR Prsing, Prt 2 Constructing Prse Tles Prse tle construction Grmmr conflict hndling Ctegories of LR Grmmrs nd Prsers Peter Fritzson, Christoph

More information

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Solving Prolems y Serching CS 486/686: Introduction to Artificil Intelligence 1 Introduction Serch ws one of the first topics studied in AI - Newell nd Simon (1961) Generl Prolem Solver Centrl component

More information

Ma/CS 6b Class 1: Graph Recap

Ma/CS 6b Class 1: Graph Recap M/CS 6 Clss 1: Grph Recp By Adm Sheffer Course Detils Instructor: Adm Sheffer. TA: Cosmin Pohot. 1pm Mondys, Wednesdys, nd Fridys. http://mth.cltech.edu/~2015-16/2term/m006/ Min ook: Introduction to Grph

More information

Announcements. CS 188: Artificial Intelligence Fall Recap: Search. Today. Example: Pancake Problem. Example: Pancake Problem

Announcements. CS 188: Artificial Intelligence Fall Recap: Search. Today. Example: Pancake Problem. Example: Pancake Problem Announcements Project : erch It s live! Due 9/. trt erly nd sk questions. It s longer thn most! Need prtner? Come up fter clss or try Pizz ections: cn go to ny, ut hve priority in your own C 88: Artificil

More information

CS 432 Fall Mike Lam, Professor a (bc)* Regular Expressions and Finite Automata

CS 432 Fall Mike Lam, Professor a (bc)* Regular Expressions and Finite Automata CS 432 Fll 2017 Mike Lm, Professor (c)* Regulr Expressions nd Finite Automt Compiltion Current focus "Bck end" Source code Tokens Syntx tree Mchine code chr dt[20]; int min() { flot x = 42.0; return 7;

More information

The Greedy Method. The Greedy Method

The Greedy Method. The Greedy Method Lists nd Itertors /8/26 Presenttion for use with the textook, Algorithm Design nd Applictions, y M. T. Goodrich nd R. Tmssi, Wiley, 25 The Greedy Method The Greedy Method The greedy method is generl lgorithm

More information

Slides for Data Mining by I. H. Witten and E. Frank

Slides for Data Mining by I. H. Witten and E. Frank Slides for Dt Mining y I. H. Witten nd E. Frnk Simplicity first Simple lgorithms often work very well! There re mny kinds of simple structure, eg: One ttriute does ll the work All ttriutes contriute eqully

More information

Lists in Lisp and Scheme

Lists in Lisp and Scheme Lists in Lisp nd Scheme Lists in Lisp nd Scheme Lists re Lisp s fundmentl dt structures, ut there re others Arrys, chrcters, strings, etc. Common Lisp hs moved on from eing merely LISt Processor However,

More information

Systems I. Logic Design I. Topics Digital logic Logic gates Simple combinational logic circuits

Systems I. Logic Design I. Topics Digital logic Logic gates Simple combinational logic circuits Systems I Logic Design I Topics Digitl logic Logic gtes Simple comintionl logic circuits Simple C sttement.. C = + ; Wht pieces of hrdwre do you think you might need? Storge - for vlues,, C Computtion

More information

ZZ - Advanced Math Review 2017

ZZ - Advanced Math Review 2017 ZZ - Advnced Mth Review Mtrix Multipliction Given! nd! find the sum of the elements of the product BA First, rewrite the mtrices in the correct order to multiply The product is BA hs order x since B is

More information

Topic 2: Lexing and Flexing

Topic 2: Lexing and Flexing Topic 2: Lexing nd Flexing COS 320 Compiling Techniques Princeton University Spring 2016 Lennrt Beringer 1 2 The Compiler Lexicl Anlysis Gol: rek strem of ASCII chrcters (source/input) into sequence of

More information

The dictionary model allows several consecutive symbols, called phrases

The dictionary model allows several consecutive symbols, called phrases A dptive Huffmn nd rithmetic methods re universl in the sense tht the encoder cn dpt to the sttistics of the source. But, dpttion is computtionlly expensive, prticulrly when k-th order Mrkov pproximtion

More information

Reducing a DFA to a Minimal DFA

Reducing a DFA to a Minimal DFA Lexicl Anlysis - Prt 4 Reducing DFA to Miniml DFA Input: DFA IN Assume DFA IN never gets stuck (dd ded stte if necessry) Output: DFA MIN An equivlent DFA with the minimum numer of sttes. Hrry H. Porter,

More information

The Math Learning Center PO Box 12929, Salem, Oregon Math Learning Center

The Math Learning Center PO Box 12929, Salem, Oregon Math Learning Center Resource Overview Quntile Mesure: Skill or Concept: 80Q Multiply two frctions or frction nd whole numer. (QT N ) Excerpted from: The Mth Lerning Center PO Box 99, Slem, Oregon 9709 099 www.mthlerningcenter.org

More information

CSCI 3130: Formal Languages and Automata Theory Lecture 12 The Chinese University of Hong Kong, Fall 2011

CSCI 3130: Formal Languages and Automata Theory Lecture 12 The Chinese University of Hong Kong, Fall 2011 CSCI 3130: Forml Lnguges nd utomt Theory Lecture 12 The Chinese University of Hong Kong, Fll 2011 ndrej Bogdnov In progrmming lnguges, uilding prse trees is significnt tsk ecuse prse trees tell us the

More information

Lecture 10 Evolutionary Computation: Evolution strategies and genetic programming

Lecture 10 Evolutionary Computation: Evolution strategies and genetic programming Lecture 10 Evolutionry Computtion: Evolution strtegies nd genetic progrmming Evolution strtegies Genetic progrmming Summry Negnevitsky, Person Eduction, 2011 1 Evolution Strtegies Another pproch to simulting

More information

Dynamic Programming. Andreas Klappenecker. [partially based on slides by Prof. Welch] Monday, September 24, 2012

Dynamic Programming. Andreas Klappenecker. [partially based on slides by Prof. Welch] Monday, September 24, 2012 Dynmic Progrmming Andres Klppenecker [prtilly bsed on slides by Prof. Welch] 1 Dynmic Progrmming Optiml substructure An optiml solution to the problem contins within it optiml solutions to subproblems.

More information

Efficient implementation of lazy suffix trees

Efficient implementation of lazy suffix trees SOFTWARE PRACTICE AND EXPERIENCE Softw. Prct. Exper. 2003; 33:1035 1049 (DOI: 10.1002/spe.535) Efficient implementtion of lzy suffix trees R. Giegerich 1,S.Kurtz 2 nd J. Stoye 1,, 1 Fculty of Technology,

More information

Compression Outline :Algorithms in the Real World. Lempel-Ziv Algorithms. LZ77: Sliding Window Lempel-Ziv

Compression Outline :Algorithms in the Real World. Lempel-Ziv Algorithms. LZ77: Sliding Window Lempel-Ziv Compression Outline 15-853:Algorithms in the Rel World Dt Compression III Introduction: Lossy vs. Lossless, Benchmrks, Informtion Theory: Entropy, etc. Proility Coding: Huffmn + Arithmetic Coding Applictions

More information

ASTs, Regex, Parsing, and Pretty Printing

ASTs, Regex, Parsing, and Pretty Printing ASTs, Regex, Prsing, nd Pretty Printing CS 2112 Fll 2016 1 Algeric Expressions To strt, consider integer rithmetic. Suppose we hve the following 1. The lphet we will use is the digits {0, 1, 2, 3, 4, 5,

More information

I/O Efficient Dynamic Data Structures for Longest Prefix Queries

I/O Efficient Dynamic Data Structures for Longest Prefix Queries I/O Efficient Dynmic Dt Structures for Longest Prefix Queries Moshe Hershcovitch 1 nd Him Kpln 2 1 Fculty of Electricl Engineering, moshik1@gmil.com 2 School of Computer Science, himk@cs.tu.c.il, Tel Aviv

More information

Languages. L((a (b)(c))*) = { ε,a,bc,aa,abc,bca,... } εw = wε = w. εabba = abbaε = abba. (a (b)(c)) *

Languages. L((a (b)(c))*) = { ε,a,bc,aa,abc,bca,... } εw = wε = w. εabba = abbaε = abba. (a (b)(c)) * Pln for Tody nd Beginning Next week Interpreter nd Compiler Structure, or Softwre Architecture Overview of Progrmming Assignments The MeggyJv compiler we will e uilding. Regulr Expressions Finite Stte

More information

CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona

CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona CSc 453 Compilers nd Systems Softwre 4 : Lexicl Anlysis II Deprtment of Computer Science University of Arizon collerg@gmil.com Copyright c 2009 Christin Collerg Implementing Automt NFAs nd DFAs cn e hrd-coded

More information

Fall 2018 Midterm 1 October 11, ˆ You may not ask questions about the exam except for language clarifications.

Fall 2018 Midterm 1 October 11, ˆ You may not ask questions about the exam except for language clarifications. 15-112 Fll 2018 Midterm 1 October 11, 2018 Nme: Andrew ID: Recittion Section: ˆ You my not use ny books, notes, extr pper, or electronic devices during this exm. There should be nothing on your desk or

More information

Graphs with at most two trees in a forest building process

Graphs with at most two trees in a forest building process Grphs with t most two trees in forest uilding process rxiv:802.0533v [mth.co] 4 Fe 208 Steve Butler Mis Hmnk Mrie Hrdt Astrct Given grph, we cn form spnning forest y first sorting the edges in some order,

More information

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards A Tutology Checker loosely relted to Stålmrck s Algorithm y Mrtin Richrds mr@cl.cm.c.uk http://www.cl.cm.c.uk/users/mr/ University Computer Lortory New Museum Site Pemroke Street Cmridge, CB2 3QG Mrtin

More information

2014 Haskell January Test Regular Expressions and Finite Automata

2014 Haskell January Test Regular Expressions and Finite Automata 0 Hskell Jnury Test Regulr Expressions nd Finite Automt This test comprises four prts nd the mximum mrk is 5. Prts I, II nd III re worth 3 of the 5 mrks vilble. The 0 Hskell Progrmming Prize will be wrded

More information

Lexical Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Lexical Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay Lexicl Anlysis Amith Snyl (www.cse.iit.c.in/ s) Deprtment of Computer Science nd Engineering, Indin Institute of Technology, Bomy Septemer 27 College of Engineering, Pune Lexicl Anlysis: 2/6 Recp The input

More information

binary trees, expression trees

binary trees, expression trees COMP 250 Lecture 21 binry trees, expression trees Oct. 27, 2017 1 Binry tree: ech node hs t most two children. 2 Mximum number of nodes in binry tree? Height h (e.g. 3) 3 Mximum number of nodes in binry

More information

Agilent Mass Hunter Software

Agilent Mass Hunter Software Agilent Mss Hunter Softwre Quick Strt Guide Use this guide to get strted with the Mss Hunter softwre. Wht is Mss Hunter Softwre? Mss Hunter is n integrl prt of Agilent TOF softwre (version A.02.00). Mss

More information

ITEC2620 Introduction to Data Structures

ITEC2620 Introduction to Data Structures ITEC0 Introduction to Dt Structures Lecture 7 Queues, Priority Queues Queues I A queue is First-In, First-Out = FIFO uffer e.g. line-ups People enter from the ck of the line People re served (exit) from

More information

Implementing Automata. CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona

Implementing Automata. CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona Implementing utomt Sc 5 ompilers nd Systems Softwre : Lexicl nlysis II Deprtment of omputer Science University of rizon collerg@gmil.com opyright c 009 hristin ollerg NFs nd DFs cn e hrd-coded using this

More information

cisc1110 fall 2010 lecture VI.2 call by value function parameters another call by value example:

cisc1110 fall 2010 lecture VI.2 call by value function parameters another call by value example: cisc1110 fll 2010 lecture VI.2 cll y vlue function prmeters more on functions more on cll y vlue nd cll y reference pssing strings to functions returning strings from functions vrile scope glol vriles

More information

Orthogonal line segment intersection

Orthogonal line segment intersection Computtionl Geometry [csci 3250] Line segment intersection The prolem (wht) Computtionl Geometry [csci 3250] Orthogonl line segment intersection Applictions (why) Algorithms (how) A specil cse: Orthogonl

More information

On String Matching in Chunked Texts

On String Matching in Chunked Texts On String Mtching in Chunked Texts Hnnu Peltol nd Jorm Trhio {hpeltol, trhio}@cs.hut.fi Deprtment of Computer Science nd Engineering Helsinki University of Technology P.O. Box 5400, FI-02015 HUT, Finlnd

More information

Regular Expression Matching with Multi-Strings and Intervals. Philip Bille Mikkel Thorup

Regular Expression Matching with Multi-Strings and Intervals. Philip Bille Mikkel Thorup Regulr Expression Mtching with Multi-Strings nd Intervls Philip Bille Mikkel Thorup Outline Definition Applictions Previous work Two new problems: Multi-strings nd chrcter clss intervls Algorithms Thompson

More information

Integration. September 28, 2017

Integration. September 28, 2017 Integrtion September 8, 7 Introduction We hve lerned in previous chpter on how to do the differentition. It is conventionl in mthemtics tht we re supposed to lern bout the integrtion s well. As you my

More information

EECS150 - Digital Design Lecture 23 - High-level Design and Optimization 3, Parallelism and Pipelining

EECS150 - Digital Design Lecture 23 - High-level Design and Optimization 3, Parallelism and Pipelining EECS150 - Digitl Design Lecture 23 - High-level Design nd Optimiztion 3, Prllelism nd Pipelining Nov 12, 2002 John Wwrzynek Fll 2002 EECS150 - Lec23-HL3 Pge 1 Prllelism Prllelism is the ct of doing more

More information

Announcements. CS 188: Artificial Intelligence Fall Recap: Search. Today. General Tree Search. Uniform Cost. Lecture 3: A* Search 9/4/2007

Announcements. CS 188: Artificial Intelligence Fall Recap: Search. Today. General Tree Search. Uniform Cost. Lecture 3: A* Search 9/4/2007 CS 88: Artificil Intelligence Fll 2007 Lecture : A* Serch 9/4/2007 Dn Klein UC Berkeley Mny slides over the course dpted from either Sturt Russell or Andrew Moore Announcements Sections: New section 06:

More information

CSEP 573 Artificial Intelligence Winter 2016

CSEP 573 Artificial Intelligence Winter 2016 CSEP 573 Artificil Intelligence Winter 2016 Luke Zettlemoyer Problem Spces nd Serch slides from Dn Klein, Sturt Russell, Andrew Moore, Dn Weld, Pieter Abbeel, Ali Frhdi Outline Agents tht Pln Ahed Serch

More information

CS 241 Week 4 Tutorial Solutions

CS 241 Week 4 Tutorial Solutions CS 4 Week 4 Tutoril Solutions Writing n Assemler, Prt & Regulr Lnguges Prt Winter 8 Assemling instrutions utomtilly. slt $d, $s, $t. Solution: $d, $s, nd $t ll fit in -it signed integers sine they re 5-it

More information

From Dependencies to Evaluation Strategies

From Dependencies to Evaluation Strategies From Dependencies to Evlution Strtegies Possile strtegies: 1 let the user define the evlution order 2 utomtic strtegy sed on the dependencies: use locl dependencies to determine which ttriutes to compute

More information

Section 3.1: Sequences and Series

Section 3.1: Sequences and Series Section.: Sequences d Series Sequences Let s strt out with the definition of sequence: sequence: ordered list of numbers, often with definite pttern Recll tht in set, order doesn t mtter so this is one

More information

Typing with Weird Keyboards Notes

Typing with Weird Keyboards Notes Typing with Weird Keyords Notes Ykov Berchenko-Kogn August 25, 2012 Astrct Consider lnguge with n lphet consisting of just four letters,,,, nd. There is spelling rule tht sys tht whenever you see n next

More information

1. SEQUENCES INVOLVING EXPONENTIAL GROWTH (GEOMETRIC SEQUENCES)

1. SEQUENCES INVOLVING EXPONENTIAL GROWTH (GEOMETRIC SEQUENCES) Numbers nd Opertions, Algebr, nd Functions 45. SEQUENCES INVOLVING EXPONENTIAL GROWTH (GEOMETRIC SEQUENCES) In sequence of terms involving eponentil growth, which the testing service lso clls geometric

More information

CS 430 Spring Mike Lam, Professor. Parsing

CS 430 Spring Mike Lam, Professor. Parsing CS 430 Spring 2015 Mike Lm, Professor Prsing Syntx Anlysis We cn now formlly descrie lnguge's syntx Using regulr expressions nd BNF grmmrs How does tht help us? Syntx Anlysis We cn now formlly descrie

More information

Stack. A list whose end points are pointed by top and bottom

Stack. A list whose end points are pointed by top and bottom 4. Stck Stck A list whose end points re pointed by top nd bottom Insertion nd deletion tke plce t the top (cf: Wht is the difference between Stck nd Arry?) Bottom is constnt, but top grows nd shrinks!

More information

CS311H: Discrete Mathematics. Graph Theory IV. A Non-planar Graph. Regions of a Planar Graph. Euler s Formula. Instructor: Işıl Dillig

CS311H: Discrete Mathematics. Graph Theory IV. A Non-planar Graph. Regions of a Planar Graph. Euler s Formula. Instructor: Işıl Dillig CS311H: Discrete Mthemtics Grph Theory IV Instructor: Işıl Dillig Instructor: Işıl Dillig, CS311H: Discrete Mthemtics Grph Theory IV 1/25 A Non-plnr Grph Regions of Plnr Grph The plnr representtion of

More information

Section 10.4 Hyperbolas

Section 10.4 Hyperbolas 66 Section 10.4 Hyperbols Objective : Definition of hyperbol & hyperbols centered t (0, 0). The third type of conic we will study is the hyperbol. It is defined in the sme mnner tht we defined the prbol

More information

Midterm 2 Sample solution

Midterm 2 Sample solution Nme: Instructions Midterm 2 Smple solution CMSC 430 Introduction to Compilers Fll 2012 November 28, 2012 This exm contins 9 pges, including this one. Mke sure you hve ll the pges. Write your nme on the

More information

Compilers Spring 2013 PRACTICE Midterm Exam

Compilers Spring 2013 PRACTICE Midterm Exam Compilers Spring 2013 PRACTICE Midterm Exm This is full length prctice midterm exm. If you wnt to tke it t exm pce, give yourself 7 minutes to tke the entire test. Just like the rel exm, ech question hs

More information

Representation of Numbers. Number Representation. Representation of Numbers. 32-bit Unsigned Integers 3/24/2014. Fixed point Integer Representation

Representation of Numbers. Number Representation. Representation of Numbers. 32-bit Unsigned Integers 3/24/2014. Fixed point Integer Representation Representtion of Numbers Number Representtion Computer represent ll numbers, other thn integers nd some frctions with imprecision. Numbers re stored in some pproximtion which cn be represented by fixed

More information

Stack Manipulation. Other Issues. How about larger constants? Frame Pointer. PowerPC. Alternative Architectures

Stack Manipulation. Other Issues. How about larger constants? Frame Pointer. PowerPC. Alternative Architectures Other Issues Stck Mnipultion support for procedures (Refer to section 3.6), stcks, frmes, recursion mnipulting strings nd pointers linkers, loders, memory lyout Interrupts, exceptions, system clls nd conventions

More information

Fig.1. Let a source of monochromatic light be incident on a slit of finite width a, as shown in Fig. 1.

Fig.1. Let a source of monochromatic light be incident on a slit of finite width a, as shown in Fig. 1. Answer on Question #5692, Physics, Optics Stte slient fetures of single slit Frunhofer diffrction pttern. The slit is verticl nd illuminted by point source. Also, obtin n expression for intensity distribution

More information

Efficient K-NN Search in Polyphonic Music Databases Using a Lower Bounding Mechanism

Efficient K-NN Search in Polyphonic Music Databases Using a Lower Bounding Mechanism Efficient K-NN Serch in Polyphonic Music Dtses Using Lower Bounding Mechnism Ning-Hn Liu Deprtment of Computer Science Ntionl Tsing Hu University Hsinchu,Tiwn 300, R.O.C 886-3-575679 nhliou@yhoo.com.tw

More information

CS 340, Fall 2014 Dec 11 th /13 th Final Exam Note: in all questions, the special symbol ɛ (epsilon) is used to indicate the empty string.

CS 340, Fall 2014 Dec 11 th /13 th Final Exam Note: in all questions, the special symbol ɛ (epsilon) is used to indicate the empty string. CS 340, Fll 2014 Dec 11 th /13 th Finl Exm Nme: Note: in ll questions, the specil symol ɛ (epsilon) is used to indicte the empty string. Question 1. [5 points] Consider the following regulr expression;

More information

COMPUTER SCIENCE 123. Foundations of Computer Science. 6. Tuples

COMPUTER SCIENCE 123. Foundations of Computer Science. 6. Tuples COMPUTER SCIENCE 123 Foundtions of Computer Science 6. Tuples Summry: This lecture introduces tuples in Hskell. Reference: Thompson Sections 5.1 2 R.L. While, 2000 3 Tuples Most dt comes with structure

More information

PARALLEL AND DISTRIBUTED COMPUTING

PARALLEL AND DISTRIBUTED COMPUTING PARALLEL AND DISTRIBUTED COMPUTING 2009/2010 1 st Semester Teste Jnury 9, 2010 Durtion: 2h00 - No extr mteril llowed. This includes notes, scrtch pper, clcultor, etc. - Give your nswers in the ville spce

More information

Lexical Analysis: Constructing a Scanner from Regular Expressions

Lexical Analysis: Constructing a Scanner from Regular Expressions Lexicl Anlysis: Constructing Scnner from Regulr Expressions Gol Show how to construct FA to recognize ny RE This Lecture Convert RE to n nondeterministic finite utomton (NFA) Use Thompson s construction

More information

9 Graph Cutting Procedures

9 Graph Cutting Procedures 9 Grph Cutting Procedures Lst clss we begn looking t how to embed rbitrry metrics into distributions of trees, nd proved the following theorem due to Brtl (1996): Theorem 9.1 (Brtl (1996)) Given metric

More information