Regular Expressions and Automata using Miranda

Size: px
Start display at page:

Download "Regular Expressions and Automata using Miranda"

Transcription

1 Regulr Expressions nd Automt using Mirnd Simon Thompson Computing Lortory Univerisity of Kent t Cnterury My 1995 Contents 1 Introduction ::::::::::::::::::::::::::::::::: 1 2 Regulr Expressions ::::::::::::::::::::::::::::: 2 Mtching regulr expressions :::::::::::::::::::::::: 4 4 Sets :::::::::::::::::::::::::::::::::::::: 6 5 Non-deterministic Finite Automt ::::::::::::::::::::: 11 6 Simulting n NFA :::::::::::::::::::::::::::::: 1 7 Implementing n exmple :::::::::::::::::::::::::: 16 8 Building NFAs from regulr expressions ::::::::::::::::::: 17 9 Deterministic mchines ::::::::::::::::::::::::::: Trnsforming NFAs to DFAs ::::::::::::::::::::::::: 2 11 Minimising DFA :::::::::::::::::::::::::::::: Regulr definitions :::::::::::::::::::::::::::::: Introduction In these notes Mirnd is used s vehicle to introduce regulr expressions, pttern mtching, nd their implementtions y mens of non-deterministic nd deterministic c Simon Thompson,

2 2 utomt. As prt of the mteril, we give n implementtion of the ides, contined in set of files. References to this mteril re scttered through the text. The files cn e otined y following the instructions in science/mirnd crft/regexp.html This mteril is sed on the tretment of the suject in [Aho et. l.], ut provides full implementtions rther thn their pseudo-code versions of the lgorithms. The mteril gives n illustrtion of mny of the fetures of Mirnd, including polymorphism (the sttes of n NFA cn e represented y ojects of ny type); modulristion (the system is split cross numer of modules); higher-order functions (used in finding limits of processes, for exmple) nd other fetures. A tutoril introduction to Mirnd cn e found in [Thompson]. The pper egins with definitions of regulr expressions, nd how strings re mtched to them; this lso gives our first Mirnd tretment lso. After descriing the strct dt type of sets we define non-deterministic finite utomt, nd their implementtion in Mirnd. We then show how to uild n NFA corresponding to ech regulr expression, nd how such mchine cn e optimised, first y trnsforming it into deterministic mchine, nd then y minimising the stte spce of the DFA. We conclude with discussion of regulr definitions, nd show how recognisers for strings mtching regulr definitions cn e uilt. 2. Regulr Expressions Regulr expressions re ptterns which cn e used to descrie sets of strings of chrcters of vrious kinds, such s the identifiers of progrmming lnguge strings of lphnumeric chrcters which egin with n lphetic chrcter; the numers integer or rel given in progrmming lnguge; nd so on. There re five sorts of pttern, or regulr expression: " This is the Greek chrcter epsilon, which mtches the empty string. x xis ny chrcter. This mtches the chrcter itself. (r 1 r 2 ) r 1 nd r 2 re regulr expressions. (r 1 r 2 ) r 1 nd r 2 re regulr expressions. (r)* r is regulr expression.

3 Exmples of regulr expressions include ( ()), (() (" ()*)) nd hello. In order to give more redle version of these, it is ssumed tht * inds more tightly thn juxtposition (i.e. (r 1 r 2 )), nd tht juxtposition inds more tightly thn (r 1 r 2 ). This mens tht r 1 r 2 * will men (r 1 (r 2 )*), not ((r 1 r 2 ))*, nd tht r 1 r 2 r will men r 1 (r 2 r ), not (r 1 r 2 )r. A Mirnd lgeric type representing regulr expressions is given y reg ::= Epsilon Literl chr Or reg reg Then reg reg Str reg This definition nd those which follow cn e found in the file regexp.m. Mirnd representtions of ( ()) nd (() (" ()*)) re The Or (Literl ) (Then (Literl ) (Literl )) Or (Then (Literl ) (Literl )) (Or Epsilon (Str (Literl ))) respectively. In order to shorten these definitions we will usully define constnt literls such s = Literl = Literl so tht the expressions ove ecome Or (Then ) Or (Then ) (Or Epsilon (Str )) If we use the infix forms of Or nd Then, $Or nd $Then, they red $Or ( $Then ) ( $Then ) $Or (Epsilon $Or (Str )) Functions over the type of regulr expressions re defined y recursion over the structure of the expression. Exmples include

4 4 literls :: reg -> [chr] literls Epsilon = [] literls (Literl ch) = [ch] literls (Or r1 r2) = literls r1 ++ literls r2 literls (Then r1 r2) = literls r1 ++ literls r2 literls (Str r) = literls r which prints list of the literls ppering in regulr expression, nd printre :: reg -> [chr] printre Epsilon = "@" printre (Literl ch) = [ch] printre (Or r1 r2) = "(" ++ printre r1 ++ " " ++ printre r2 ++ ")" printre (Then r1 r2) = "(" ++ printre r1 ++ printre r2 ++ ")" printre (Str r) = "(" ++ printre r ++")*" which gives printle form of regulr expression. Note is used to represent epsilon in ASCII. Exercises 1. Write more redle form of the expression (((( ) c)(()* ()*))(c d)). 2. Wht is the unrevited form of ((x?)*(y?)*)+?. Mtching regulr expressions Regulr expressions re ptterns. expression. We should sk which strings mtch ech regulr

5 5 " The empty string mtches epsilon. x The chrcter x mtches the pttern x, for ny chrcter x. (r 1 r 2 ) The string st will mtch (r 1 r 2 ) if st mtches either r 1 or r 2 (or oth). (r 1 r 2 ) The string st will mtch (r 1 r 2 ) if st cn e split into two sustrings st 1 nd st 2, st = st 1 ++st 2, so tht st 1 mtches r 1 nd st 2 mtches r 2. (r)* The string st will mtch (r)* if st cn e split into zero or more sustrings, st = st 1 ++st st n, ech of which mtches r. The zero cse implies tht the empty string will mtch (r)* for ny regulr expression r. This cn e implemented in Mirnd, in the file mtches.m. The first three cses re simple trnslitertion of the definitions ove. mtches :: reg -> string -> ool mtches Epsilon st = (st = "") mtches (Literl ch) st = (st = [ch]) mtches (Or r1 r2) st = mtches r1 st \/ mtches r2 st In the cse of juxtposition, we need n uxiliry function which gives the list contining ll the possile wys of splitting up list. splits :: [*] -> [ ([*],[*]) ] splits st = [ (tke n st,drop n st) n <- [0..#st] ] For exmple, splits [2,] is [([],[2,]),([2],[]),([2,],[])]. A string will mtch (Then r1 r2) if t lest one of the splits gives strings which mtch r1 nd r2. mtches (Then r1 r2) st = or [mtches r1 s1 & mtches r2 s2 (s1,s2)<-splits st] The finl cse is tht of Str. We cn explin * s either " or s followed y *.

6 6 We cn use this to implement the check for the mtch, ut it is prolemtic when cn e mtched y ". When this hppens, the mtch is tested recursively on the sme string, giving n infinite loop. This is voided y disllowing n epsilon mtch on the first mtch on hs to e non-trivil. mtches (Str r) st = mtches Epsilon st \/ or [ mtches r s1 & mtches (Str r) s2 (s1,s2) <- frontsplits st ] frontsplits is defined like splits ut so s to exclude the split ([],st). Exercises. Argue tht the string " mtches ( (c)*)* nd tht the string mtches (( )*()*). 4. Why does the string not mtch (( )*()*)? 5. Give informl descriptions of the sets of strings mtching the following regulr expressions. ( )*( )*( ) ( )*( )( ) "?()+? 6. Give regulr expressions descriing the following sets of strings All strings of s nd s contining t most two s. All strings of s nd s contining exctly two s. All strings of s nd s of length t most three. All strings of s nd s which contin no repeted djcent chrcters, tht is no sustring of the form or. 4. Sets A set is collection of elements of prticulr type, which is oth like nd unlike list. Lists re fmilir from Mirnd, nd exmples include

7 7 [Joe,Sue,Ben] [Ben,Sue,Joe] [Joe,Sue,Sue,Ben] [Joe,Sue,Ben,Sue] Ech of these lists is different not only do the elements of list mtter, ut lso the order in which they occur, nd their multiplicity (the numer of times ech element occurs). In mny situtions, order nd multiplicity re irrelevnt. If we wnt to tlk out the collection of people coming to our irthdy prty, we just wnt the nmes we cnnot invite someone more thn once, so multiplicity is not importnt; the order we might list them in is lso of no interest. In other words, ll we wnt to know is the set of people coming. In the exmple ove, this is the set contining Joe, Sue nd Ben. Sets cn e implemented in numer of wys in Mirnd, nd the precise form is not importnt for the user. It is sensile to declre the type s n strct dt type, or stype, illustrted in Figure 1. The stype nd its implementtion re found in the file sets.m. The implementtion we hve given represents set s n ordered list of elements without repetitions. The individul functions re descried nd implemented s follows. sing is the singleton set, consisting of the single element sing = [] union,inter,diff give the union, intersection nd difference of two sets. The union consists of the elements occurring in either set (or oth), the intersection of those elements in oth sets nd the difference of those elements in the first ut not the second set; their definitions re given in Figure 2. The empty set is the empty list empty = [] memset x tests whether is memer of the set x. Note tht this is n optimistion of the function memer over lists; since the list is ordered, we need look no further once we hve found n element greter thn the one we seek. memset [] = Flse memset (:x) = memset x, if < = True, if = = Flse, otherwise

8 8 stype set * with sing :: * -> set * union,inter,diff :: set * -> set * -> set * empty :: set * memset :: set * -> * -> ool suset :: set * -> set * -> ool eqset :: set * -> set * -> ool mpset :: (* -> **) -> set * -> set ** filterset,seprte :: (*->ool) -> set * -> set * foldset :: (* -> * -> *) -> * -> set * -> * mkeset :: [*] -> set * showset :: (*->[chr]) -> set * -> [chr] crd :: set * -> num fltten :: set * -> [*] setlimit :: (set * -> set *) -> set * -> set * Figure 1: The set stype union [] y = y union x [] = x union (:x) (:y) = : union x (:y) inter [] y = [] inter x [] = [] inter (:x) (:y) = inter x (:y), if < = : union x y, if = = : union (:x) y, otherwise, if < = : inter x y, if = = inter (:x) y, otherwise diff [] y = [] diff x [] = x diff (:x) (:y) = : diff x (:y), if < = diff x y, if = = diff (:x) y, otherwise Figure 2: Set opertions

9 9 suset x y tests whether x is suset of y; tht is whether every element of x is n element of y. suset [] y = True suset x [] = Flse suset (:x) (:y) = Flse, if < = suset x y, if = = suset (:x) y, if > eqset x y tests whether two sets re equl. eqset = (=) mpset, filterset nd foldset ehve like mp, filter nd foldr except tht they operte over sets. seprte is synonym for filterset. mpset f = mkeset. (mp f) filterset = filter seprte = filterset foldset = foldr mkeset turns list into set mkeset = remdups. sort where remdups [] = [] remdups [] = [] remdups (:x) = : remdups x, if < = remdups x, otherwise where = hd x showset f gives printle version of set, one item per line, using the function f to give printle version of ech element. showset f = conct. (mp ((++"\n"). f))

10 10 crd x gives the numer of elements in x crd = (#) fltten x turns set x into n ordered list of the elements of the set fltten = id setlimit f x gives the limit of the sequence x, f x, f (f x), f (f (f x)),... tht is the first element in the sequence whose successor is equl, s set, to the element itself. In other words, keep pplying f until fixed point or limit is reched. setlimit f x = x, if eqset x next = setlimit f next, otherwise where next = f x Exercises 7. How is the function powerset :: set * -> set (set *) which returns the set of ll susets of set defined? 8. How would you define the functions setunion :: set (set *) -> set * setinter :: set (set *) -> set * which return the union nd intersection of set of sets? 9. Cn infinite sets (of numers, for instnce) e dequtely represented y ordered lists? Cn you tell if two infinite lists re equl, for instnce? 10. The stype set * cn e represented in numer of different wys. Alterntives include: ritrry lists (rther thn ordered lists without repetitions), nd oolen vlued functions, tht is elements of the type * -> ool. Give implementtions of the type using these two representtions.

11 11 5. Non-deterministic Finite Automt A Non-deterministic Finite Automton or NFA is simple mchine which cn e used to recognise regulr expressions. It consists of four components A finite set of sttes, S. A finite set of moves. A strt stte (in S). A set of terminl or finl sttes ( suset of S). In Mirnd (see file nf types.m) this is written nf * ::= NFA (set *) (set (move *)) * (set *) This hs een represented y n lgeric type rther thn 4-tuple simply for redility. The type of sttes cn e different in different pplictions, nd indeed in the following we use oth numers nd sets of numers s sttes. A move is etween two sttes, nd is either given y chrcter, or n ". move * ::= Move * chr * Emove * * The first exmple of n NFA, clled M, follows The sttes re 0,1,2,, with the strt stte 0 indicted y n incoming rrow, nd the finl sttes indicted y shded circles. In this cse there is single finl stte,. The moves re indicted y the rrows, mrked with chrcters nd in this cse. From

12 12 stte 0 there re two possile moves on symol, to1 nd to remin t 0. This is one source of the non-determinism in the mchine. The Mirnd representtion of the mchine is NFA (mkeset [0..]) (mkeset [ Move 0 0, Move 0 1, Move 0 0, Move 1 2, Move 2 ]) 0 (sing ) A second exmple, clled N, is illustrted elow The Mirnd representtion of this mchine is NFA (mkeset [0..5]) (mkeset [ Move 0 1, Move 1 2, Move 0, Move 4, Emove 4, Move 4 5 ]) 0 (mkeset [2,5]) This mchine contins two kinds of non-determinism. The first is t stte 0, from which

13 1 it is possile to move to either 1 or on reding. The second occurs t stte : itis possile to move invisily from stte to stte 4 on the epsilon move, Emove 4. The Mirnd code for these mchines together with function print nf to print n nf whose sttes re numered cn e found in the file nf misc.m. How do these mchines recognise strings? A move cn e mde from one stte s to nother t either if the mchine contins Emove s t or if the next symol to e red is, sy, nd the mchine contins move Move s t. A string will e ccepted y mchine if there is sequence of moves through sttes of the mchine strting t the strt stte nd terminting t one of the terminl sttes this is clled n ccepting pth. For instnce, the pth 0 ;! 1 ;! 2 ;! is n ccepting pth through M for the string. This mens tht the mchine M ccepts this string. Note tht other pths through the mchine re possile for this string, n exmple eing 0 ;! 0 ;! 0 ;! 0 All tht is needed for the mchine to ccept is one ccepting pth; it does not ffect cceptnce if there re other non-ccepting (or indeed ccepting) pths. More thn one ccepting pth cn exist. Mchine N ccepts the string y oth 0 ;! 1 ;! 2 nd 0 ;! ;! " 4 ;! 5 A mchine will reject string only when there is no ccepting pth. Mchine N rejects the string, since the two pths through the mchine lelled y fil to terminte in finl stte: 0 ;! 1 0 ;! Mchine N rejects the string since there is no pth through the mchine lelled y : fter reding the mchine cn e in stte 1, or 4, from none of these cn n move e mde. 6. Simulting n NFA As ws explined in the lst section, string st is ccepted y mchine M when there is t lest one ccepting pth lelled y st through M, nd is rejected y M when no such pth exists.

14 14 The key to implementtion is to explore simultneously ll possile pths through the mchine lelled y prticulr string. Tke s n informl exmple the string nd the mchine N. After reding no input, the mchine cn only e in stte 0. On reding n there re moves to sttes 1 nd ; however this is not the whole story. From stte it is possile to mke n "-move to stte 4, so fter reding the mchine cn e in ny of the sttes f1,,4g. On reding, we hve to look for ll the possile moves from ech of the sttes f1,,4g. From 1 we cn move to 2, from to 4 nd from 4 to 5 no"-moves re possile from the sttes f2,4,5g, nd so the sttes ccessile fter reding the string re f2,4,5g. Is this string to e ccepted y N? We ccept it exctly if the set contins finl stte it contins oth 2 nd 5, so it is ccepted. Note tht the sttes ccessile fter reding re f1,,4g; this set contins no finl stte, nd so the mchine N rejects the string. There is generl pttern to this process, which consists of repetition of Tke set of sttes, such s f1,,4g, nd find the set of sttes ccessile y move on prticulr symol, e.g.. In this cse it is the set f2,4,5g. This is clled onemove in nf li.m. Tke set of sttes, like f1,g, nd find the set of sttes ccessile from the sttes y zero or more "-moves. In this exmple, it is the set f1,,4g. This is the "-closure of the originl set, nd is clled closure in nf li.m. The functions onemove nd closure re composed in the function onetrns, nd this function is iterted long the string y the trns function of implement nf.m. Implementtion in Mirnd We discuss the development of the function trns :: nf * -> string -> set * top-down. Itertion long string is given y foldl foldl :: (set * -> chr -> set *) -> set * -> string -> set * foldl f r [] = r foldl f r (c:cs) = foldl f (f r c) cs

15 15 The first rgument, f, is the step function, tking set nd chrcter to the sttes ccessile from the set on the chrcter. The second rgument, r, is the strting stte, nd the finl rgument is the string long which to iterte. How does the function operte? If given n empty string, the strt stte is the result. If given string (c:cs), the function is clled gin, with the til of the string, cs, nd with new strting stte, (f r c), which is the result of pplying the step function to the strting set of sttes nd the first chrcter of the string. Now to develop trns. trns mch str = foldl step strtset str where step set ch = onetrns mch ch set strtset = closure mch (sing (strtstte mch)) step is derived from onetrns simply y suppling its mchine rgument mch, similrly strtset is derived from the mchine mch, using the functions closure nd strtstte. All these functions re defined in nf li.m. We discuss their definitions now. onetrns :: nf * -> chr -> set * -> set * onetrns mch c x = closure mch (onemove mch c x) Next, we exmine onemove, onemove :: nf * -> chr -> set * -> set * onemove (NFA sttes moves strt term) c x = mkeset [ s t <- fltten x ; Move z d s <- fltten moves ; z=t ; c=d ] The essentil ide here is to run through the elements t of the set x nd the set of moves, moves looking for ll c-moves originting t t. For ech of these, the result of the move, s, goes into the resulting set. The definition uses list comprehensions, so it is necessry first to fltten the sets

16 16 x nd moves into lists, nd then to convert the list comprehension into set y mens of mkeset. closure :: nf * -> set * -> set * closure (NFA sttes moves strt term) = setlimit dd where dd stteset = union stteset (mkeset ccessile) where ccessile = [ s x <- fltten stteset ; Emove y s <- fltten moves ; y=x ] The essence of closure is to tke the limit of the function which dds to set of sttes ll those sttes which re ccessile y single "-move; in the limit we get set to which no further sttes cn e dded y "-trnsitions. Adding the sttes got y single "-moves is ccomplished y the function dd nd the uxiliry definition ccessile which resemles the construction of onemove. 7. Implementing n exmple The mchine P is illustrted y Mchine P Exercise 11. Give the Mirnd definition of the mchine P.

17 17 The "-closure of the set f0g is the set f0,1,2,4g. Looking t the definition of closure ove, the first ppliction of the function dd to f0g gives the set f0,1g; pplying dd to this gives f0,1,2,4g. Applying dd to this set gives the sme set, hence this is the vlue of setlimit here. The set of sttes with which we strt the simultion is therefore f0,1,2,4g. Suppose the first input is ; pplying onemove revels only one move, from 2 to. Tking the closure of the set fg gives the set f1,2,,4,6,7g. A move from here is only from 4 to 5; closing under "-moves gives f1,2,4,5,6,7g. An move from here is possile in two wys: from 2 to nd from 7 to 8; closing up f,8g gives f1,2,,4,6,7,8g. Is the string therefore ccepted y P? Yes, ecuse 8 is memer of f1,2,,4,6,7,8g. This sequence cn e illustrted thus Exercise 12. Show tht the string is not ccepted y the mchine P. 8. Building NFAs from regulr expressions For ech regulr expression it is possile to uild n NFA which ccepts exctly those strings mtching the expression. The mchines re illustrted in Figure. The construction is y induction over the structure of the regulr expression: the mchines for n chrcter nd for " re given outright, nd for complex expressions, the mchines re uilt from the mchines representing the prts. It is strightforwrd to justify the construction. (e f) Any pth through M(e f) must e either pth through M(e) or pth through M(f) (with " t the strt nd end. ef Any pth through M(ef) will e pth through M(e) followed y pth through M(f).

18 18 M( ) M() M(e f) M(e) M(f) M(ef) M(e) M(f) M(e*) M(e) Figure : Building NFAs for regulr expressions

19 19 e* Pths through M(e*) re of two sorts; the first is simply n ", others egin with pth through M(e), nd continue with pth through M(e*). In other words, pths through M(e*) go through M(e) zero or more times. The mchine for the pttern ( )* is given y M(( )*) The Mirnd description of the construction is given in uild nf.m. At the top level the function uild :: reg -> nf num does the recursion. For the se cse, uild (Literl c) = NFA (mkeset [0..1]) (sing (Move 0 c 1)) 0 (sing 1) The definition of uild Epsilon is similr. In the other cses we define uild (Or r1 r2) = m or (uild r1) (uild r2) uild (Then r1 r2) = m then (uild r1) (uild r2) uild (Str r) = m str (uild r) in which the functions m or nd so on uild the mchines from their components s illustrted.

20 20 We mke certin ssumptions out the NFAs we uild. We tke it tht the sttes re numered from 0, with the finl stte hving the highest numer. Putting the mchines together will involve dding vrious new sttes nd trnsitions, nd renumering the sttes nd moves in the constituent mchines. An exmple progrm is m or :: nf num -> nf num -> nf num m or (NFA sttes1 moves1 strt1 finish1) (NFA sttes2 moves2 strt2 finish2) = NFA (sttes1 $union sttes2 $union newsttes) (moves1 $union moves2 $union newmoves) 0 (sing (m1+m2+1)) where m1 = crd sttes1 m2 = crd sttes2 sttes1 = mpset (renumer 1) sttes1 sttes2 = mpset (renumer (m1+1)) sttes2 newsttes = mkeset [0,(m1+m2+1)] moves1 = mpset (renumer move 1) moves1 moves2 = mpset (renumer move (m1+1)) moves2 newmoves = mkeset [ Emove 0 1, Emove 0 (m1+1), Emove m1 (m1+m2+1), Emove (m1+m2) (m1+m2+1) ] The function renumer renumers sttes nd renumer move moves. 9. Deterministic mchines A deterministic finite utomton is n NFA which contins no "-moves, nd hs t most one rrow lelled with prticulr symol leving ny given stte. The effect of this is to mke opertion of the mchine deterministic t ny stge there is t most one possile move to mke, nd so fter reding sequence of chrcters, the

21 21 mchine cn e in one stte t most. Implementing mchine of this sort is much simpler thn for n generl NFA: we only hve to keep trck of single position. Is there generl mechnism for finding DFA corresponding to regulr expression? In fct, there is generl technique for trnsforming n ritrry NFA into DFA, nd this we exmine now. The conversion of n NFA into DFA is sed on the implementtion given in Section 6. The min ide there is to keep trck of set of sttes, representing ll the possile positions fter reding certin mount of input. This set itself cn e thought of s stte of nother mchine, which will e deterministic: the moves from one set to nother re completely deterministic. We show how the conversion works with the mchine P. The strt stte of the mchine will e the closure of the set f0g, tht is A = f0,1,2,4g Now, the construction proceeds y finding the sets ccessile from A y moves on nd on ll the chrcters in the lphet of the mchine P. These sets re sttes of the new mchine; we then repet the construction with these new sttes, until no more sttes re produced y the construction. From A on the symol we cn move to from 2. Closing under "-moves we hve the set f1,2,,4,6,7g, which we cll B B = f1,2,,4,6,7g A ;! B In similr wy, from A on we hve C = f1,2,4,5,6,7g A ;! C Our new mchine so fr looks like B A C We now hve to see wht is ccessile from B nd C. First B. D = f1,2,,4,6,7,8g B ;! D

22 22 which is nother new stte. The process of generting new sttes must stop, s there is only finite numer of sets of sttes to choose from f0,1,2,,4,5,6,7,8g. Wht hppens with move from B? B ;! C This gives the prtil mchine B A D C Similrly, C ;! D C ;! C D ;! D D ;! C which completes the construction of the DFA A B D C Which of the new sttes is finl? One of these sets represents n ccepting stte exctly when it contins finl stte of the originl mchine. For P this is 8, which is contined in the set D only. In generl there cn e more thn one ccepting stte for mchine. (This need not e true for NFAs, since we cn lwys dd new finl stte to which ech of the originls is linked y n "-move.)

23 2 10. Trnsforming NFAs to DFAs The Mirnd code to covert n NFA to DFA is found in the file nf to df.m, nd the min function is mke deterministic :: nf num -> nf num mke deterministic = numer. mke deter A deterministic version of n NFA with numeric sttes is defined in two stges, using mke deter :: nf num -> nf (set num) numer :: nf (set num) -> nf num mke deter does the conversion to the deterministic utomton with sets of numers s sttes, numer replces sets of numers y numers (rther thn cpitl letters, s ws done ove). Sttes re replced y their position in list of sttes see the file for more detils. mke deter is specil cse of the function deterministic :: nf num -> [chr] -> nf (set num) mke deter mch = deterministic mch (lphet mch) The process of dding stte sets is repeted until no more sets re dded. This is version of tking limit, given y the nf limit function, which cts s the usul limit function, except tht it checks for equlity of NFAs s collections of sets. deterministic mch lph = nf limit (ddstep mch lph) strtmch where strtmch = NFA (sing strter) empty strter finish

24 24 strter = closure mch (sing strt) finish = empty, if eqset (term $inter strter) empty = sing strter, otherwise (NFA sts mvs strt term) = mch The strt mchine, strtmch, consists of single stte, the "-closure of the strt stte of the originl mchine. ddstep mch lph tkes prtilly uilt DFA nd dds the stte sets of mch ccessile y single move on ny of the chrcters in lph, the lphet of mch. ddstep :: nf num -> [chr] -> nf (set num) -> nf (set num) ddstep mch lph df = dd ux mch lph df (fltten sttes) where (NFA sttes m s f) = df dd ux mch lph df [] = df dd ux mch lph df (st:rest) = dd ux mch lph (ddmoves mch st lph df) rest This involves iterting over the stte sets in the prtilly uilt DFA, which is done using ddmoves. ddmoves mch x lph df will dd to df ll the moves from stte set x over the lphet lph. ddmoves :: nf num -> set num -> [chr] -> nf (set num) -> nf (set num) ddmoves mch x [] df = df ddmoves mch x (c:r) df = ddmoves mch x r (ddmove mch x c df) In turn, ddmoves itertes long the lphet, using ddmove. ddmove mch x c df will dd to df the moves from stte set x on chrcter c. ddmove :: nf num -> set num -> chr

25 25 -> nf (set num) -> nf (set num) ddmove mch x c (NFA sttes moves strt finish) = NFA sttes moves strt finish where sttes = sttes $union (sing new) moves = moves $union (sing (Move x c new)) finish = finish $union (sing new), if (eqset empty (term $inter new)) = finish, otherwise new = onetrns mch c x (NFA s m q term) = mch The new stte set dded y ddmove is defined using the onetrns function first defined in the simultion of the NFA. 11. Minimising DFA In uilding DFA, we hve produced mchine which cm e implemented more efficiently. We might, however, hve more sttes in the DFA thn necessry. This section shows how we cn optimise DFA so tht it contins the minimum numer of sttes to perform its function of recognising the strings mtching prticulr regulr expression. Two sttes m nd n in DFA re distinguishle if we cn find string st which reches n ccepting stte from n ut not from n (or vice vers). Otherwise, they cn e treted s the sme, ecuse no string mkes them ehve differently putting it different wy, no experiment mkes the two different. How cn we tell when two sttes re different? We strt y dividing the sttes into two prtitions: one contins the ccepting sttes, nd the other the reminder, or non-ccepting sttes. For our exmple, we get the prtition I: D II: A,B,C Now, for ech set in the prtition, we check whether the elements in the set cn e further divided. We look t how ech of the sttes in the set ehves reltive to the previous prtition. In pictures,

26 26 B (II) D (I) D (I) A B C C (II) C (II) C (II) This mens tht we cn re-prtition thus: I: D II: A III: B,C We now repet the process, nd exmine the only set which might e further sudivided, giving D (I) D (I) B C C (III) C (III) This shows tht we don t hve to re-prtition ny further, nd so tht we cn stop now, nd collpse the two sttes B nd C into one, thus: A The Mirnd implementtion of this process is in the file minimise df.m. Exercises 1. For the regulr expression ( )*, find the corresponding NFA. 14. For the NFA of question 1, find the corresponding (non-optimised) DFA. 15. For the DFA of question 2, find the optimised DFA. BC D

27 Regulr definitions A regulr definition consists of numer of nmed regulr expressions. We re llowed to use the defined nmes on the right-hnd sides of definitions fter the definition of the nme. For exmple, lph -> [-za-z] digit -> [0-9] lphnum -> lph digit ident -> lph lphnum* digits -> digit+ frct -> (.digits)? num -> digits frct Becuse of the stipultion tht definition precedes the use of nme, we cn expnd ech right-hnd side to regulr expression involving no nmes. We cn uild mchines to recognise strings from numer of regulr expressions. Suppose we hve the ptterns p1: p2: p: ** We cn uild the three NFAs thus: nd then they cn e joined into single mchine, thus

28 p p2 7 8 p In using the mchine we look for the longest mtch ginst ny of the ptterns: 0,1,,7,8(p) 2(p1),4,7,8(p) 7,8(p) 8(p) - In the exmple, the segment of mtches the pttern p. Exercises 16. Fully expnd the nmes digits nd num given ove. 17. Build Mirnd progrm to recognise strings ccording to set of regulr definitions, s outlined in this section. Biliogrphy [Aho et. l.] Aho, A.V., Sethi, R. nd Ullmn, J.D., Compilers: Principles, Techniques nd Tools, Addison-Wesley, Reding, MA, USA, [Thompson] Thompson, S., Mirnd The Crft of Functionl Progrmming, Addison- Wesley, Wokinghm, Berks, UK, 1995.

Fig.25: the Role of LEX

Fig.25: the Role of LEX The Lnguge for Specifying Lexicl Anlyzer We shll now study how to uild lexicl nlyzer from specifiction of tokens in the form of list of regulr expressions The discussion centers round the design of n existing

More information

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5 CS321 Lnguges nd Compiler Design I Winter 2012 Lecture 5 1 FINITE AUTOMATA A non-deterministic finite utomton (NFA) consists of: An input lphet Σ, e.g. Σ =,. A set of sttes S, e.g. S = {1, 3, 5, 7, 11,

More information

Dr. D.M. Akbar Hussain

Dr. D.M. Akbar Hussain Dr. D.M. Akr Hussin Lexicl Anlysis. Bsic Ide: Red the source code nd generte tokens, it is similr wht humns will do to red in; just tking on the input nd reking it down in pieces. Ech token is sequence

More information

In the last lecture, we discussed how valid tokens may be specified by regular expressions.

In the last lecture, we discussed how valid tokens may be specified by regular expressions. LECTURE 5 Scnning SYNTAX ANALYSIS We know from our previous lectures tht the process of verifying the syntx of the progrm is performed in two stges: Scnning: Identifying nd verifying tokens in progrm.

More information

Definition of Regular Expression

Definition of Regular Expression Definition of Regulr Expression After the definition of the string nd lnguges, we re redy to descrie regulr expressions, the nottion we shll use to define the clss of lnguges known s regulr sets. Recll

More information

COMP 423 lecture 11 Jan. 28, 2008

COMP 423 lecture 11 Jan. 28, 2008 COMP 423 lecture 11 Jn. 28, 2008 Up to now, we hve looked t how some symols in n lphet occur more frequently thn others nd how we cn sve its y using code such tht the codewords for more frequently occuring

More information

Topic 2: Lexing and Flexing

Topic 2: Lexing and Flexing Topic 2: Lexing nd Flexing COS 320 Compiling Techniques Princeton University Spring 2016 Lennrt Beringer 1 2 The Compiler Lexicl Anlysis Gol: rek strem of ASCII chrcters (source/input) into sequence of

More information

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis CS143 Hndout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexicl Anlysis In this first written ssignment, you'll get the chnce to ply round with the vrious constructions tht come up when doing lexicl

More information

Lexical Analysis: Constructing a Scanner from Regular Expressions

Lexical Analysis: Constructing a Scanner from Regular Expressions Lexicl Anlysis: Constructing Scnner from Regulr Expressions Gol Show how to construct FA to recognize ny RE This Lecture Convert RE to n nondeterministic finite utomton (NFA) Use Thompson s construction

More information

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών ΕΠΛ323 - Θωρία και Πρακτική Μταγλωττιστών Lecture 3 Lexicl Anlysis Elis Athnsopoulos elisthn@cs.ucy.c.cy Recognition of Tokens if expressions nd reltionl opertors if è if then è then else è else relop

More information

CS 432 Fall Mike Lam, Professor a (bc)* Regular Expressions and Finite Automata

CS 432 Fall Mike Lam, Professor a (bc)* Regular Expressions and Finite Automata CS 432 Fll 2017 Mike Lm, Professor (c)* Regulr Expressions nd Finite Automt Compiltion Current focus "Bck end" Source code Tokens Syntx tree Mchine code chr dt[20]; int min() { flot x = 42.0; return 7;

More information

Finite Automata. Lecture 4 Sections Robb T. Koether. Hampden-Sydney College. Wed, Jan 21, 2015

Finite Automata. Lecture 4 Sections Robb T. Koether. Hampden-Sydney College. Wed, Jan 21, 2015 Finite Automt Lecture 4 Sections 3.6-3.7 Ro T. Koether Hmpden-Sydney College Wed, Jn 21, 2015 Ro T. Koether (Hmpden-Sydney College) Finite Automt Wed, Jn 21, 2015 1 / 23 1 Nondeterministic Finite Automt

More information

CS 340, Fall 2014 Dec 11 th /13 th Final Exam Note: in all questions, the special symbol ɛ (epsilon) is used to indicate the empty string.

CS 340, Fall 2014 Dec 11 th /13 th Final Exam Note: in all questions, the special symbol ɛ (epsilon) is used to indicate the empty string. CS 340, Fll 2014 Dec 11 th /13 th Finl Exm Nme: Note: in ll questions, the specil symol ɛ (epsilon) is used to indicte the empty string. Question 1. [5 points] Consider the following regulr expression;

More information

2014 Haskell January Test Regular Expressions and Finite Automata

2014 Haskell January Test Regular Expressions and Finite Automata 0 Hskell Jnury Test Regulr Expressions nd Finite Automt This test comprises four prts nd the mximum mrk is 5. Prts I, II nd III re worth 3 of the 5 mrks vilble. The 0 Hskell Progrmming Prize will be wrded

More information

Languages. L((a (b)(c))*) = { ε,a,bc,aa,abc,bca,... } εw = wε = w. εabba = abbaε = abba. (a (b)(c)) *

Languages. L((a (b)(c))*) = { ε,a,bc,aa,abc,bca,... } εw = wε = w. εabba = abbaε = abba. (a (b)(c)) * Pln for Tody nd Beginning Next week Interpreter nd Compiler Structure, or Softwre Architecture Overview of Progrmming Assignments The MeggyJv compiler we will e uilding. Regulr Expressions Finite Stte

More information

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 4: Lexical Analyzers 28 Jan 08

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 4: Lexical Analyzers 28 Jan 08 CS412/413 Introduction to Compilers Tim Teitelum Lecture 4: Lexicl Anlyzers 28 Jn 08 Outline DFA stte minimiztion Lexicl nlyzers Automting lexicl nlysis Jlex lexicl nlyzer genertor CS 412/413 Spring 2008

More information

Scanner Termination. Multi Character Lookahead. to its physical end. Most parsers require an end of file token. Lex and Jlex automatically create an

Scanner Termination. Multi Character Lookahead. to its physical end. Most parsers require an end of file token. Lex and Jlex automatically create an Scnner Termintion A scnner reds input chrcters nd prtitions them into tokens. Wht hppens when the end of the input file is reched? It my be useful to crete n Eof pseudo-chrcter when this occurs. In Jv,

More information

CSCI 3130: Formal Languages and Automata Theory Lecture 12 The Chinese University of Hong Kong, Fall 2011

CSCI 3130: Formal Languages and Automata Theory Lecture 12 The Chinese University of Hong Kong, Fall 2011 CSCI 3130: Forml Lnguges nd utomt Theory Lecture 12 The Chinese University of Hong Kong, Fll 2011 ndrej Bogdnov In progrmming lnguges, uilding prse trees is significnt tsk ecuse prse trees tell us the

More information

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs.

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs. Lecture 5 Wlks, Trils, Pths nd Connectedness Reding: Some of the mteril in this lecture comes from Section 1.2 of Dieter Jungnickel (2008), Grphs, Networks nd Algorithms, 3rd edition, which is ville online

More information

CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona

CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona CSc 453 Compilers nd Systems Softwre 4 : Lexicl Anlysis II Deprtment of Computer Science University of Arizon collerg@gmil.com Copyright c 2009 Christin Collerg Implementing Automt NFAs nd DFAs cn e hrd-coded

More information

CSCE 531, Spring 2017, Midterm Exam Answer Key

CSCE 531, Spring 2017, Midterm Exam Answer Key CCE 531, pring 2017, Midterm Exm Answer Key 1. (15 points) Using the method descried in the ook or in clss, convert the following regulr expression into n equivlent (nondeterministic) finite utomton: (

More information

Implementing Automata. CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona

Implementing Automata. CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona Implementing utomt Sc 5 ompilers nd Systems Softwre : Lexicl nlysis II Deprtment of omputer Science University of rizon collerg@gmil.com opyright c 009 hristin ollerg NFs nd DFs cn e hrd-coded using this

More information

Reducing a DFA to a Minimal DFA

Reducing a DFA to a Minimal DFA Lexicl Anlysis - Prt 4 Reducing DFA to Miniml DFA Input: DFA IN Assume DFA IN never gets stuck (dd ded stte if necessry) Output: DFA MIN An equivlent DFA with the minimum numer of sttes. Hrry H. Porter,

More information

CS201 Discussion 10 DRAWTREE + TRIES

CS201 Discussion 10 DRAWTREE + TRIES CS201 Discussion 10 DRAWTREE + TRIES DrwTree First instinct: recursion As very generic structure, we could tckle this problem s follows: drw(): Find the root drw(root) drw(root): Write the line for the

More information

CS 241 Week 4 Tutorial Solutions

CS 241 Week 4 Tutorial Solutions CS 4 Week 4 Tutoril Solutions Writing n Assemler, Prt & Regulr Lnguges Prt Winter 8 Assemling instrutions utomtilly. slt $d, $s, $t. Solution: $d, $s, nd $t ll fit in -it signed integers sine they re 5-it

More information

2 Computing all Intersections of a Set of Segments Line Segment Intersection

2 Computing all Intersections of a Set of Segments Line Segment Intersection 15-451/651: Design & Anlysis of Algorithms Novemer 14, 2016 Lecture #21 Sweep-Line nd Segment Intersection lst chnged: Novemer 8, 2017 1 Preliminries The sweep-line prdigm is very powerful lgorithmic design

More information

What are suffix trees?

What are suffix trees? Suffix Trees 1 Wht re suffix trees? Allow lgorithm designers to store very lrge mount of informtion out strings while still keeping within liner spce Allow users to serch for new strings in the originl

More information

Suffix trees, suffix arrays, BWT

Suffix trees, suffix arrays, BWT ALGORITHMES POUR LA BIO-INFORMATIQUE ET LA VISUALISATION COURS 3 Rluc Uricru Suffix trees, suffix rrys, BWT Bsed on: Suffix trees nd suffix rrys presenttion y Him Kpln Suffix trees course y Pco Gomez Liner-Time

More information

LR Parsing, Part 2. Constructing Parse Tables. Need to Automatically Construct LR Parse Tables: Action and GOTO Table

LR Parsing, Part 2. Constructing Parse Tables. Need to Automatically Construct LR Parse Tables: Action and GOTO Table TDDD55 Compilers nd Interpreters TDDB44 Compiler Construction LR Prsing, Prt 2 Constructing Prse Tles Prse tle construction Grmmr conflict hndling Ctegories of LR Grmmrs nd Prsers Peter Fritzson, Christoph

More information

Lexical Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Lexical Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay Lexicl Anlysis Amith Snyl (www.cse.iit.c.in/ s) Deprtment of Computer Science nd Engineering, Indin Institute of Technology, Bomy Septemer 27 College of Engineering, Pune Lexicl Anlysis: 2/6 Recp The input

More information

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών. Lecture 3b Lexical Analysis Elias Athanasopoulos

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών. Lecture 3b Lexical Analysis Elias Athanasopoulos ΕΠΛ323 - Θωρία και Πρακτική Μταγλωττιστών Lecture 3 Lexicl Anlysis Elis Athnsopoulos elisthn@cs.ucy.c.cy RecogniNon of Tokens if expressions nd relnonl opertors if è if then è then else è else relop è

More information

TO REGULAR EXPRESSIONS

TO REGULAR EXPRESSIONS Suject :- Computer Science Course Nme :- Theory Of Computtion DA TO REGULAR EXPRESSIONS Report Sumitted y:- Ajy Singh Meen 07000505 jysmeen@cse.iit.c.in BASIC DEINITIONS DA:- A finite stte mchine where

More information

Section 3.1: Sequences and Series

Section 3.1: Sequences and Series Section.: Sequences d Series Sequences Let s strt out with the definition of sequence: sequence: ordered list of numbers, often with definite pttern Recll tht in set, order doesn t mtter so this is one

More information

Scanner Termination. Multi Character Lookahead

Scanner Termination. Multi Character Lookahead If d.doublevlue() represents vlid integer, (int) d.doublevlue() will crete the pproprite integer vlue. If string representtion of n integer begins with ~ we cn strip the ~, convert to double nd then negte

More information

Slides for Data Mining by I. H. Witten and E. Frank

Slides for Data Mining by I. H. Witten and E. Frank Slides for Dt Mining y I. H. Witten nd E. Frnk Simplicity first Simple lgorithms often work very well! There re mny kinds of simple structure, eg: One ttriute does ll the work All ttriutes contriute eqully

More information

Compilers Spring 2013 PRACTICE Midterm Exam

Compilers Spring 2013 PRACTICE Midterm Exam Compilers Spring 2013 PRACTICE Midterm Exm This is full length prctice midterm exm. If you wnt to tke it t exm pce, give yourself 7 minutes to tke the entire test. Just like the rel exm, ech question hs

More information

Information Retrieval and Organisation

Information Retrieval and Organisation Informtion Retrievl nd Orgnistion Suffix Trees dpted from http://www.mth.tu.c.il/~himk/seminr02/suffixtrees.ppt Dell Zhng Birkeck, University of London Trie A tree representing set of strings { } eef d

More information

Lexical analysis, scanners. Construction of a scanner

Lexical analysis, scanners. Construction of a scanner Lexicl nlysis scnners (NB. Pges 4-5 re for those who need to refresh their knowledge of DFAs nd NFAs. These re not presented during the lectures) Construction of scnner Tools: stte utomt nd trnsition digrms.

More information

CS 430 Spring Mike Lam, Professor. Parsing

CS 430 Spring Mike Lam, Professor. Parsing CS 430 Spring 2015 Mike Lm, Professor Prsing Syntx Anlysis We cn now formlly descrie lnguge's syntx Using regulr expressions nd BNF grmmrs How does tht help us? Syntx Anlysis We cn now formlly descrie

More information

ECE 468/573 Midterm 1 September 28, 2012

ECE 468/573 Midterm 1 September 28, 2012 ECE 468/573 Midterm 1 September 28, 2012 Nme:! Purdue emil:! Plese sign the following: I ffirm tht the nswers given on this test re mine nd mine lone. I did not receive help from ny person or mteril (other

More information

this grammar generates the following language: Because this symbol will also be used in a later step, it receives the

this grammar generates the following language: Because this symbol will also be used in a later step, it receives the LR() nlysis Drwcks of LR(). Look-hed symols s eplined efore, concerning LR(), it is possile to consult the net set to determine, in the reduction sttes, for which symols it would e possile to perform reductions.

More information

Algorithm Design (5) Text Search

Algorithm Design (5) Text Search Algorithm Design (5) Text Serch Tkshi Chikym School of Engineering The University of Tokyo Text Serch Find sustring tht mtches the given key string in text dt of lrge mount Key string: chr x[m] Text Dt:

More information

Example: Source Code. Lexical Analysis. The Lexical Structure. Tokens. What do we really care here? A Sample Toy Program:

Example: Source Code. Lexical Analysis. The Lexical Structure. Tokens. What do we really care here? A Sample Toy Program: Lexicl Anlysis Red source progrm nd produce list of tokens ( liner nlysis) source progrm The lexicl structure is specified using regulr expressions Other secondry tsks: (1) get rid of white spces (e.g.,

More information

CS311H: Discrete Mathematics. Graph Theory IV. A Non-planar Graph. Regions of a Planar Graph. Euler s Formula. Instructor: Işıl Dillig

CS311H: Discrete Mathematics. Graph Theory IV. A Non-planar Graph. Regions of a Planar Graph. Euler s Formula. Instructor: Işıl Dillig CS311H: Discrete Mthemtics Grph Theory IV Instructor: Işıl Dillig Instructor: Işıl Dillig, CS311H: Discrete Mthemtics Grph Theory IV 1/25 A Non-plnr Grph Regions of Plnr Grph The plnr representtion of

More information

ASTs, Regex, Parsing, and Pretty Printing

ASTs, Regex, Parsing, and Pretty Printing ASTs, Regex, Prsing, nd Pretty Printing CS 2112 Fll 2016 1 Algeric Expressions To strt, consider integer rithmetic. Suppose we hve the following 1. The lphet we will use is the digits {0, 1, 2, 3, 4, 5,

More information

Compiler Construction D7011E

Compiler Construction D7011E Compiler Construction D7011E Lecture 3: Lexer genertors Viktor Leijon Slides lrgely y John Nordlnder with mteril generously provided y Mrk P. Jones. 1 Recp: Hndwritten Lexers: Don t require sophisticted

More information

Theory of Computation CSE 105

Theory of Computation CSE 105 $ $ $ Theory of Computtion CSE 105 Regulr Lnguges Study Guide nd Homework I Homework I: Solutions to the following problems should be turned in clss on July 1, 1999. Instructions: Write your nswers clerly

More information

Deterministic. Finite Automata. And Regular Languages. Fall 2018 Costas Busch - RPI 1

Deterministic. Finite Automata. And Regular Languages. Fall 2018 Costas Busch - RPI 1 Deterministic Finite Automt And Regulr Lnguges Fll 2018 Costs Busch - RPI 1 Deterministic Finite Automton (DFA) Input Tpe String Finite Automton Output Accept or Reject Fll 2018 Costs Busch - RPI 2 Trnsition

More information

CSE 401 Midterm Exam 11/5/10 Sample Solution

CSE 401 Midterm Exam 11/5/10 Sample Solution Question 1. egulr expressions (20 points) In the Ad Progrmming lnguge n integer constnt contins one or more digits, but it my lso contin embedded underscores. Any underscores must be preceded nd followed

More information

CMPSC 470: Compiler Construction

CMPSC 470: Compiler Construction CMPSC 47: Compiler Construction Plese complete the following: Midterm (Type A) Nme Instruction: Mke sure you hve ll pges including this cover nd lnk pge t the end. Answer ech question in the spce provided.

More information

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards A Tutology Checker loosely relted to Stålmrck s Algorithm y Mrtin Richrds mr@cl.cm.c.uk http://www.cl.cm.c.uk/users/mr/ University Computer Lortory New Museum Site Pemroke Street Cmridge, CB2 3QG Mrtin

More information

CS 241. Fall 2017 Midterm Review Solutions. October 24, Bits and Bytes 1. 3 MIPS Assembler 6. 4 Regular Languages 7.

CS 241. Fall 2017 Midterm Review Solutions. October 24, Bits and Bytes 1. 3 MIPS Assembler 6. 4 Regular Languages 7. CS 241 Fll 2017 Midterm Review Solutions Octoer 24, 2017 Contents 1 Bits nd Bytes 1 2 MIPS Assemly Lnguge Progrmming 2 3 MIPS Assemler 6 4 Regulr Lnguges 7 5 Scnning 9 1 Bits nd Bytes 1. Give two s complement

More information

Assignment 4. Due 09/18/17

Assignment 4. Due 09/18/17 Assignment 4. ue 09/18/17 1. ). Write regulr expressions tht define the strings recognized by the following finite utomt: b d b b b c c b) Write FA tht recognizes the tokens defined by the following regulr

More information

MATH 25 CLASS 5 NOTES, SEP

MATH 25 CLASS 5 NOTES, SEP MATH 25 CLASS 5 NOTES, SEP 30 2011 Contents 1. A brief diversion: reltively prime numbers 1 2. Lest common multiples 3 3. Finding ll solutions to x + by = c 4 Quick links to definitions/theorems Euclid

More information

CS481: Bioinformatics Algorithms

CS481: Bioinformatics Algorithms CS481: Bioinformtics Algorithms Cn Alkn EA509 clkn@cs.ilkent.edu.tr http://www.cs.ilkent.edu.tr/~clkn/teching/cs481/ EXACT STRING MATCHING Fingerprint ide Assume: We cn compute fingerprint f(p) of P in

More information

Lecture 10 Evolutionary Computation: Evolution strategies and genetic programming

Lecture 10 Evolutionary Computation: Evolution strategies and genetic programming Lecture 10 Evolutionry Computtion: Evolution strtegies nd genetic progrmming Evolution strtegies Genetic progrmming Summry Negnevitsky, Person Eduction, 2011 1 Evolution Strtegies Another pproch to simulting

More information

MTH 146 Conics Supplement

MTH 146 Conics Supplement 105- Review of Conics MTH 146 Conics Supplement In this section we review conics If ou ne more detils thn re present in the notes, r through section 105 of the ook Definition: A prol is the set of points

More information

Tries. Yufei Tao KAIST. April 9, Y. Tao, April 9, 2013 Tries

Tries. Yufei Tao KAIST. April 9, Y. Tao, April 9, 2013 Tries Tries Yufei To KAIST April 9, 2013 Y. To, April 9, 2013 Tries In this lecture, we will discuss the following exct mtching prolem on strings. Prolem Let S e set of strings, ech of which hs unique integer

More information

COMPUTER SCIENCE 123. Foundations of Computer Science. 6. Tuples

COMPUTER SCIENCE 123. Foundations of Computer Science. 6. Tuples COMPUTER SCIENCE 123 Foundtions of Computer Science 6. Tuples Summry: This lecture introduces tuples in Hskell. Reference: Thompson Sections 5.1 2 R.L. While, 2000 3 Tuples Most dt comes with structure

More information

COMBINATORIAL PATTERN MATCHING

COMBINATORIAL PATTERN MATCHING COMBINATORIAL PATTERN MATCHING Genomic Repets Exmple of repets: ATGGTCTAGGTCCTAGTGGTC Motivtion to find them: Genomic rerrngements re often ssocited with repets Trce evolutionry secrets Mny tumors re chrcterized

More information

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have Rndom Numers nd Monte Crlo Methods Rndom Numer Methods The integrtion methods discussed so fr ll re sed upon mking polynomil pproximtions to the integrnd. Another clss of numericl methods relies upon using

More information

Intermediate Information Structures

Intermediate Information Structures CPSC 335 Intermedite Informtion Structures LECTURE 13 Suffix Trees Jon Rokne Computer Science University of Clgry Cnd Modified from CMSC 423 - Todd Trengen UMD upd Preprocessing Strings We will look t

More information

Typing with Weird Keyboards Notes

Typing with Weird Keyboards Notes Typing with Weird Keyords Notes Ykov Berchenko-Kogn August 25, 2012 Astrct Consider lnguge with n lphet consisting of just four letters,,,, nd. There is spelling rule tht sys tht whenever you see n next

More information

Quiz2 45mins. Personal Number: Problem 1. (20pts) Here is an Table of Perl Regular Ex

Quiz2 45mins. Personal Number: Problem 1. (20pts) Here is an Table of Perl Regular Ex Long Quiz2 45mins Nme: Personl Numer: Prolem. (20pts) Here is n Tle of Perl Regulr Ex Chrcter Description. single chrcter \s whitespce chrcter (spce, t, newline) \S non-whitespce chrcter \d digit (0-9)

More information

EECS150 - Digital Design Lecture 23 - High-level Design and Optimization 3, Parallelism and Pipelining

EECS150 - Digital Design Lecture 23 - High-level Design and Optimization 3, Parallelism and Pipelining EECS150 - Digitl Design Lecture 23 - High-level Design nd Optimiztion 3, Prllelism nd Pipelining Nov 12, 2002 John Wwrzynek Fll 2002 EECS150 - Lec23-HL3 Pge 1 Prllelism Prllelism is the ct of doing more

More information

Applied Databases. Sebastian Maneth. Lecture 13 Online Pattern Matching on Strings. University of Edinburgh - February 29th, 2016

Applied Databases. Sebastian Maneth. Lecture 13 Online Pattern Matching on Strings. University of Edinburgh - February 29th, 2016 Applied Dtses Lecture 13 Online Pttern Mtching on Strings Sestin Mneth University of Edinurgh - Ferury 29th, 2016 2 Outline 1. Nive Method 2. Automton Method 3. Knuth-Morris-Prtt Algorithm 4. Boyer-Moore

More information

Principles of Programming Languages

Principles of Programming Languages Principles of Progrmming Lnguges h"p://www.di.unipi.it/~ndre/did2c/plp- 14/ Prof. Andre Corrdini Deprtment of Computer Science, Pis Lesson 5! Gener;on of Lexicl Anlyzers Creting Lexicl Anlyzer with Lex

More information

Fall 2018 Midterm 1 October 11, ˆ You may not ask questions about the exam except for language clarifications.

Fall 2018 Midterm 1 October 11, ˆ You may not ask questions about the exam except for language clarifications. 15-112 Fll 2018 Midterm 1 October 11, 2018 Nme: Andrew ID: Recittion Section: ˆ You my not use ny books, notes, extr pper, or electronic devices during this exm. There should be nothing on your desk or

More information

Context-Free Grammars

Context-Free Grammars Context-Free Grmmrs Descriing Lnguges We've seen two models for the regulr lnguges: Finite utomt ccept precisely the strings in the lnguge. Regulr expressions descrie precisely the strings in the lnguge.

More information

Should be done. Do Soon. Structure of a Typical Compiler. Plan for Today. Lab hours and Office hours. Quiz 1 is due tonight, was posted Tuesday night

Should be done. Do Soon. Structure of a Typical Compiler. Plan for Today. Lab hours and Office hours. Quiz 1 is due tonight, was posted Tuesday night Should e done L hours nd Office hours Sign up for the miling list t, strting to send importnt info to list http://groups.google.com/group/cs453-spring-2011 Red Ch 1 nd skim Ch 2 through 2.6, red 3.3 nd

More information

CS 321 Programming Languages and Compilers. Bottom Up Parsing

CS 321 Programming Languages and Compilers. Bottom Up Parsing CS 321 Progrmming nguges nd Compilers Bottom Up Prsing Bottom-up Prsing: Shift-reduce prsing Grmmr H: fi ; fi b Input: ;;b hs prse tree ; ; b 2 Dt for Shift-reduce Prser Input string: sequence of tokens

More information

Context-Free Grammars

Context-Free Grammars Context-Free Grmmrs Descriing Lnguges We've seen two models for the regulr lnguges: Finite utomt ccept precisely the strings in the lnguge. Regulr expressions descrie precisely the strings in the lnguge.

More information

UNIVERSITY OF EDINBURGH COLLEGE OF SCIENCE AND ENGINEERING SCHOOL OF INFORMATICS INFORMATICS 1 COMPUTATION & LOGIC INSTRUCTIONS TO CANDIDATES

UNIVERSITY OF EDINBURGH COLLEGE OF SCIENCE AND ENGINEERING SCHOOL OF INFORMATICS INFORMATICS 1 COMPUTATION & LOGIC INSTRUCTIONS TO CANDIDATES UNIVERSITY OF EDINBURGH COLLEGE OF SCIENCE AND ENGINEERING SCHOOL OF INFORMATICS INFORMATICS COMPUTATION & LOGIC Sturdy st April 7 : to : INSTRUCTIONS TO CANDIDATES This is tke-home exercise. It will not

More information

Midterm I Solutions CS164, Spring 2006

Midterm I Solutions CS164, Spring 2006 Midterm I Solutions CS164, Spring 2006 Februry 23, 2006 Plese red ll instructions (including these) crefully. Write your nme, login, SID, nd circle the section time. There re 8 pges in this exm nd 4 questions,

More information

1.1. Interval Notation and Set Notation Essential Question When is it convenient to use set-builder notation to represent a set of numbers?

1.1. Interval Notation and Set Notation Essential Question When is it convenient to use set-builder notation to represent a set of numbers? 1.1 TEXAS ESSENTIAL KNOWLEDGE AND SKILLS Prepring for 2A.6.K, 2A.7.I Intervl Nottion nd Set Nottion Essentil Question When is it convenient to use set-uilder nottion to represent set of numers? A collection

More information

Before We Begin. Introduction to Spatial Domain Filtering. Introduction to Digital Image Processing. Overview (1): Administrative Details (1):

Before We Begin. Introduction to Spatial Domain Filtering. Introduction to Digital Image Processing. Overview (1): Administrative Details (1): Overview (): Before We Begin Administrtive detils Review some questions to consider Winter 2006 Imge Enhncement in the Sptil Domin: Bsics of Sptil Filtering, Smoothing Sptil Filters, Order Sttistics Filters

More information

COS 333: Advanced Programming Techniques

COS 333: Advanced Programming Techniques COS 333: Advnced Progrmming Techniques Brin Kernighn wk@cs, www.cs.princeton.edu/~wk 311 CS Building 609-258-2089 (ut emil is lwys etter) TA's: Junwen Li, li@cs, CS 217,258-0451 Yong Wng,yongwng@cs, CS

More information

Outline. Introduction Suffix Trees (ST) Building STs in linear time: Ukkonen s algorithm Applications of ST

Outline. Introduction Suffix Trees (ST) Building STs in linear time: Ukkonen s algorithm Applications of ST Suffi Trees Outline Introduction Suffi Trees (ST) Building STs in liner time: Ukkonen s lgorithm Applictions of ST 2 3 Introduction Sustrings String is ny sequence of chrcters. Sustring of string S is

More information

Homework. Context Free Languages III. Languages. Plan for today. Context Free Languages. CFLs and Regular Languages. Homework #5 (due 10/22)

Homework. Context Free Languages III. Languages. Plan for today. Context Free Languages. CFLs and Regular Languages. Homework #5 (due 10/22) Homework Context Free Lnguges III Prse Trees nd Homework #5 (due 10/22) From textbook 6.4,b 6.5b 6.9b,c 6.13 6.22 Pln for tody Context Free Lnguges Next clss of lnguges in our quest! Lnguges Recll. Wht

More information

10.5 Graphing Quadratic Functions

10.5 Graphing Quadratic Functions 0.5 Grphing Qudrtic Functions Now tht we cn solve qudrtic equtions, we wnt to lern how to grph the function ssocited with the qudrtic eqution. We cll this the qudrtic function. Grphs of Qudrtic Functions

More information

CS 340, Fall 2016 Sep 29th Exam 1 Note: in all questions, the special symbol ɛ (epsilon) is used to indicate the empty string.

CS 340, Fall 2016 Sep 29th Exam 1 Note: in all questions, the special symbol ɛ (epsilon) is used to indicate the empty string. CS 340, Fll 2016 Sep 29th Exm 1 Nme: Note: in ll questions, the speil symol ɛ (epsilon) is used to indite the empty string. Question 1. [10 points] Speify regulr expression tht genertes the lnguge over

More information

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Winter 2016

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Winter 2016 Solving Prolems y Serching CS 486/686: Introduction to Artificil Intelligence Winter 2016 1 Introduction Serch ws one of the first topics studied in AI - Newell nd Simon (1961) Generl Prolem Solver Centrl

More information

Notes for Graph Theory

Notes for Graph Theory Notes for Grph Theory These re notes I wrote up for my grph theory clss in 06. They contin most of the topics typiclly found in grph theory course. There re proofs of lot of the results, ut not of everything.

More information

CMSC 331 First Midterm Exam

CMSC 331 First Midterm Exam 0 00/ 1 20/ 2 05/ 3 15/ 4 15/ 5 15/ 6 20/ 7 30/ 8 30/ 150/ 331 First Midterm Exm 7 October 2003 CMC 331 First Midterm Exm Nme: mple Answers tudent ID#: You will hve seventy-five (75) minutes to complete

More information

Graphs with at most two trees in a forest building process

Graphs with at most two trees in a forest building process Grphs with t most two trees in forest uilding process rxiv:802.0533v [mth.co] 4 Fe 208 Steve Butler Mis Hmnk Mrie Hrdt Astrct Given grph, we cn form spnning forest y first sorting the edges in some order,

More information

Chapter44. Polygons and solids. Contents: A Polygons B Triangles C Quadrilaterals D Solids E Constructing solids

Chapter44. Polygons and solids. Contents: A Polygons B Triangles C Quadrilaterals D Solids E Constructing solids Chpter44 Polygons nd solids Contents: A Polygons B Tringles C Qudrilterls D Solids E Constructing solids 74 POLYGONS AND SOLIDS (Chpter 4) Opening prolem Things to think out: c Wht different shpes cn you

More information

Regular Expression Matching with Multi-Strings and Intervals. Philip Bille Mikkel Thorup

Regular Expression Matching with Multi-Strings and Intervals. Philip Bille Mikkel Thorup Regulr Expression Mtching with Multi-Strings nd Intervls Philip Bille Mikkel Thorup Outline Definition Applictions Previous work Two new problems: Multi-strings nd chrcter clss intervls Algorithms Thompson

More information

1 Quad-Edge Construction Operators

1 Quad-Edge Construction Operators CS48: Computer Grphics Hndout # Geometric Modeling Originl Hndout #5 Stnford University Tuesdy, 8 December 99 Originl Lecture #5: 9 November 99 Topics: Mnipultions with Qud-Edge Dt Structures Scribe: Mike

More information

An Efficient Divide and Conquer Algorithm for Exact Hazard Free Logic Minimization

An Efficient Divide and Conquer Algorithm for Exact Hazard Free Logic Minimization An Efficient Divide nd Conquer Algorithm for Exct Hzrd Free Logic Minimiztion J.W.J.M. Rutten, M.R.C.M. Berkelr, C.A.J. vn Eijk, M.A.J. Kolsteren Eindhoven University of Technology Informtion nd Communiction

More information

The Greedy Method. The Greedy Method

The Greedy Method. The Greedy Method Lists nd Itertors /8/26 Presenttion for use with the textook, Algorithm Design nd Applictions, y M. T. Goodrich nd R. Tmssi, Wiley, 25 The Greedy Method The Greedy Method The greedy method is generl lgorithm

More information

INTRODUCTION TO SIMPLICIAL COMPLEXES

INTRODUCTION TO SIMPLICIAL COMPLEXES INTRODUCTION TO SIMPLICIAL COMPLEXES CASEY KELLEHER AND ALESSANDRA PANTANO 0.1. Introduction. In this ctivity set we re going to introduce notion from Algebric Topology clled simplicil homology. The min

More information

From Dependencies to Evaluation Strategies

From Dependencies to Evaluation Strategies From Dependencies to Evlution Strtegies Possile strtegies: 1 let the user define the evlution order 2 utomtic strtegy sed on the dependencies: use locl dependencies to determine which ttriutes to compute

More information

Suffix Tries. Slides adapted from the course by Ben Langmead

Suffix Tries. Slides adapted from the course by Ben Langmead Suffix Tries Slides dpted from the course y Ben Lngmed en.lngmed@gmil.com Indexing with suffixes Until now, our indexes hve een sed on extrcting sustrings from T A very different pproch is to extrct suffixes

More information

A dual of the rectangle-segmentation problem for binary matrices

A dual of the rectangle-segmentation problem for binary matrices A dul of the rectngle-segmenttion prolem for inry mtrices Thoms Klinowski Astrct We consider the prolem to decompose inry mtrix into smll numer of inry mtrices whose -entries form rectngle. We show tht

More information

Systems I. Logic Design I. Topics Digital logic Logic gates Simple combinational logic circuits

Systems I. Logic Design I. Topics Digital logic Logic gates Simple combinational logic circuits Systems I Logic Design I Topics Digitl logic Logic gtes Simple comintionl logic circuits Simple C sttement.. C = + ; Wht pieces of hrdwre do you think you might need? Storge - for vlues,, C Computtion

More information

Position Heaps: A Simple and Dynamic Text Indexing Data Structure

Position Heaps: A Simple and Dynamic Text Indexing Data Structure Position Heps: A Simple nd Dynmic Text Indexing Dt Structure Andrzej Ehrenfeucht, Ross M. McConnell, Niss Osheim, Sung-Whn Woo Dept. of Computer Science, 40 UCB, University of Colordo t Boulder, Boulder,

More information

9 Graph Cutting Procedures

9 Graph Cutting Procedures 9 Grph Cutting Procedures Lst clss we begn looking t how to embed rbitrry metrics into distributions of trees, nd proved the following theorem due to Brtl (1996): Theorem 9.1 (Brtl (1996)) Given metric

More information

Functor (1A) Young Won Lim 10/5/17

Functor (1A) Young Won Lim 10/5/17 Copyright (c) 2016-2017 Young W. Lim. Permission is grnted to copy, distribute nd/or modify this document under the terms of the GNU Free Documenttion License, Version 1.2 or ny lter version published

More information

Misrepresentation of Preferences

Misrepresentation of Preferences Misrepresenttion of Preferences Gicomo Bonnno Deprtment of Economics, University of Cliforni, Dvis, USA gfbonnno@ucdvis.edu Socil choice functions Arrow s theorem sys tht it is not possible to extrct from

More information

Lexical Analysis and Lexical Analyzer Generators

Lexical Analysis and Lexical Analyzer Generators 1 Lexicl Anlysis nd Lexicl Anlyzer Genertors Chpter 3 COP5621 Compiler Construction Copyright Roert vn Engelen, Florid Stte University, 2007-2009 2 The Reson Why Lexicl Anlysis is Seprte Phse Simplifies

More information