CS453 Lecture: Regular Expressions and Transition Diagrams

Plan for Today and Beginning Next Week

Interpreter and Compiler Structure, or Software Architecture
Overview of Programming Assignments
  The MeggyJava compiler we will be building.
Regular Expressions
Finite State Machines
  DFAs: Deterministic Finite Automata
  Complications
  NFAs: Non-Deterministic Finite State Automata
  From Regular Expressions to NFAs
  From NFAs to DFAs

Structure of a Typical Compiler

Analysis
  character stream
  lexical analysis -> tokens ("words")
  syntactic analysis -> AST ("sentences")
  semantic analysis -> annotated AST
  interpreter
Synthesis
  IR code generation -> IR
  optimization -> IR
  code generation -> target language

Example MeggyJava program

    import meggy.Meggy;

    class PA3Flower {
        public static void main(String[] whatever) {
            {
                // Upper left petal, clockwise
                Meggy.setPixel( (byte)2, (byte)4, Meggy.Color.VIOLET );
                Meggy.setPixel( (byte)2, (byte)1, Meggy.Color.VIOLET );
            }
        }
    }

Atmel assembly for Meggy.setPixel() call

    main:
        call _Z18MeggyJrSimpleSetupv
        # Push constant 2 onto the stack
        ldi r24, 2
        push r24
        # Push constant 4 onto the stack
        ldi r24, 4
        push r24
        # Push Meggy.Color.VIOLET onto the stack.
        ldi r22, 6
        push r22
        # Pop the arguments into registers in reverse order.
        pop r20
        pop r22
        pop r24
        call _Z6DrawPxhhh
        call _Z12DisplaySlatev
About The Slides on Languages and Finite Automata

Slides originally developed by Prof. Costas Busch (2004)
  Many thanks to Prof. Busch for developing the original slide set.
Adapted with permission by Prof. Dan Massey (Spring 2007)
  Subsequent modifications, many thanks to Prof. Massey for CS 301 slides
Adapted with permission by Prof. Michelle Strout (Spring 2011)
  Adapted for use in CS 453
Adapted by Wim Bohm (added regular expr -> NFA -> DFA, Spr2012)

Languages

A language is a set of strings (sometimes called sentences)
String: a finite sequence of letters
  Examples: "cat", "dog", "house", ...
  Defined over a fixed alphabet: Σ = {a, b, c, ..., z}

Empty String

A string with no letters: ε (sometimes λ is used)
Observations: |ε| = 0 and εw = wε = w

Regular Expressions

Regular expressions describe regular languages
You have probably seen them in OSs / editors
Example: (a | (b)(c))* describes the language L((a | (b)(c))*) = {a, bc, aa, abc, bca, ...}
Recursive Definition for Specifying Regular Expressions

Primitive regular expressions: ∅, ε, α where α ∈ Σ, some alphabet
Given regular expressions r1 and r2,
  r1 | r2
  (r1)(r2)
  r1*
  (r1)
are regular expressions

Regular operators

  choice:         A | B    a string from L(A) or from L(B)
  concatenation:  A B      a string from L(A) followed by a string from L(B)
  repetition:     A*       0 or more concatenations of strings from L(A)
                  A+       1 or more
  grouping:       ( A )
Concatenation has precedence over choice: A | B C vs. (A | B) C
More syntactic sugar, used in scanner generators:
  [abc] means a or b or c
  [\t\n ] means tab, newline, or space
  [a-z] means a, b, c, ..., or z

Example Regular Expressions and Regular Definitions

Regular definition: name : regular expression
  name can then be used in other regular expressions
Keywords: "print", "while"
Operations: "+", "-", "*"
Identifiers:
  let : [a-zA-Z]   // choose from a to z or A to Z
  dig : [0-9]
  id : let (let | dig)*
Numbers: dig+ = dig dig*

Finite Automaton

Input String -> Finite Automaton -> Output String
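The regular definitions above can be tried out directly. The following is a minimal sketch using Python's `re` module, assuming `dig` denotes the digits 0 to 9; the names `LET`, `DIG`, `ID`, `NUMBER`, and `matches` are hypothetical helpers for illustration, not part of any scanner generator.

```python
import re

# Regular definitions from the slide, written as Python regex fragments.
LET = r"[a-zA-Z]"                      # let : [a-zA-Z]
DIG = r"[0-9]"                         # dig : [0-9]  (assumed digit range)
ID = rf"{LET}(?:{LET}|{DIG})*"         # id  : let (let | dig)*
NUMBER = rf"{DIG}+"                    # dig+
NUMBER_LONGHAND = rf"{DIG}{DIG}*"      # dig dig*, the same language as dig+


def matches(definition: str, text: str) -> bool:
    """Return True when the whole of `text` is in the language of `definition`."""
    return re.fullmatch(definition, text) is not None


if __name__ == "__main__":
    for word in ["x1", "while", "1x", "1234", ""]:
        print(repr(word), "ID" if matches(ID, word) else "-",
              "NUMBER" if matches(NUMBER, word) else "-")
```

Note that "while" matches `ID` here: separating keywords from identifiers is a separate concern, handled by rule priority in the scanner.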
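Finite Accepter

Input String -> Finite Automaton -> Output: "Accept" or "Reject"

State Transition Graph of a Finite Accepter

[Figure: transition graph showing the initial state q0, the other states, the labeled transitions, and the final (accepting) state.]

Initial Configuration, Reading the Input

[Figure sequence: the automaton starts in q0 and reads the input string one letter at a time. Input finished. Output: "accept".]

String Rejection

[Figure sequence: a second input string is read. Input finished. Output: "reject".]

The Empty String

[Figure: the automaton in q0 with empty input. Input finished. Output: "reject".]
Would it be possible to accept the empty string?

Another Example

[Figure sequence: a second automaton starting in q0. One input string is read to the end. Input finished. Output: "accept".]

Rejection

[Figure sequence: another input string is read by the same automaton.]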
[Figure: Input finished. Output: "reject".]
Which strings are accepted?

Formalities

Deterministic Finite Accepter (DFA)
  M = (Q, Σ, δ, q0, F)
  Q  : set of states
  Σ  : input alphabet
  δ  : transition function
  q0 : initial state
  F  : set of final (accepting) states

Input Alphabet Σ
  Σ = {a, b}

Set of States Q
  Q = {q0, q1, q2, q3, q4, q5}

Initial State
  q0

Set of Final States F
  F = {q4}

Transition Function δ : Q × Σ → Q

[Figure sequence: individual transitions read off the graph, for example δ(q0, a) = q1 and a transition from q2 to q3.]

Transition Function / table

[Figure: δ written as a table with one row per state and one column per input letter; several entries lead to q5.]

Complications

1. "1234" is a NUMBER, but what about the "123" in "1234", or the "23", etc.? Also, the scanner must recognize many tokens, not one, only stopping at end of file.
2. "if" is a keyword or reserved word IF, but "if" is also defined by the reg. exp. for identifier ID; we want to recognize IF.
3. We want to discard white space and comments.
4. "123" is a NUMBER but so is "235" and so is "0", just as "a" is an ID and so is "bcd"; we want to recognize a token, but add attributes to it.
Complication 1

1. "1234" is a NUMBER, but what about the "123" in "1234", or the "23", etc.? Also, the scanner must recognize many tokens, not one, only stopping at end of file.
So: recognize the largest string defined by some regular expression; only stop getting more input if there is no more match. This introduces the need to reconsider a character, as it is the first of the next token.
  e.g. fname(a,bcd); would be scanned as ID OPEN ID COMMA ID CLOSE SEMI EOF
  scanning fname would consume "(", which would be put back and then recognized as OPEN

Complication 2

2. "if" is a keyword or reserved word IF, but "if" is also defined by the reg. exp. for identifier ID; we want to recognize IF.
So: have some way of determining which token (IF or ID) is recognized. This can be done using priority, e.g. in scanner generators an earlier definition has higher priority than a later one. By putting the definition for IF before the definition for ID in the input for the scanner generator, we get the desired result.

Complication 3

3. We want to discard white space and comments and not bother the parser with these.
So: in scanner generators, we can
  specify white space using a regular expression, e.g. [\t\n ], and return no token, i.e. move to the next
  specify comments using a (NASTY) regular expression and again return no token

Complication 4

4. "123" is a NUMBER but so is "235" and so is "0", just as "a" is an ID and so is "bcd"; we want to recognize a token, but add attributes to it.
So: scanners return Symbols, not tokens. A Symbol is a (token, tokenValue) pair, e.g. (NUMBER,123) or (ID,"a"). Often more information is added to a symbol, e.g. line number and position (as we will do in MeggyJava).
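The four complications can be seen working together in a small hand-written scanner. The following is a sketch, not the way a scanner generator is actually implemented: it tries every rule at the current position, keeps the longest match, breaks ties in favor of the earlier rule, drops white space, and returns (token, value) symbols. The rule list, the `scan` function, and the token names beyond those on the slides are assumptions made for illustration.

```python
import re

# Rules in priority order: on a tie in length the earlier rule wins
# (Complication 2). A token name of None means "match, but return no
# token" (Complication 3).
RULES = [
    ("IF", re.compile(r"if")),
    ("ID", re.compile(r"[a-zA-Z][a-zA-Z0-9]*")),
    ("NUMBER", re.compile(r"[0-9]+")),
    ("OPEN", re.compile(r"\(")),
    ("CLOSE", re.compile(r"\)")),
    ("COMMA", re.compile(r",")),
    ("SEMI", re.compile(r";")),
    (None, re.compile(r"[\t\n ]+")),
]


def scan(text):
    """Return a list of (token, value) symbols, ending with ("EOF", "")."""
    symbols = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Longest match wins (Complication 1); a strictly longer match
            # is required to displace an earlier, higher-priority rule.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_end == pos:
            raise ValueError(f"no token matches at position {pos}")
        if best_name is not None:
            # Keep the matched text as the attribute (Complication 4).
            symbols.append((best_name, text[pos:best_end]))
        pos = best_end
    symbols.append(("EOF", ""))
    return symbols


if __name__ == "__main__":
    print(scan("fname(a,bcd);"))
    print(scan("if iffy 1234"))
```

Because each rule is matched starting at `pos` and the scanner simply resumes at `best_end`, the "put back" of the character that ended a token is implicit: the character was inspected but never consumed.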
(Non) Deterministic Finite State Automata

A Deterministic Finite State Automaton (DFA) has disjoint character sets on its edges, i.e. the choice of which state is next is deterministic.
A Non-deterministic Finite State Automaton (NFA) does NOT, i.e. it can have character sets on its edges that overlap (non-empty intersection), and empty sets on some edges (labeled ε).
NFAs are used in the translation from regular expressions to FSAs. E.g. when we combine the reg. exp. for IF with the reg. exp. for ID by just merging the two transition graphs, we would get an NFA.
NFAs are a first step in creating a DFA for a scanner. The NFA is then transformed into a DFA.

From regular expressions to NFAs

  regexp             NFA
  simple letter a    [figure: one edge labeled with the letter]
  empty string ε     [figure: one edge labeled ε]
  A B                concat the NFAs
  A | B              split, then merge them
  A*                 build a loop from the accept state of the NFA for A

The Problem

DFAs are easy to execute (table-driven interpretation)
NFAs are easy to build from reg. exps, but hard to execute
  we would need some form of guessing, implemented by backtracking
To build a DFA from an NFA
  we avoid the backtracking by taking all choices in the NFA at once; a move with a character or ε gets us to a set of states in the NFA, which will become one state in the DFA.
We keep doing this until we have exhausted all possibilities. This mechanism is called transitive closure.
(This ends because there is only a finite set of subsets of NFA states, <= ?)

Example IF and ID

  let : [a-z]
  dig : [0-9]
  tok : if | id
  if : "i" "f"
  id : let (let | dig)*
Example: NFA for IF and ID

[Figure: NFA with states 1 to 8. The path 1 -i-> 2 -f-> 3 accepts IF in state 3. States 4 to 8 recognize ID, entered from state 4 on a-z, with state 8 accepting ID.]
IF has priority over ID
From 1, with ε we can get to states 1 and 4; this is called an ε-closure
We can now simulate the behavior of the NFA and build a table for the DFA, making character moves plus ε-closures

NFA simulation: scanning "in"

  DFAstate   NFAstates   Move   Next
  1          1,4         i      2,5,8,6
  2          2,5,6,8     n      6,7,8
Only one of the states in 6,7,8 is an accepting state, an ID accepting state, so "in" is an ID

NFA simulation: scanning "if"

  DFAstate   NFAstates   Move   Next
  1          1,4         i      2,5,8,6
  2          2,5,6,8     f      3,6,7,8
Two of the states in 3,6,7,8 are accepting, an IF accepting state (3) and an ID accepting state (8). IF has priority over ID, so "if" is an IF

Definitions: edge(s,c) and closure

edge(s,c): the set of all NFA states reachable from state s following an edge with character c
closure(S): the set of all states reachable from S with no chars or ε
closure(S) = T, the smallest set such that
  $T = S \cup \bigcup_{s \in T} \mathrm{edge}(s, \epsilon)$
Computed iteratively:
  T = S
  repeat
      T' = T
      forall s in T' { T = T ∪ edge(s, ε) }
  until T == T'
This transitive closure algorithm terminates because there is a finite number of states in the NFA
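The definitions of edge and closure, and the two simulation tables, can be checked with a short program. The sketch below encodes an NFA for IF and ID; the slide figure is not fully recoverable, so the exact ε-edges among states 5 to 8 are an assumption, chosen so that the closures agree with the tables above. The names `NFA_EDGES`, `dfa_edge`, and `simulate` are hypothetical.

```python
EPS = ""  # label used in this sketch for epsilon edges
LETTERS = frozenset("abcdefghijklmnopqrstuvwxyz")
DIGITS = frozenset("0123456789")

# (from_state, to_state, labels). The layout of the epsilon edges is an
# assumption that reproduces the closures in the simulation tables.
NFA_EDGES = [
    (1, 2, {"i"}), (2, 3, {"f"}),            # IF branch, state 3 accepts IF
    (1, 4, {EPS}), (4, 5, LETTERS),          # ID branch starts with a letter
    (5, 8, {EPS}), (8, 6, {EPS}),            # state 8 accepts ID
    (6, 7, LETTERS | DIGITS), (7, 8, {EPS}),
]
START = 1
ACCEPTING = {3: "IF", 8: "ID"}
PRIORITY = ["IF", "ID"]  # IF has priority over ID


def edge(s, c):
    """All NFA states reachable from s over one edge labeled c."""
    return {t for (u, t, labels) in NFA_EDGES if u == s and c in labels}


def closure(states):
    """Transitive closure of `states` under epsilon edges."""
    T = set(states)
    while True:
        T_prev = set(T)
        for s in T_prev:
            T |= edge(s, EPS)
        if T == T_prev:
            return frozenset(T)


def dfa_edge(d, c):
    """closure of the union of edge(s, c) over all s in d."""
    moved = set()
    for s in d:
        moved |= edge(s, c)
    return closure(moved)


def simulate(text):
    """Run the NFA on `text`; return the winning token name or None."""
    d = closure({START})
    for c in text:
        d = dfa_edge(d, c)
    found = {ACCEPTING[s] for s in d if s in ACCEPTING}
    for token in PRIORITY:
        if token in found:
            return token
    return None


if __name__ == "__main__":
    print(sorted(closure({START})))          # states 1 and 4
    print(simulate("in"), simulate("if"))
```

The sets of NFA states are kept as `frozenset` values so that they can later serve as dictionary keys, one per DFA state.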
DFAedge and NFA Simulation

Suppose we are in DFA state d = {s_i, s_k, s_l}
By moving with character c from d we reach a set of new NFA states, call these DFAedge(d,c), a new or already existing DFA state
  $\mathrm{DFAedge}(d, c) = \mathrm{closure}\left(\bigcup_{s \in d} \mathrm{edge}(s, c)\right)$
NFA simulation: let the input string be c_1 ... c_k
  d = closure({s_1})    // s_1 the start state of the NFA
  for i from 1 to k
      d = DFAedge(d, c_i)

Constructing a DFA with closure and DFAedge

  state d_1 = closure({s_1}), the closure of the start state of the NFA
  make new states by moving from existing states with a character c, using DFAedge(d,c); record these in the transition table
  make accepts in the transition table if there is an accepting state in d; decide priority if there is more than one accept state
Instead of characters we use non-overlapping (DFA) character classes to keep the table manageable.

NFA to DFA (let's build it)

[Figure: the NFA for IF and ID, states 1 to 8, with state 3 accepting IF and state 8 accepting ID.]

NFA to DFA

[Figure: the resulting DFA.]
  1: {1,4}
  2: {2,5,6,8}   ID
  3: {3,6,7,8}   IF
  4: {6,7,8}     ID
  5: {5,6,8}     ID
  Edges: 1 -i-> 2, 1 -[a-h, j-z]-> 5, 2 -f-> 3, 2 -[a-e, g-z]-> 4, and 3, 4, 5 -[a-z, 0-9]-> 4
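The construction steps above amount to a worklist algorithm over sets of NFA states. The following sketch builds the DFA for IF and ID that way. As an assumption, the NFA's ε-edges among states 5 to 8 are reconstructed so that the resulting state sets agree with the slides, and the sketch iterates over single characters instead of character classes for simplicity. `build_dfa` and the other names are hypothetical.

```python
from collections import deque

EPS = ""  # label used in this sketch for epsilon edges
LETTERS = frozenset("abcdefghijklmnopqrstuvwxyz")
DIGITS = frozenset("0123456789")
ALPHABET = sorted(LETTERS | DIGITS)

# Assumed NFA for IF | ID: (from_state, to_state, labels).
NFA_EDGES = [
    (1, 2, {"i"}), (2, 3, {"f"}),
    (1, 4, {EPS}), (4, 5, LETTERS),
    (5, 8, {EPS}), (8, 6, {EPS}),
    (6, 7, LETTERS | DIGITS), (7, 8, {EPS}),
]
START = 1
ACCEPTING = {3: "IF", 8: "ID"}
PRIORITY = ["IF", "ID"]  # earlier entries win


def edge(s, c):
    return {t for (u, t, labels) in NFA_EDGES if u == s and c in labels}


def closure(states):
    T = set(states)
    while True:
        T_prev = set(T)
        for s in T_prev:
            T |= edge(s, EPS)
        if T == T_prev:
            return frozenset(T)


def dfa_edge(d, c):
    moved = set()
    for s in d:
        moved |= edge(s, c)
    return closure(moved)


def build_dfa():
    """Return (start, states, transitions, accepts) for the DFA."""
    start = closure({START})          # d_1 = closure({s_1})
    states = {start}
    transitions = {}                  # (dfa_state, char) -> dfa_state
    work = deque([start])
    while work:
        d = work.popleft()
        for c in ALPHABET:
            target = dfa_edge(d, c)
            if not target:
                continue              # no move on c from d
            transitions[(d, c)] = target
            if target not in states:  # a new DFA state: explore it later
                states.add(target)
                work.append(target)
    accepts = {}
    for d in states:
        found = {ACCEPTING[s] for s in d if s in ACCEPTING}
        for token in PRIORITY:        # decide priority among accept states
            if token in found:
                accepts[d] = token
                break
    return start, states, transitions, accepts


if __name__ == "__main__":
    start, states, transitions, accepts = build_dfa()
    for d in sorted(states, key=sorted):
        print(sorted(d), accepts.get(d, ""))
```

The loop terminates because each DFA state is a subset of the eight NFA states and is added to the worklist at most once.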
The transition table for IF | ID

  p   NFAstates(p)   i           f           a-h, j-z   a-e, g-z   a-z, 0-9   ACPT
  1   {1,4}          {2,5,6,8}               {5,6,8}
  2   {2,5,6,8}                  {3,6,7,8}              {6,7,8}               ID
  3   {3,6,7,8}                                                    {6,7,8}    IF
  4   {6,7,8}                                                      {6,7,8}    ID
  5   {5,6,8}                                                      {6,7,8}    ID

Homework 1

Build an NFA and DFA for integer and float literals
  dot : "."
  dig : [0-9]
  int-lit : dig+
  float-lit : dig* dot dig+
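Once the table exists, executing the DFA is a simple loop, which is what "table-driven interpretation" refers to. The sketch below hard-codes the five DFA states from the table. It assumes that digits from state 2 also lead to state 4, and it encodes each row as an ordered list of (characters, next state) pairs in place of explicit character-class columns. `TABLE`, `ACCEPT`, and `run` are hypothetical names.

```python
LETTERS = "abcdefghijklmnopqrstuvwxyz"
DIGITS = "0123456789"
ALNUM = LETTERS + DIGITS

# One row per DFA state. Within a row the first matching entry wins, so
# ("i", 2) before (LETTERS, 5) plays the role of the disjoint classes
# "i" and "a-h, j-z".
TABLE = {
    1: [("i", 2), (LETTERS, 5)],
    2: [("f", 3), (ALNUM, 4)],   # assumption: digits also go to state 4
    3: [(ALNUM, 4)],
    4: [(ALNUM, 4)],
    5: [(ALNUM, 4)],
}
ACCEPT = {2: "ID", 3: "IF", 4: "ID", 5: "ID"}  # the ACPT column


def step(state, ch):
    """Look up the next state, or None when the table has no entry."""
    for chars, nxt in TABLE[state]:
        if ch in chars:
            return nxt
    return None


def run(text):
    """Return the token accepted for the whole of `text`, or None."""
    state = 1
    for ch in text:
        state = step(state, ch)
        if state is None:
            return None
    return ACCEPT.get(state)


if __name__ == "__main__":
    for word in ["if", "in", "i", "iffy", "x9", "9x", ""]:
        print(repr(word), run(word))
```

Unlike the NFA simulation, each input character costs one table lookup and no sets of states are manipulated at run time.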