ΕΠΛ323 - Theory and Practice of Compilers
Lecture 3: Lexical Analysis
Elias Athanasopoulos
eliasathan@cs.ucy.ac.cy
Recognition of Tokens
if expressions and relational operators:
  if    → if
  then  → then
  else  → else
  relop → < | <= | = | <> | > | >=
  id    → letter (letter | digit)*
  num   → digit+ (. digit+)? (E (+ | -)? digit+)?
Trim whitespace:
  delim → blank | tab | newline
  ws    → delim+
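The regular definitions above can be tried out directly with a regex engine. The sketch below is only an illustration: the names TOKEN_SPEC, MASTER and tokenize are hypothetical, letters and digits are assumed to be ASCII, and the exponent marker is assumed to be an uppercase E as in the 12.3E4 example of the unsigned-number diagrams.

```python
import re

KEYWORDS = {"if", "then", "else"}

# One named group per token class, transcribed from the regular definitions.
TOKEN_SPEC = [
    ("ws",    r"[ \t\n]+"),                               # ws -> delim+
    ("num",   r"[0-9]+(?:\.[0-9]+)?(?:E[+-]?[0-9]+)?"),   # digits fraction? exponent?
    ("id",    r"[A-Za-z][A-Za-z0-9]*"),                   # letter (letter | digit)*
    ("relop", r"<=|<>|>=|<|=|>"),                         # two-character operators first
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token, lexeme) pairs, with whitespace trimmed."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "id" and lexeme in KEYWORDS:
            kind = lexeme                  # a keyword is its own token
        if kind != "ws":
            tokens.append((kind, lexeme))
        pos = m.end()
    return tokens

print(tokenize("if x1 <= 12.3E4 then y"))
```

Listing the two-character operators before the one-character ones matters because Python's alternation takes the first alternative that matches, not the longest.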
Transition Diagram (Διάγραμμα Μετάβασης)
Intermediate visual representation
The graph depicts how the pointer moves from character to character
Circles are called states; they represent the pointer's positions
Edges leaving state s have labels indicating the characters required for moving to the next state
other is special: it refers to any character that is not indicated by any of the other edges leaving s
* denotes states on which input retraction must take place, i.e., the pointer is moved back one character, so that the character just read is left for the next transition diagram
Transition diagram for >=:
  start → 0
  0 → 6 on >
  6 → 7 on =
  6 → 8* on other
Transition Diagram for relational operators
start → 0
0 → 1 on <
  1 → 2 on =        return(relop, LE)
  1 → 3 on >        return(relop, NE)
  1 → 4* on other   return(relop, LT)
0 → 5 on =          return(relop, EQ)
0 → 6 on >
  6 → 7 on =        return(relop, GE)
  6 → 8* on other   return(relop, GT)
EQ: equal, LE: less or equal, LT: less than, NE: not equal, GE: greater or equal, GT: greater than
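The relational-operator diagram translates almost mechanically into code. The following is a minimal sketch, not the lecture's implementation: the function name next_relop and the (token, attribute), new_pos return shape are assumptions. Retraction in the starred states is modelled by not advancing past the "other" character.

```python
def next_relop(text, pos):
    """Follow the relop transition diagram starting at text[pos].

    Returns ((token, attribute), new_pos), or None if no relop starts here.
    """
    def peek(i):
        return text[i] if i < len(text) else ""   # "" plays the role of eof

    c = peek(pos)                                 # state 0
    if c == "<":                                  # state 1
        c = peek(pos + 1)
        if c == "=":
            return ("relop", "LE"), pos + 2       # state 2
        if c == ">":
            return ("relop", "NE"), pos + 2       # state 3
        return ("relop", "LT"), pos + 1           # state 4*: "other" is not consumed
    if c == "=":
        return ("relop", "EQ"), pos + 1           # state 5
    if c == ">":                                  # state 6
        if peek(pos + 1) == "=":
            return ("relop", "GE"), pos + 2       # state 7
        return ("relop", "GT"), pos + 1           # state 8*: "other" is not consumed
    return None                                   # no edge out of state 0 matches

print(next_relop("<=", 0))   # (('relop', 'LE'), 2)
print(next_relop("<a", 0))   # (('relop', 'LT'), 1)
```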
Keywords and Identifiers
Keywords are a special case of identifiers
Once an identifier is recognized we can check if it is a keyword
start → 0
0 → 10 on letter
10 → 10 on letter or digit
10 → 11* on other   return(get_token(), install_id())
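A short sketch of the "recognize an identifier first, then check for a keyword" idea follows. The slide only names get_token() and install_id(); their bodies here, the list-based symbol_table and the helper scan_identifier are all assumptions made for illustration.

```python
import string

KEYWORDS = {"if", "then", "else"}       # keywords from the token definitions
LETTERS = set(string.ascii_letters)     # assumption: ASCII letters only
DIGITS = set(string.digits)
symbol_table = []                       # hypothetical: a plain list of lexemes

def get_token(lexeme):
    """Keywords are their own token; every other lexeme is an id."""
    return lexeme if lexeme in KEYWORDS else "id"

def install_id(lexeme):
    """Return the lexeme's symbol-table index (None for keywords)."""
    if lexeme in KEYWORDS:
        return None
    if lexeme not in symbol_table:
        symbol_table.append(lexeme)
    return symbol_table.index(lexeme)

def scan_identifier(text, pos):
    """Return ((token, attribute), new_pos), or None if no letter starts here."""
    if pos >= len(text) or text[pos] not in LETTERS:
        return None
    end = pos + 1
    while end < len(text) and (text[end] in LETTERS or text[end] in DIGITS):
        end += 1                        # self-loop on "letter or digit"
    lexeme = text[pos:end]              # the "other" character is not consumed
    return (get_token(lexeme), install_id(lexeme)), end
```

With this design the scanner needs no separate transition diagram per keyword; the keyword set is consulted only after the longest identifier-shaped lexeme has been read.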
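Unsigned numbers
Recognizes 12.3E4 (digits fraction? exponent?):
  start → 12
  12 → 13 on digit, 13 → 14 on ., 14 → 15 on digit, 15 → 16 on E
  16 → 17 on + or -, 17 → 18 on digit, 16 → 18 on digit, 18 → 19* on other
  digit self-loops on 13, 15 and 18
Recognizes 12.3 (digits fraction):
  start → 20
  20 → 21 on digit, 21 → 22 on ., 22 → 23 on digit, 23 → 24* on other
  digit self-loops on 21 and 23
Recognizes 12 (digits):
  start → 25
  25 → 26 on digit, 26 → 27* on other
  digit self-loop on 26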
Finite Automata (Πεπερασμένα Αυτόματα)
Recognizer for a language: a program that takes as input a string x and answers yes if x is a sentence of the language and no otherwise
Compile regular expressions to recognizers
Construct a generalized transition diagram called a finite automaton
Two classes of finite automata:
  Deterministic, DFA (ντετερμινιστικό)
  Non-deterministic, NFA (μη-ντετερμινιστικό)
DFAs and NFAs
Both a DFA and an NFA are capable of recognizing precisely the regular sets
Time-space trade-off:
  DFAs implement faster recognizers
  DFAs are bigger (more states, more memory)
Regular expressions can be compiled into both a DFA and an NFA
NFA
Mathematical model that consists of
1. a set of states S
2. a set of input symbols Σ (the input symbol alphabet)
3. a transition function move that maps state-symbol pairs to sets of states
4. a state s0 that is distinguished as the start (or initial) state
5. a set of states F distinguished as accepting (or final) states
NFA for (a|b)*abb
States: {0, 1, 2, 3}
Symbol alphabet: {a, b}
Start state: 0
Accepting state: 3
Diagram: start → 0; 0 → 0 on a, 0 → 0 on b, 0 → 1 on a, 1 → 2 on b, 2 → 3 on b
An NFA looks like a transition diagram, but the same character can label two or more transitions out of one state
Example: a can transfer control
  from State 0 to State 0
  from State 0 to State 1
Also: edges can be labeled by the special symbol ε
Implementation using Transition Table

  STATE   INPUT SYMBOL
            a        b
  0       {0, 1}   {0}
  1         -      {2}
  2         -      {3}

If I am in state 0 and the input character is a, then I can move to states 0 or 1
If I am in state 0 and the input character is b, then I can move to state 0
If I am in state 1 and the input character is a, then there is no state to move to
If I am in state 1 and the input character is b, then I can move to state 2
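One way to hold such a transition table in a program is a dictionary keyed by (state, symbol). This is a sketch under that assumption; the names NFA_TABLE and move are illustrative, and the "-" cells are represented simply by missing keys.

```python
# Transition table of the NFA for (a|b)*abb: (state, symbol) -> set of states.
NFA_TABLE = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}

def move(state, symbol):
    """Return the set of states reachable from state on symbol (empty for "-")."""
    return NFA_TABLE.get((state, symbol), set())

print(move(0, "a"))   # {0, 1}
print(move(1, "a"))   # set()
```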
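Accepted input strings (a|b)*abb
Accepted input strings: abb, aabb, babb, aaabb, ...
For example, aabb is accepted by the path
  0 → 0 on a, 0 → 1 on a, 1 → 2 on b, 2 → 3 on b
Several other sequences of moves may be made on the input string aabb, but none of the others happens to end in an accepting state:
  0 → 0 on a, 0 → 0 on a, 0 → 0 on b, 0 → 0 on b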
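To see the competing sequences of moves concretely, the sketch below enumerates every complete path the NFA can take on an input and accepts if any of them ends in an accepting state. The helper names paths and accepts are hypothetical, and the brute-force enumeration is chosen for clarity only; it can take exponential time on long inputs.

```python
# NFA for (a|b)*abb, as (state, symbol) -> set of next states.
NFA_TABLE = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START, ACCEPTING = 0, {3}

def paths(text, state=START):
    """Yield every sequence of states the NFA can follow while reading all of text."""
    if not text:
        yield [state]
        return
    for nxt in sorted(NFA_TABLE.get((state, text[0]), set())):
        for rest in paths(text[1:], nxt):
            yield [state] + rest

def accepts(text):
    """A string is accepted if at least one path ends in an accepting state."""
    return any(p[-1] in ACCEPTING for p in paths(text))

print(list(paths("aabb")))   # [[0, 0, 0, 0, 0], [0, 0, 1, 2, 3]]
print(accepts("aabb"))       # True
```

Paths that get stuck (for example, moving to state 1 on the first a and then reading another a) are not yielded, because they never consume the whole input.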
NFA for aa* | bb*
start → 0
0 → 1 on ε, 1 → 2 on a, 2 → 2 on a
0 → 3 on ε, 3 → 4 on b, 4 → 4 on b
Accepting states: 2 and 4
DFA
A finite automaton in which
1. no state has an ε-transition, i.e., a transition on input ε,
2. for each state s and input symbol a, there is at most one edge labeled a leaving s
You can't have an edge labeled a leaving state 0 that reaches two states, e.g., both state 0 and state 1
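The two conditions can be checked mechanically on a transition table. The sketch below assumes a table stored as a dictionary from (state, symbol) to a set of target states and uses the empty string to stand for ε; both conventions, and the name is_deterministic, are assumptions made for this example.

```python
EPSILON = ""   # assumption: the empty string denotes an epsilon edge

def is_deterministic(transitions):
    """Check the two DFA conditions on a (state, symbol) -> set-of-states table."""
    for (state, symbol), targets in transitions.items():
        if symbol == EPSILON and targets:
            return False        # condition 1: no epsilon-transitions
        if len(targets) > 1:
            return False        # condition 2: at most one edge per symbol
    return True

# The NFA for (a|b)*abb has two edges labeled a leaving state 0.
nfa = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
# A table with a single target per (state, symbol) pair passes the check.
dfa = {(0, "a"): {1}, (0, "b"): {0}, (1, "a"): {1}, (1, "b"): {2}}

print(is_deterministic(nfa))   # False
print(is_deterministic(dfa))   # True
```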
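DFA for (a|b)*abb
start → 0 (accepting state: 3)
0 → 1 on a, 0 → 0 on b
1 → 1 on a, 1 → 2 on b
2 → 1 on a, 2 → 3 on b
3 → 1 on a, 3 → 0 on b
Recall the NFA version:
start → 0; 0 → 0 on a, 0 → 0 on b, 0 → 1 on a, 1 → 2 on b, 2 → 3 on b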
DFA is easy to code
  s := s0
  c := nextchar
  while c != eof do
      s := move(s, c)
      c := nextchar
  end
  if s in F then return "yes"
  else return "no"
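The loop above maps onto a few lines of Python. This sketch assumes the four-state DFA for (a|b)*abb with the transitions written out in DFA_MOVE (an assumption consistent with the regular expression); a character outside the alphabet is treated as a rejection, which the pseudocode leaves unspecified.

```python
# move(s, c) for a four-state DFA recognizing (a|b)*abb.
DFA_MOVE = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
S0, F = 0, {3}

def simulate(text):
    """Run the DFA over text and answer "yes" or "no"."""
    s = S0
    for c in text:                    # iterating to the end plays the role of eof
        s = DFA_MOVE.get((s, c))
        if s is None:                 # no edge for this character: reject
            return "no"
    return "yes" if s in F else "no"

print(simulate("aabb"))   # yes
print(simulate("abba"))   # no
```

Each input character costs one dictionary lookup, which is the speed advantage of a DFA over tracking sets of NFA states.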
What do we do?
NFAs are easy to conceive and draw
  Multiple edges on the same character leaving one state can cause ambiguity (αμφισημία)
  Many paths that spell out the same input string
  Hard to code
DFAs are easy to implement in a computer program
Subset Construction
CONVERSION OF AN NFA INTO A DFA
Operations

  OPERATION       DESCRIPTION
  ε-closure(s)    Set of NFA states reachable from NFA state s on ε-transitions alone.
  ε-closure(T)    Set of NFA states reachable from some NFA state s in T on ε-transitions alone.
  move(T, a)      Set of NFA states to which there is a transition on input symbol a from some NFA state s in T.

Notation: s is an NFA state, T is a set of NFA states
Examples (NFA for aa* | bb*)
start → 0; 0 → 1 on ε, 0 → 3 on ε; 1 → 2 on a, 2 → 2 on a; 3 → 4 on b, 4 → 4 on b
move({1, 2}, a) = {2}
ε-closure(0) = {0, 1, 3}
ε-closure(1) = {1}
ε-closure(2) = {2}
ε-closure(3) = {3}
ε-closure(4) = {4}
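Example
Initial NFA for (a|b)*abb (states 0-10, start state 0, accepting state 10):
  0 → 1 on ε, 0 → 7 on ε
  1 → 2 on ε, 1 → 4 on ε
  2 → 3 on a, 4 → 5 on b
  3 → 6 on ε, 5 → 6 on ε
  6 → 1 on ε, 6 → 7 on ε
  7 → 8 on a, 8 → 9 on b, 9 → 10 on b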
Equivalent DFA
States: A (start), B, C, D, E
No ε-transitions
No two edges with the same symbol leaving one state
Easy to transform into a computer program
Step 1
The start state of the equivalent DFA is ε-closure(0)
A = {0, 1, 2, 4, 7}; these are exactly the states reachable from state 0 via a path in which every edge is labeled ε
Step 2
The input symbol alphabet is {a, b}; we mark A and compute ε-closure(move(A, a))
move(A, a) is the set of NFA states reached on a from members of A: only states 2 and 7 have such transitions (to 3 and 8), so move(A, a) = {3, 8}
ε-closure(move({0, 1, 2, 4, 7}, a)) = ε-closure({3, 8}) = {1, 2, 3, 4, 6, 7, 8}
This is B = {1, 2, 3, 4, 6, 7, 8}
Step 3
Among the states in A, only 4 has a transition on b, to 5
So the DFA has a transition on b from A to C, where C = ε-closure({5}) = {1, 2, 4, 5, 6, 7}
Step 4
B and C are new, still unmarked sets; we mark each of them in turn and repeat Steps 2-3 for it
Repeat steps
Until all sets of the DFA are marked
Final sets:
  A = {0, 1, 2, 4, 7}
  B = {1, 2, 3, 4, 6, 7, 8}
  C = {1, 2, 4, 5, 6, 7}
  D = {1, 2, 4, 5, 6, 7, 9}
  E = {1, 2, 4, 5, 6, 7, 10}
Transition Table for DFA

  STATE   INPUT SYMBOL
            a    b
  A         B    C
  B         B    D
  C         B    C
  D         B    E
  E         B    C
Figure: the NFA for (a|b)*abb (states 0-10, start state 0) side by side with the equivalent DFA (states A-E, start state A)
The subset construction
  initially, ε-closure(s0) is the only state in Dstates and it is unmarked;
  while there is an unmarked state T in Dstates do begin
      mark T;
      for each input symbol a do begin
          U := ε-closure(move(T, a));
          if U is not in Dstates then
              add U as an unmarked state to Dstates;
          Dtran(T, a) := U
      end for
  end while
ε-closure(T)
  push all states in T onto stack;
  initialize ε-closure(T) to T;
  while stack is not empty do begin
      pop t, the top element, off the stack;
      for each state u with an edge from t to u labeled ε do
          if u is not in ε-closure(T) then
              add u to ε-closure(T);
              push u onto stack
          end if
      end for
  end while
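The two procedures above can be sketched together in Python and run on the 11-state NFA for (a|b)*abb. The edge list in NFA is an assumption (the usual construction for this expression, with the empty string standing for ε), and the names eps_closure, move and subset_construction are illustrative rather than taken from the lecture.

```python
EPS = ""   # assumption: the empty string stands for epsilon

# NFA for (a|b)*abb: state -> {symbol: set of next states}
NFA = {
    0: {EPS: {1, 7}}, 1: {EPS: {2, 4}}, 2: {"a": {3}}, 3: {EPS: {6}},
    4: {"b": {5}},    5: {EPS: {6}},    6: {EPS: {1, 7}},
    7: {"a": {8}},    8: {"b": {9}},    9: {"b": {10}}, 10: {},
}

def eps_closure(states):
    """Stack-based epsilon-closure of a set of NFA states."""
    closure, stack = set(states), list(states)
    while stack:
        t = stack.pop()
        for u in NFA[t].get(EPS, set()):
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return frozenset(closure)

def move(states, symbol):
    """NFA states reachable on symbol from some state in states."""
    return {u for s in states for u in NFA[s].get(symbol, set())}

def subset_construction(start, alphabet):
    """Return (Dstates in discovery order, Dtran)."""
    start_set = eps_closure({start})
    dstates, unmarked, dtran = [start_set], [start_set], {}
    while unmarked:
        T = unmarked.pop(0)                  # mark T
        for a in alphabet:
            U = eps_closure(move(T, a))      # empty U would become a dead state
            if U not in dstates:
                dstates.append(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    return dstates, dtran

dstates, dtran = subset_construction(0, "ab")
names = {S: "ABCDE"[i] for i, S in enumerate(dstates)}   # five sets for this NFA
table = {(names[T], a): names[U] for (T, a), U in dtran.items()}
for state in "ABCDE":
    print(state, table[(state, "a")], table[(state, "b")])
```

Processing unmarked sets in first-in, first-out order is a choice made here so that the sets are discovered in the order A, B, C, D, E; the algorithm itself does not prescribe an order.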