ΕΠΛ323 - Θωρία και Πρακτική Μταγλωττιστών Lecture 3 Lexicl Anlysis Elis Athnsopoulos elisthn@cs.ucy.c.cy
RecogniNon of Tokens if expressions nd relnonl opertors if è if then è then else è else relop è < <= = <> > >= id è letter(letter digit)* num è digit+(.digit+)?(e(+ -)?digit+)? Trim whitespce delim è lnk t newline ws è delim+
TrnsiNon Digrm Διάγραμμα Μτάβασης Intermedite visul representnon The grph depicts how the pointer moves from chrcter to chrcter Circles re clled sttes They represent the pointer s posinons Edges leving stte s hve lels indicnng the chrcters required for moving to the next stte Other is specil (refers to ny chrcter tht is not indicted y ny of the other edges leving s) strt > = 0 6 7 * denotes sttes on which input retrcnon must tke plce (i.e., the pointer is moved ck to the strt of the digrm). other 8 *
TrnsiNon Digrm if expressions strt < = 0 1 2 return(relop, LE) > other 3 return(relop, NE) = 4 * return(relop, LT) EQ: equl LE: less or equl LT: less thn NE: not equl GE: greter or equl GT: grter thn > 5 6 return(relop, EQ) = 7 other 8 * return(relop, GE) return(relop, GT)
Keywords nd IdenNfiers Keywords is specil cse of idennfiers Once n idennfier is recognized we cn check if it is keyword le9er or digit strt le9er other 0 10 11 * return(get_token(), instll_id())
Unsigned numers digit digit digit other strt digit. digit E + or - digit * 12 13 14 15 16 17 18 19 E digit digit digit Recognizes 12.3E4 (digits frc>on? exponent?) strt digit 20 21 digit. digit 22 23 other * 24 Recognizes 12.3 (digits frc>on) strt digit 25 26 other * 27 Recognizes 12 (digits)
Finite Automt Ππρασμένα Αυτόματα Recognizer for lnguge A progrm tht tkes s input string x nd nswers yes if x is sentence of the lnguge nd no otherwise. Compile regulr expressions to recognizers Construct generlized trnsi>on digrm clled finite utomton Two clsses of finite utomt DeterminisNc, DFA (νττρμινιστικό) Non-determinisNc, NFA (μη-νττρμινιστικό)
DFAs nd NFAs Both DFA nd n NFA re cple of recognizing precisely the regulr sets Time-spce trde-off DFAs implement fster recognizers DFAs re igger (more sttes, more memory) Regulr expressions cn e compiled in oth DFA nd n NFA
NFA MthemNcl model tht consists of 1. set of sttes S 2. set of input symols Σ (the input symol lphet) 3. trnsinon funcnons move tht mps sttesymol pirs to sets of sttes 4. stte s 0 tht is disnnguished s the strt (or ininl) stte 5. set of sttes F disnnguished s ccepqng (or finl) sttes
NFA for ( )* Sttes: {0, 1, 2, 3} Symol lphet: {, } Strt stte: 0 AccepNng stte: 3 strt 0 1 2 3 An NFA looks like trnsi>on digrm, ut the sme chrcter cn lel two or more trnsi>ons out of one stte: Exmple: cn trnsit control: from Stte 0 to Stte 0 from Stte 0 to Stte 1 Also: edges cn e lel y the specil symol
ImplementNon using TrnsiNon Tle STATE INPUT SYMBOL 0 {0, 1} {0} 1 - {2} 2 - {3} If I m in stte 0 nd the input chrcter is, then I cn move to sttes 0 or 1 If I m in stte 0 nd the input chrcter is, then I cn move to stte 0 If I m in stte 1 nd the input chrcter is, then there is no stte to move If I m in stte 1 nd the input chrcter is, then I cn move to stte 2 strt 0 1 2 3
Accepted input strings ( )* Accepted input strings:,,,, 0 0 1 2 3 Severl other sequences of moves my e mde on the input string, ut none of the others hppened to end in n ccepnng stte: 0 0 0 0 0 strt 0 1 2 3
NFA for * * 1 2 strt 0 3 4
DFA 1. no stte hs n -trnsinon, i.e., trnsinon on input, 2. For ech stte s nd input symol, there is t most one edge leled leving s 0 1 You cn t hve leving stte 0 nd eing le to rech two sttes, i.e., stte 0 nd stte 1
DFA for ( )* strt 0 1 2 3 Recll the NFA version: strt 0 1 2 3
DFA is esy to code s := s 0 c := nextchr while c!= eof do s := move(s, c) c := nextchr end if s in F then return yes else return no
Wht do we do? NFAs re esy to conceive nd drw MulNple edges on the sme chrcters leving one stte cn cuse miguity (αμφισημιά) Mny pths tht spell out the sme input string Hrd to code DFAs re esy to implement in computer progrm
Suset ConstrucQon CONVERSION OF AN NFA INTO A DFA
OperNons OPERATION -closure(s) DESCRIPTION Set of NFA sttes rechle from NFA stte s on -trnsinons lone. -closure(t) Set of NFA sttes rechle from some NFA stte s in T on - trnsinons lone. move(t, ) Set of NFA sttes to which there is trnsinon on input symol from some NFA stte s in T. Not>on: s n NFA stte, T set of NFA sttes
Exmples move({1, 2}, ) = 2 1 2 strt 0 3 4 -closure(0) = {0, 1, 2, 3} -closure(1) = {1, 2} -closure(2) = {2} -closure(3) = {3} -closure(4) = {4}
The suset construcnon initilly, -closure(s0) is the only stte in Dsttes nd it is unmrked; while there is n unmrked stte T in Dsttes do egin mrk T for ech input symol do egin U = -closure(move(t,)) if U is not in Dsttes then dd U s n unmrked stte to Dsttes; Dtrn(T,) := U end for end while
-closure(t) push ll sttes in T onto stcki initilize -closure(t) to T; while stck is not empty do egin pop t for ech stte u with n edge from t to u leled do if u not in -closure(t) end if end for end while dd u to -closure(t) push u
Exmple IniNl NFA, for ( )* strt 2 3 0 1 6 7 8 9 10 4 5
Equivlent DFA C strt A B No trnsinons No two edges with the sme symol leving one stte Esy to trnsform to computer progrm D E
Step 1 The strt stte of the equivlent DFA is -closure(0) A = {0, 1, 2, 4, 7}, these re exctly the sttes rechle from stte 0 vi pth in which every edge is leled
Step 2 The input symol is {, }, we mrk A, nd compute -closure(move(a, )) move(a, ) is the set of sttes of the NFA hving trnsinons on from memers of A, tht is sttes 2 nd 7 (moving to 3 nd 8) -closure(move({0, 1, 2, 4, 7}, )) = -closure({3, 8}) = {1, 2, 3, 4, 6, 7, 8} This is B = {1, 2, 3, 4, 6, 7, 8}
Step 3 Among the sttes in A, only 4 hs trnsinon on to 5 the DFA hs trnsinon from A to C, nd C = -closure({5}) = {1, 2, 4, 5, 6, 7}
Step 4 We mrk the new sets B nd C, nd we repet Step 1-3
Repet steps UnNl ll sets of the DFA re mrked Finl sets A = {0, 1, 2, 4, 7} B = {1, 2, 3, 4, 6, 7, 8} C = {1, 2, 4, 5, 6, 7} D = {1, 2, 4, 5, 6, 7, 9} E = {1, 2, 3, 5, 6, 7, 10}
TrnsiNon Tle for DFA STATE INPUT SYMBOL A B C B B D C B C D B E E B C
strt NFA 2 3 0 1 6 7 8 9 10 4 5 C DFA strt A B D E