Scanning Theory and Practice


CHAPTER 3 Scanning Theory and Practice

3.1 Overview

The primary function of a scanner is to read in characters from a source file and group them into tokens. A scanner is sometimes called a lexical analyzer or lexer. The names scanner, lexical analyzer and lexer are used interchangeably. The ac scanner we saw in Chapter 2 was quite simple and could easily be coded by any competent programmer. We will now develop a thorough and systematic approach to scanning that will allow us to create scanners for complete programming languages.

We will introduce formal notations for specifying the precise structure of tokens. At first glance this may seem an unnecessary complication, given the simple token structure found in most programming languages. However, token structure can be more detailed and subtle than one might expect. For example, we are all familiar with simple quoted strings in C, C++ or Java. The body of a string can be any sequence of characters except a quote character (which must be escaped). But is this simple definition really correct? Can a newline character appear in a string? In C, not unless it is escaped with a backslash. This is to avoid a runaway string which, lacking a closing quote, matches characters intended to be part of other tokens. While C, C++ and Java allow escaped newlines in strings, Pascal is even stricter and forbids them entirely. Ada goes further still and forbids all unprintable characters (precisely because they are normally unreadable). Similarly, are null strings (of length zero) allowed? In C, C++, Java and Ada they are, but Pascal forbids them. In Pascal a string is a packed array of characters, and zero length arrays are disallowed.

A precise definition of tokens is obviously necessary to ensure that lexical rules are clearly stated and properly enforced. Formal definitions also allow a language designer to anticipate design flaws. For example, virtually all languages allow fixed decimal numbers, such as 0.1 and 10.01. But should .1 or 10. be allowed? In C, C++ and Java they are allowed, but in Pascal and Ada they are not, and for an interesting reason.
Scanners normally seek to match as many characters as possible so that, for example, ABC is scanned as one identifier rather than three. But now consider the character sequence 1..10. In Pascal and Ada, we wish this to be interpreted as a range specifier (1 to 10). If we were careless in our token definitions, we might well scan 1..10 as two real literals, 1. and .10, which would lead to an immediate (and unexpected) syntax error. (The fact that two reals cannot be adjacent is reflected in the context-free grammar, which is enforced by the parser, not the scanner.)

Given a formal specification of token and program structure, it is possible to examine a language for design flaws. For example, we could analyze all pairs of tokens that can be adjacent and determine whether the catenation of the two might be incorrectly scanned. If so, a separator is required (as is the case for adjacent identifiers and reserved words), or the lexical or program syntax might need to be redesigned. The point is that language design is far more involved than one might expect, and formal specifications allow flaws to be discovered before the design is completed.

All scanners, independent of the tokens to be recognized, perform much the same function. Thus writing a scanner from scratch means reimplementing components common to all scanners, a significant duplication of effort. The goal of a scanner generator is to limit the effort in building a scanner to specifying which tokens the scanner is to recognize. Using a formal notation, we tell the scanner generator what tokens we want recognized; it is the generator's responsibility to produce a scanner that meets our specification. Some generators do not produce an entire scanner; rather, they produce tables that can be used with a standard driver program. The combination of generated tables and a standard driver yields the desired custom scanner.

Programming a scanner generator is an example of declarative programming. That is, unlike ordinary programming, which we call procedural, we do not tell a scanner generator how to scan but simply what we want scanned. This is a higher-level approach and in many ways a more natural one. Much recent research in computer science is directed toward declarative programming styles. (Database query languages and Prolog, a logic programming language, are declarative.) Declarative programming is most successful in limited domains, such as scanning, where the range of implementation decisions that must be automatically made is limited. Nonetheless, a long-standing (and as yet unrealized) goal of computer scientists is to automatically generate an entire production-quality compiler from a specification of the properties of the source language and target computer.

CRAFTING A COMPILER, 2ND EDITION
Though our primary focus in this text is on producing correct compilers, performance is sometimes a real concern, especially in widely-used production compilers. Surprisingly, even though scanners perform a simple task, if poorly implemented, they can be significant performance bottlenecks. The reason is that scanners must wade through the text of a program character by character.

Assume we wish to implement a very fast compiler that can compile a program in a few seconds. Let's use 30,000 lines a minute (500 lines a second) as our goal. (Compilers such as Turbo C++ achieve such speeds.) If an average line contains 20 characters, we must scan 10,000 characters per second. On a 10 MIPS processor (10,000,000 instructions executed per second), even if we did nothing but scanning, we'd have only 1000 instructions per input character to spend. Since scanning isn't the only thing a compiler does, 250 instructions per character is more realistic. This is a rather tight budget given that even a simple assignment takes several instructions on a typical processor. Though multi-MIPS processors are common these days and 30,000 lines per minute is an ambitious speed, it's clear that a poorly coded scanner can dramatically impact the performance of a compiler.

In Section 3.2 we introduce a declarative regular expression notation that is well suited to the formal definition of tokens. In Section 3.3, the correspondence between regular expressions and finite automata will be studied. Finite automata are especially useful because they are procedural in nature and can be directly executed to read characters and group them into tokens. A well-known scanner generator, Lex, will be considered in some detail in Section 3.4. A few alternatives will also be considered. Lex takes token definitions (in a declarative form: regular expressions) and produces a complete scanner subprogram, ready to be compiled and executed. Our next topic of discussion, in Section 3.5, is the practical considerations needed to build a scanner and integrate it with the rest of the compiler.
These considerations include anticipating the tokens and contexts that may complicate scanning, avoiding performance bottlenecks, and recovering from lexical errors.

We conclude the chapter with Section 3.6, which explains how scanner generators, like Lex, translate regular expressions into finite automata and how finite automata may be converted to equivalent regular expressions. Readers who wish to view a scanner generator as simply a black box may skip this section. However, the material does serve to reinforce the concepts of regular expressions and finite automata introduced earlier. The section also illustrates how finite automata can be built, merged, simplified, and even optimized.

3.2 Regular Expressions

Regular expressions are a convenient means of specifying various simple (though possibly infinite) sets of strings. Regular expressions are of practical interest because they can be used to specify the structure of the tokens used in a programming language. In particular, regular expressions can be used to program a scanner generator.

Regular expressions are widely used in computer applications other than compilers. The Unix utility grep uses regular expressions to define search patterns in files. Unix shells allow a restricted form of regular expressions when specifying file lists for a command. Most editors provide a context search command that specifies desired matches using regular expressions.

The sets of strings defined by regular expressions are termed regular sets. For purposes of scanning, a token class will be a regular set, whose structure is defined by a regular expression. Particular instances of a token class are sometimes called lexemes, though we will simply call a string in a token class an instance of that token. For example, we will call the string abc an identifier if it matches the regular expression that defines the set of valid identifier tokens.

Our definition of regular expressions starts with a finite character set, or vocabulary (denoted V). This vocabulary is normally the character set used by a computer. Today, the ASCII character set, which contains a total of 128 characters, is very widely used.

An empty or null string is allowed (denoted λ, "lambda"). Lambda represents an empty buffer in which no characters have yet been matched. It also represents optional parts of tokens. Thus an integer literal may begin with a plus or minus, or it may begin with λ if it is unsigned.

Strings are built from characters in the character set V via catenation. As characters are catenated to a string, it grows in length. The string do is built by first catenating d to λ, and then catenating o to the string d.
The null string, when catenated with any string s, yields s. That is, s λ ≡ λ s ≡ s. Catenating λ to a string is like adding 0 to an integer: nothing changes.

Catenation is extended to sets of strings as follows: Let P and Q be sets of strings. The symbol ∈ represents set membership. If s_1 ∈ P and s_2 ∈ Q then string s_1 s_2 ∈ (P Q).

Small finite sets are conveniently represented by listing their elements, which can be individual characters or strings of characters. Parentheses are used to delimit expressions, and |, the alternation operator, is used to separate alternatives. For example, D, the set of the ten single digits, is defined as D = (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9). (In this text we will often use abbreviations like (0 | ... | 9) rather than enumerate a complete list of alternatives. The ... symbol is not part of our regular expression notation.)

The characters (, ), ', *, +, and | are meta-characters (punctuation and regular expression operators). Meta-characters must be quoted when used as ordinary characters to avoid ambiguity. (Any character or string may be quoted, but unnecessary quotation is avoided to enhance readability.) For example the expression ( '(' | ')' | ; | , ) defines four single character tokens (left parenthesis, right parenthesis, semicolon and comma) that we might use in a programming language. The parentheses are quoted to show they are meant to be individual tokens and not delimiters in a larger regular expression.

Alternation can be extended to sets of strings. Let P and Q be sets of strings. Then string s ∈ (P | Q) if and only if s ∈ P or s ∈ Q. For example, if LC is the set of lower-case letters and UC is the set of uppercase letters, then (LC | UC) denotes the set of all letters (in either case).
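The set-level definitions of catenation and alternation above can be tried out directly on small finite sets. The following is a minimal sketch in Python; the function names catenate and alternate and the constant LAMBDA are illustrative choices, not notation from the text, and only finite sets are handled.

```python
LAMBDA = ""  # the empty string, written as lambda in the text


def catenate(p, q):
    """Catenation (P Q): each string of P followed by each string of Q."""
    return {s1 + s2 for s1 in p for s2 in q}


def alternate(p, q):
    """Alternation (P | Q): the strings that belong to P or to Q."""
    return p | q


# D, LC and UC as defined in the text (ASCII letters assumed).
D = set("0123456789")
LC = set("abcdefghijklmnopqrstuvwxyz")
UC = {c.upper() for c in LC}

LETTERS = alternate(LC, UC)      # (LC | UC): all letters in either case
TWO_DIGITS = catenate(D, D)      # (D D): all two-digit strings
```

Catenating with the set that holds only LAMBDA leaves a set unchanged, which mirrors the remark that adding λ to a string is like adding 0 to an integer.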

Large (or infinite) sets are conveniently represented by operations on finite sets of characters and strings. Catenation and alternation may be used. A third operation, Kleene closure, is also allowed. The operator * will represent the postfix Kleene closure operator. Let P be a set of strings. Then P* represents all strings formed by the catenation of zero or more selections (possibly repeated) from P. (Zero selections are represented by λ.) For example, LC* is the set of all words composed only of lower-case letters, of any length (including the zero length word, λ). Precisely stated, a string s ∈ P* if and only if s can be broken into zero or more pieces: s = s_1 s_2 ... s_n such that each s_i ∈ P (n ≥ 0, 1 ≤ i ≤ n). We explicitly allow n = 0, so that λ is always in P*.

Now that we are familiar with the operators used in regular expressions, we can define regular expressions as follows.

- ∅ is a regular expression denoting the empty set (the set containing no strings). ∅ is rarely used, but is included for completeness.
- λ is a regular expression denoting the set that contains only the empty string. This set is not the same as the empty set, because it contains one element.
- A string s is a regular expression denoting a set containing the single string s. If s contains meta-characters, s can be quoted to avoid ambiguity.
- If A and B are regular expressions, then A | B, A B, and A* are also regular expressions, denoting the alternation, catenation, and Kleene closure of the corresponding regular sets.

Each regular expression denotes a set of strings (a regular set). Any finite set of strings can be represented by a regular expression of the form (s_1 | s_2 | ... | s_k). Thus the reserved words of ANSI C can be defined as (auto | break | case | ...).

We will find the following additional operations useful. They are not strictly necessary, because their effect can be obtained (perhaps somewhat clumsily) using the three standard regular operators (alternation, catenation, Kleene closure):

- P+ denotes all strings consisting of one or more strings in P catenated together: P* = (P+ | λ) and P+ = P P*.
  For example, the expression (0 | 1)+ is the set of all strings containing one or more bits.
- If A is a set of characters, Not(A) denotes (V - A); that is, all characters in V not included in A. Since Not(A) contains characters rather than strings, it must be finite, and is automatically regular. Not(A) does not contain λ since λ is not a character (it is a zero-length string). As an example, Not(Eol) is the set of all characters excluding Eol (the end of line character, '\n' in Java or C). It is possible to extend Not to strings, rather than just V. That is, if S is a set of strings, we can define S̄ (the complement of S) to be (V* - S); that is, the set of all strings except those in S. Though S̄ is usually infinite, it is also regular if S is (see Exercise 18).
- If k is a constant, the set A^k represents all strings formed by catenating k (possibly different) strings from A. That is, A^k = (A A A ...) (k copies). Thus (0 | 1)^32 is the set of all bit strings exactly 32 bits long.

Examples

We will now explore how regular expressions can be used to specify tokens. Let D be the set of the ten single digits and let L be the set of all letters (52 in all). Then

- A Java or C++ single-line comment that begins with // and ends with Eol can be defined as:

    Comment = // Not(Eol)* Eol

  This regular expression says a comment begins with two slashes and ends at the first end of line. Within the comment we allow any sequence of characters not containing an end of line (this guarantees the first end of line we see ends the comment).
- A fixed decimal literal (e.g., 12.345) can be defined as:

    Lit = D+ . D+

  We require one or more digits on both sides of the decimal point, so this definition excludes .12 and 35.
- An optionally signed integer literal can be defined as:

    IntLiteral = ( '+' | - | λ ) D+

  An integer literal is one or more digits preceded by a plus or minus or no sign at all (λ). So that the plus is not confused with the Kleene closure operator, it is quoted.
- A more complicated example is a comment delimited by ## markers, which allows single #'s within the comment body:

    Comment2 = ## ((# | λ) Not(#))* ##

  Within this comment's body, whenever a # appears, it must be followed by a non-# so that a premature end of comment marker, ##, is not found.

All finite sets and many infinite sets are regular. But not all infinite sets are regular. For example, consider the set of balanced brackets of the form [ [ [ ... ] ] ]. This set is defined formally as { [^m ]^m | m ≥ 1 }. This is a set that is known not to be regular. The problem is that any regular expression that tries to define it either does not get all balanced nestings or it includes extra, unwanted strings. (Exercise 14 proves this.)

It is easy to write a context-free grammar (CFG) that defines balanced brackets precisely. In fact, all regular sets can be defined by CFGs. Thus, our bracket example shows that CFGs are a more powerful descriptive mechanism than regular expressions. Regular expressions are, however, quite adequate for specifying token-level syntax. Moreover, for every regular expression we can create an efficient device, called a finite automaton, that recognizes exactly those strings that match the regular expression's pattern.

3.3 Finite Automata and Scanners

A finite automaton (FA) can be used to recognize the tokens specified by a regular expression.
An FA is a simple, idealized computer that recognizes strings belonging to regular sets. It consists of:

- A finite set of states
- A set of transitions (or moves) from one state to another, labeled with characters in V
- A special state called the start state
- A subset of the states called the accepting, or final, states

These four components of a finite automaton are often represented graphically:

[Figure: legend showing the symbols for a state, a transition, the start state, and an accepting state]

Finite automata (the plural of automaton is automata) can be represented graphically using transition diagrams. Using these diagrams, we start at the start state. If the next input character matches the label on a transition from the current state, we go to the state it points to. If no move is possible, we stop. If we finish in an accepting state, the sequence of characters read forms a valid token; otherwise, we have not seen a valid token. In the diagram shown below, the valid tokens are the strings described by the regular expression (a b (c)+)+.

[Figure: transition diagram with transitions labeled a, b, and c]

As an abbreviation, a transition may be labeled with more than one character (for example, Not(c)). The transition may be taken if the current input character matches any of the characters labeling the transition.

If an FA always has a unique transition (for a given state and character), the FA is deterministic (that is, a deterministic FA, or DFA). Deterministic finite automata are easy to program and are often used to drive a scanner. A DFA is conveniently represented in a computer by a transition table. A transition table, T, is a two dimensional array indexed by a DFA state and a vocabulary symbol. Table entries are either a DFA state or an error flag (often represented as a blank table entry). If we are in state s, and read character c, then T[s,c] will be the next state we visit, or T[s,c] will contain an error flag indicating that c cannot be part of the current token. For example, the regular expression

    // Not(Eol)* Eol

which defines a Java or C++ single-line comment, might be translated into

[Figure: DFA for single-line comments, with transitions labeled /, /, Not(Eol), and Eol]

The corresponding transition table is

[Table: transition table for the comment DFA; rows are states, columns are the characters /, Eol, a, b, ...; the entries did not survive transcription]

A full transition table will contain one column for each character. To save space, table compression is sometimes utilized. That is, only non-error entries are explicitly represented in the table, using hashing or linked structures.

Any regular expression can be translated into a DFA that accepts (as valid tokens) the set of strings denoted by the regular expression. This translation can be done manually by a programmer or automatically using a scanner generator.

A DFA can be coded in

- Table-driven form
- Explicit control form

In the table-driven form, the transition table that defines a DFA's actions is explicitly represented in a run-time table that is interpreted by a driver program. In the explicit control form, the transition table that defines a DFA's actions appears implicitly as the control logic of the program. Typically individual program statements correspond to distinct DFA states. For example, suppose CurrentChar is the current input character. Using the DFA for the Java comments illustrated above, the two approaches would produce the programs illustrated in Figures 1 and 2. The first form is commonly produced by a scanner generator; it is token-independent. It uses a simple driver that can scan any token if the transition table is properly stored in T. The latter form may be produced automatically or by hand. The token being scanned is hardwired into the code. This form of a scanner is usually easy to read and often is more efficient, but is specific to a single token definition.

    /* Assume CurrentChar contains the first character to be scanned */
    State ← StartState
    while true
    do  if CurrentChar = eof
        then break
        NextState ← T[State, CurrentChar]
        if NextState = error
        then break
        State ← NextState
        READ(CurrentChar)
    if State ∈ AcceptingStates
    then /* Return or process valid token */
    else /* Signal a lexical error */

FIGURE 1 Scanner Driver Interpreting a Transition Table
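The driver in Figure 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the state numbers 1 to 4, the dictionary layout of T, and the helper classify (which maps each character to a column "/", "Eol", or "other") are choices made for this sketch, since the printed table entries are not available here. Missing dictionary entries play the role of the error flag.

```python
ERROR = None  # stands for a blank (error) entry in the transition table


def classify(ch):
    """Map an input character to a table column (assumed column names)."""
    if ch == "/":
        return "/"
    if ch == "\n":
        return "Eol"
    return "other"


# Assumed numbering: 1 = start, 2 = seen '/', 3 = inside comment, 4 = accept.
T = {
    (1, "/"): 2,
    (2, "/"): 3,
    (3, "/"): 3,        # '/' is also a member of Not(Eol)
    (3, "other"): 3,
    (3, "Eol"): 4,
}
START_STATE = 1
ACCEPTING_STATES = {4}


def scan_token(text):
    """Run the table-driven driver; return the matched text or None on error."""
    state = START_STATE
    pos = 0
    while True:
        if pos == len(text):            # CurrentChar = eof
            break
        next_state = T.get((state, classify(text[pos])), ERROR)
        if next_state is ERROR:
            break
        state = next_state
        pos += 1                        # READ(CurrentChar)
    if state in ACCEPTING_STATES:
        return text[:pos]               # valid token
    return None                         # lexical error
```

Only T, START_STATE and ACCEPTING_STATES describe the comment token; scan_token itself never mentions slashes or newlines, which is the token independence the text attributes to the table-driven form.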

    /* Assume CurrentChar contains the first character to be scanned */
    if CurrentChar = '/'
    then READ(CurrentChar)
         if CurrentChar = '/'
         then repeat
                  READ(CurrentChar)
              until CurrentChar ∈ { eol, eof }
         else /* Signal a lexical error */
    else /* Signal a lexical error */
    if CurrentChar = eol
    then /* Return or process valid token */
    else /* Signal a lexical error */

FIGURE 2 Explicit Control Scanner

The following are two more examples of regular expressions and their corresponding DFAs:

- A FORTRAN-like real literal (which requires digits on either or both sides of a decimal point, or just a string of digits) can be defined as

    RealLit = (D+ (λ | .)) | (D* . D+)

  which corresponds to the DFA

  [Figure: DFA for RealLit, with transitions labeled D and .]

- An identifier consisting of letters, digits, and underscores, which begins with a letter and allows no adjacent or trailing underscores, may be defined as

    ID = L (L | D)* ( _ (L | D)+ )*

  This definition includes identifiers like sum or unit_cost, but excludes _one and two_ and grand___total. The corresponding DFA is

  [Figure: DFA for ID, with transitions labeled L, L | D, and _]
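The two definitions above can be checked against their stated examples by transcribing them into Python's re syntax. This is a sketch, not the text's notation: the names REAL_LIT, ID, is_real_lit and is_id are introduced here, L and D are assumed to be the ASCII letters and digits, and re.fullmatch is used so that the whole string must match.

```python
import re

# RealLit = (D+ (lambda | .)) | (D* . D+)
REAL_LIT = re.compile(r"[0-9]+\.?|[0-9]*\.[0-9]+")

# ID = L (L | D)* ( _ (L | D)+ )*
ID = re.compile(r"[A-Za-z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)*")


def is_real_lit(text):
    """True if text, taken as a whole, is an instance of RealLit."""
    return REAL_LIT.fullmatch(text) is not None


def is_id(text):
    """True if text, taken as a whole, is an instance of ID."""
    return ID.fullmatch(text) is not None
```

The optional "λ | ." becomes `\.?`, and each "_ (L | D)+" group forces at least one letter or digit after every underscore, which is what rules out trailing and adjacent underscores.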

So far we haven't saved or processed the characters we've scanned: they are matched and then thrown away. It is useful to add an output facility to an FA; this makes the FA a transducer. As characters are read, they can be transformed and catenated to an output string. For our purposes, we shall limit the transformation operations to saving or deleting input characters. After a token is recognized, the transformed input can be passed to other compiler phases for further processing. We use this notation:

    a       means save a in a token buffer
    T(a)    means don't save a (Toss it away)

For example, for Java and C++ comments, we might write

[Figure: comment transducer with transitions labeled T(/), T(/), T(Eol), and T(Not(Eol))]

A more interesting example is given by Pascal-style quoted strings, according to the regular expression

    (" ( Not(") | " " )* ")

A corresponding transducer might be

[Figure: quoted-string transducer with transitions labeled T("), ", T("), and Not(")]

The input """Hi""" would produce output "Hi".

3.4 The Lex Scanner Generator

We now discuss a very popular scanner generator, Lex. We will later briefly discuss a number of other scanner generators. Lex was developed by M.E. Lesk and E. Schmidt of AT&T Bell Laboratories. It is used primarily with programs written in C or C++, running under the UNIX operating system. Lex produces an entire scanner module, coded in C, that can be compiled and linked with other compiler modules. A complete description of Lex can be found in [Lesk and Schmidt 1975] and [Levine, Mason and Brown 1992]. Flex (see [Paxson 1988]) is a widely used reimplementation of Lex that produces faster and more reliable scanners. Valid Lex scanner specifications may, in general, be used with Flex without modification.

The operation of Lex is illustrated in Figure 3. A scanner specification, defining the tokens to be scanned and how they are to be processed, is presented to Lex. Lex then generates a complete scanner, coded in C. This scanner is then compiled and linked with other compiler components to create a complete compiler.

[Figure: a Scanner Specification is fed to Lex, which produces a Scanner Module (in C)]

FIGURE 3 The Operation of the Lex Scanner Generator

Using Lex saves us a great deal of effort in programming a scanner. We are relieved of the necessity of explicitly programming many low level details of the scanner (reading characters efficiently, buffering them, matching characters against token definitions, and so on). Rather, we can focus on the character structure of tokens, and how they are to be processed.

Our primary purpose in this section is to show how regular expressions and related information are presented to scanner generators. A good way to learn to use Lex is to start with the simple examples presented here and then gradually generalize them to solve the problem at hand. To inexperienced readers, Lex's rules may seem unnecessarily complex. It is best to keep in mind that the key is always the specification of tokens as regular expressions; the rest is there simply to increase efficiency and handle various details.

Defining Tokens in Lex

Lex's approach to scanning is simple. It allows the user to associate regular expressions with commands coded in C (or C++). When input characters that match the regular expression are read, the command is executed. As a user of Lex you don't need to tell it how to match tokens; you need only say what you want done when a particular token is matched.

Lex creates a file lex.yy.c that contains an integer function yylex(). This function is normally called from the parser whenever another token is needed. The value returned by yylex() is the token code of the token scanned by Lex. Tokens like white space are deleted simply by having their associated command not return anything. Scanning continues until a command with a return in it is executed.

Figure 4 illustrates a simple Lex definition for the three reserved words of the ac language (which was introduced in Chapter 2). When a string matching f or i or p is found, the appropriate token code is returned.
It is vital that the token codes returned when a token is matched are identical to those expected by the parser. If they are not, the parser won't see the same token sequence produced by the scanner. This will cause the parser to generate false syntax errors based on the incorrect token stream it sees. It is standard for the scanner and parser to share the definition of token codes to guarantee consistent values are seen by both. The file y.tab.h, produced by the Yacc parser generator (see Chapter 6), is often used to define shared token codes.
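The calling convention described above, in which the parser calls yylex() repeatedly, white space produces no return, and both sides agree on token codes, can be mimicked in a few lines of Python. This is a hand-written sketch and not what Lex generates: the Token enumeration, its numeric values, the Scanner class and parse_codes are all hypothetical names, and a code of 0 is assumed to mark the end of input.

```python
from enum import IntEnum


class Token(IntEnum):
    """Token codes shared by scanner and parser (values are illustrative)."""
    EOF = 0
    FLOATDCL = 1
    INTDCL = 2
    PRINT = 3


# The three ac reserved words and the codes their actions return.
RESERVED = {"f": Token.FLOATDCL, "i": Token.INTDCL, "p": Token.PRINT}


class Scanner:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def yylex(self):
        """Return the code of the next token, or Token.EOF at end of input."""
        while self.pos < len(self.text):
            ch = self.text[self.pos]
            self.pos += 1
            if ch in " \t\n":
                continue              # action returns nothing: keep scanning
            if ch in RESERVED:
                return RESERVED[ch]   # action contains a return
            raise ValueError("unexpected character: " + repr(ch))
        return Token.EOF


def parse_codes(text):
    """Stand-in for a parser: call yylex() until it reports end of input."""
    scanner = Scanner(text)
    codes = []
    while True:
        code = scanner.yylex()
        if code == Token.EOF:
            return codes
        codes.append(code)
```

Because parse_codes and Scanner both import the same Token definition, they cannot disagree about what a given code means, which is the purpose a shared header such as y.tab.h serves in a C build.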

    %%
    f    { return(FLOATDCL); }
    i    { return(INTDCL); }
    p    { return(PRINT); }
    %%

FIGURE 4 A Lex Definition for ac's Reserved Words

The pair %% delimits sections of a Lex token specification. Three sections exist; the general form of a Lex specification is

    declarations
    %%
    regular expression rules
    %%
    subroutine definitions

In our simple example, we've used only the second section, in which regular expressions and corresponding C code are specified. The regular expressions illustrated in Figure 4 are simple single character strings that match only themselves. The code executed returns a constant value representing the appropriate ac token.

If we wished, we could have quoted the strings representing the reserved words ("f" or "i" or "p"), but since these strings contain no delimiters or operators, quoting them is unnecessary. If you want to quote such strings to avoid any chance of misinterpretation, that's fine with Lex.

Our specification so far is quite incomplete. None of the other tokens in ac are handled, particularly identifiers and numbers. To handle these tokens correctly, we'll introduce a useful concept: character classes. Characters often naturally fall into classes, with all characters in a class treated identically in a token definition. Thus in the definition of an ac identifier all letters (except f, i and p) form a class since any of them can be used to form an identifier. Similarly, in a number, any of the ten digit characters can be used.

Character classes are delimited by [ and ]; individual characters are catenated without any quotation or separators. However \, ^, ] and -, because of their special meaning in character classes, must be escaped. Thus [xyz] represents the class that can match a single x, y, or z. The expression [\])] represents the class that can match a single ] or ). (The ] is escaped so that it isn't misinterpreted as the end of character class symbol.)

Ranges of characters are separated by a -; [x-z] is the same as [xyz]. [0-9] is the set of all digits and [a-zA-Z] is the set of all letters, upper- and lower-case.
\ is the escape character, used to represent unprintables and to escape special symbols. Following C conventions, \n is the newline (that is, end of line), \t is the tab character, \\ is the backslash symbol itself, and \010 is the character corresponding to octal 10.

The ^ symbol complements a character class (it is Lex's representation of the Not operation). [^xy] is the character class that matches any single character except x and y. The ^ symbol applies to all characters that follow it in the character class definition, so [^0-9] is the set of all characters that aren't digits. [^] can be used to match all characters. (Avoid use of \0 in character classes; it can be confused with the null character's special use as the end of string terminator in C.) Table 1 illustrates a variety of character classes and the character sets they define.

TABLE 1. Lex Character Class Definitions

    Character Class    Set of Characters Denoted
    [abc]              Three characters: a, b and c
    [cba]              Three characters: a, b and c
    [a-c]              Three characters: a, b and c
    [aabbcc]           Three characters: a, b and c
    [^abc]             All characters except a, b and c
    [\^\-\]]           Three characters: ^, - and ]
    [^]                All characters
    "[abc]"            Not a character class. This is one five character string: [abc]

With character classes we can easily define ac identifiers, as shown in Figure 5. The character class includes the range of characters a to e, then g and h, then the range j to o, followed by the range q to z. We are able to concisely represent the 23 characters that may form an ac identifier without having to enumerate them all.

    %%
    [a-eghj-oq-z]    { return(ID); }
    %%

FIGURE 5 A Lex Definition for ac's Identifiers

Tokens are defined using regular expressions. Lex provides the standard regular expression operators, as well as some additions. Catenation is specified by the juxtaposition of two expressions; no explicit operator is used. Thus [ab][cd] will match any of ad, ac, bc, and bd. When outside of character class brackets, individual letters and numbers match themselves; other characters should be quoted (to avoid misinterpretation as regular expression operators). For example, while (as used in C, C++ and Java) can be matched by the expressions while, "while", or [w][h][i][l][e]. Case is significant.

The alternation operator is |. As usual, parentheses can be used to control grouping of subexpressions. Therefore if we wish to match the reserved word while allowing any mixture of upper- and lowercase (as required in Pascal and Ada), we can use

    (w|W)(h|H)(i|I)(l|L)(e|E)

Postfix operators * (Kleene closure) and + (positive closure) are also provided, as is ? (optional inclusion). expr? matches expr zero times or once. It is equivalent to (expr) | λ and obviates the need for an explicit λ symbol. The character "." matches any single character (other than a newline).
The character ^ (when used outside a character class) matches the beginning of a line. Similarly, the character $ matches the end of a line. Thus, ^A.*e$ could be used to match an entire line that begins with A and ends with e.
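Most of the operators just described have close counterparts in Python's re module, which makes it easy to experiment with them without running Lex. The sketch below is an analogy rather than Lex itself: Python quotes special characters with a backslash instead of double quotes, the pattern names and the helper lex_match are introduced here, and NUM uses the ? operator as one possible way to write an optional fraction.

```python
import re
import string


def lex_match(pattern, text):
    """True if the whole of text matches pattern (Python re syntax)."""
    return re.fullmatch(pattern, text) is not None


AC_ID = r"[a-eghj-oq-z]"                       # character class with ranges
AC_ID_CHARS = [c for c in string.ascii_lowercase if lex_match(AC_ID, c)]

TWO_CLASSES = r"[ab][cd]"                      # catenation by juxtaposition
WHILE_ANY_CASE = r"(w|W)(h|H)(i|I)(l|L)(e|E)"  # alternation and grouping
NOT_DIGIT = r"[^0-9]"                          # complemented class
NUM = r"[0-9]+(\.[0-9]+)?"                     # ? marks an optional part
LINE_A_TO_E = r"^A.*e$"                        # line anchors
```

For instance, lex_match(WHILE_ANY_CASE, "WhIlE") holds, while lex_match(".", "\n") does not, since "." excludes the newline in Python just as the text describes for Lex.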

We can now define all of ac's tokens using Lex's regular expression facilities. This is shown in Figure 6.

    %%
    (" ")+                            { /* delete blanks */ }
    f                                 { return(FLOATDCL); }
    i                                 { return(INTDCL); }
    p                                 { return(PRINT); }
    [a-eghj-oq-z]                     { return(ID); }
    ([0-9]+)|([0-9]+"."[0-9]+)        { return(NUM); }
    "="                               { return(ASSIGN); }
    "+"                               { return(PLUS); }
    "-"                               { return(MINUS); }
    %%

FIGURE 6 A Lex Definition for ac's Tokens

Recall that a Lex specification of a scanner consists of three sections. The first section, which we've not used so far, contains symbolic names associated with character classes and regular expressions. There is one definition per line. Each definition line contains an identifier and a definition string, separated by a blank or tab. The { and } symbols signal the macro-expansion of a symbol defined in the first section. For example, given the definition

    Letter    [a-zA-Z]

the expression {Letter} expands to [a-zA-Z]. Symbolic definitions can often make Lex specifications easier to read, as illustrated in Figure 7.

    Blank        " "
    Digits       [0-9]+
    Non_f_i_p    [a-eghj-oq-z]
    %%
    {Blank}+                          { /* delete blanks */ }
    f                                 { return(FLOATDCL); }
    i                                 { return(INTDCL); }
    p                                 { return(PRINT); }
    {Non_f_i_p}                       { return(ID); }
    {Digits}|({Digits}"."{Digits})    { return(NUM); }
    "="                               { return(ASSIGN); }
    "+"                               { return(PLUS); }
    "-"                               { return(MINUS); }
    %%

FIGURE 7 An Alternative Lex Definition for ac's Tokens

In the first section we can also include source code, delimited by %{ and %}, that is placed before the commands and regular expressions of section two. This source code may include statements and variable, procedure and type declarations that are needed to allow the commands of section two to be compiled. For example, we might use

    %{
    #include "tokens.h"
    %}

to include the definitions of token values returned when tokens are matched.

As we have seen, Lex's second section defines a table of regular expressions and corresponding commands in C. The first blank or tab not escaped or part of a quoted string or character class is taken as the end of the regular expression, so avoid embedded blanks within regular expressions. When an expression is matched, its associated command is executed. If an input sequence matches no expression, the sequence is simply copied verbatim to the standard output file. Input that is matched is stored in a global string variable yytext (whose length is yyleng). Commands may alter yytext in any way. The default size of yytext is determined by YYLMAX, which is initially defined to be 200. All tokens, even those that will be ignored like comments, are stored in yytext. Hence you may need to redefine YYLMAX to avoid overflow. An alternative approach to scanning comments, one that is not prone to the danger of overflowing yytext, involves the use of start conditions (see [Lesk and Schmidt 1975] or [Levine, Mason and Brown 1992]). Flex, an improved version of Lex discussed in the next section, automatically extends the size of yytext when necessary. This removes the danger that a very long token may overflow the text buffer.

The contents of yytext are overwritten as each new token is scanned. Therefore you must be careful if you return the text of a token by simply returning a pointer into yytext. You must copy the contents of yytext (using perhaps strcpy()) before the next call to yylex().

Lex allows regular expressions to overlap (that is, to match the same input sequences). In the case of overlap, two rules are used to determine which regular expression is matched. First, the longest possible match is performed. Lex automatically buffers characters while deciding how many characters can be matched. Second, if two expressions match exactly the same string, the earlier expression (in order of definition in the Lex specification) is preferred.
Reserved words, for example, are often special cases of the pattern used for identifiers. Their definitions are therefore placed before the expression that defines an identifier token. Often a "catch all" pattern is placed at the very end of section two. It is used to catch characters that don't match any of the earlier patterns and hence are probably erroneous. Recall that "." matches any single character (other than a newline). It is useful in a catch-all pattern. However, avoid a pattern like .* which will consume all characters up to the next newline.

Although Lex is often used to produce scanners, it is really a general-purpose character processing tool, programmed using regular expressions. Lex provides no character tossing mechanism because this would be too special-purpose. It may therefore be necessary to process the token text (stored in yytext) before returning a token code. This is normally done by calling a subroutine in the command associated with a regular expression. The definition of such subroutines may be placed in the final section of the Lex specification. For example, we might want to call a subroutine to insert an identifier into a symbol table before it is returned to the parser. For ac the line

    {Non_f_i_p}    { insert(yytext); return(id); }

could do this, with insert defined in the final section. Alternatively, the definition of insert could be placed in a separate file containing symbol table routines. This would allow insert to be changed and recompiled without rerunning Lex. (Some implementations of Lex generate scanners rather slowly.)

In Lex, end of file is not handled by regular expressions. A predefined EOF token, with a token code of zero, is automatically returned when end of file is reached at the beginning of a call to yylex(). It is up to the parser to recognize the zero return value as signifying the EOF token. If more than one source file must be scanned, this fact is hidden inside the scanner mechanism. yylex() uses three user-defined functions to handle character-level input and output. They are

    input()      read a single character, 0 on end of file.
    output(c)    write a single character to the output.
    unput(c)     put a single character back into the input to be re-read.

When yylex() encounters end of file, it calls a user-supplied integer function named yywrap(). The purpose of this routine is to "wrap up" input processing. It returns the value one if there is no more input. Otherwise, it returns zero and arranges for input() to provide more characters. The compiler writer may supply the input(), output(), unput(), and yywrap() functions (usually as C macros). Lex supplies default versions that read characters from the standard input and write them to the standard output. The default version of yywrap() simply returns one, signifying that there is no more input. (The use of output() allows Lex to be used as a tool for producing stand-alone data filters for transforming a stream of data.)

Lex-generated scanners normally select the longest possible input sequence that matches some token definition. Occasionally this can be a problem. For example, if we allow Fortran-like fixed-decimal literals like 1. and .10 and the Pascal subrange operator ".." then 1..10 will most likely be misscanned as two fixed-decimal literals rather than two integer literals separated by the subrange operator. Lex allows us to define a regular expression that applies only if some other expression immediately follows it. That is, r/s tells Lex to match regular expression r but only if regular expression s immediately follows it. s is right context; it isn't part of the token that is matched, but it must be present for r to be matched. Thus [0-9]+/".." would match an integer literal, but only if ".." immediately follows it. Since this pattern covers more characters than the one defining a fixed-decimal literal, it takes precedence. The longest match is still chosen, but the right-context characters are returned to the input so that they can be matched as part of a later token.

The operators and special symbols most commonly used in Lex are summarized in Table 2.
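The effect of the right-context pattern [0-9]+/".." discussed above can be sketched by hand. The following function is an illustrative stand-in, not code that Lex generates; the names scan_number, NUM_INT and NUM_FIXED are assumptions made for this example. It reports how many characters belong to the numeric token and leaves any ".." in the input for the next token.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum num_kind { NUM_NONE, NUM_INT, NUM_FIXED };

/* Scan a numeric literal at the start of s and store its length in *len.
   Digits followed by ".." are an integer literal; the ".." is right
   context and is not counted as part of the token. */
enum num_kind scan_number(const char *s, size_t *len) {
    size_t i = 0;
    size_t digits;

    assert(s != NULL && len != NULL);
    while (isdigit((unsigned char)s[i])) i++;
    digits = i;

    /* Right-context case: behaves like [0-9]+/".." */
    if (digits > 0 && s[i] == '.' && s[i + 1] == '.') {
        *len = digits;
        return NUM_INT;
    }

    /* Fixed-decimal forms such as 1.5, 1. and .10 */
    if (s[i] == '.') {
        size_t j = i + 1;
        while (isdigit((unsigned char)s[j])) j++;
        if (digits > 0 || j > i + 1) {
            *len = j;
            return NUM_FIXED;
        }
    }

    if (digits > 0) {
        *len = digits;
        return NUM_INT;
    }
    *len = 0;
    return NUM_NONE;
}
```

On the input "1..10" this returns NUM_INT with length one, so the remaining "..10" can be scanned as the subrange operator followed by an integer literal. Without the right-context check, the same input would begin with the fixed-decimal literal "1.".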
Note that a symbol sometimes has one meaning in a regular expression and an entirely different meaning in a character class (i.e., within a pair of brackets). If you find Lex behaving unexpectedly, it's a good idea to check this table to be sure of how the operators and symbols you've used behave. Ordinary letters and digits, and symbols not mentioned, represent themselves. If you're not sure if a character is special or not, you can always escape it or make it part of a quoted string.

In summary, Lex is a very flexible generator that can produce a complete scanner from a succinct definition. The difficult part is learning Lex's notation and rules. Once you've done this, Lex will relieve you of many of the chores of writing a scanner (reading characters, buffering them, deciding which token pattern matches, etc.). Moreover, Lex's notation for representing regular expressions is used in other Unix programs, most notably the grep pattern matching utility.

Lex can also transform input as a preprocessor, as well as scan it. It provides a number of advanced features beyond those discussed here. Lex does require that code segments be written in C, and hence is not language-independent.

TABLE 2. Meaning of Operators and Special Symbols in Lex

    Symbol  Meaning in Regular Expressions                Meaning in Character Classes
    (       Matches with ) to group sub-expressions.      Represents itself.
    )       Matches with ( to group sub-expressions.      Represents itself.
    [       Begins a character class.                     Represents itself.
    ]       Represents itself.                            Ends a character class.
    {       Matches with } to signal macro-expansion.     Represents itself.
    }       Matches with { to signal macro-expansion.     Represents itself.
    "       Matches with " to delimit strings (only \     Represents itself.
            is special within strings).
    \       Escapes individual characters. Also used      Escapes individual characters. Also used
            to specify a character by its octal code.     to specify a character by its octal code.
    .       Matches any one character except \n.          Represents itself.
    |       Alternation (or) operator.                    Represents itself.
    *       Kleene closure operator (zero or more         Represents itself.
            matches).
    +       Positive closure operator (one or more        Represents itself.
            matches).
    ?       Optional choice operator (one or zero         Represents itself.
            matches).
    /       Context sensitive matching operator.          Represents itself.
    ^       Matches only at the beginning of a line.      Complements remaining characters in the class.
    $       Matches only at the end of a line.            Represents itself.
    -       Represents itself.                            Range of characters operator.

3.4 Other Scanner Generators

Lex is certainly the most widely-known and widely-available scanner generator because it is distributed as part of the Unix system. Even after years of use it still has bugs, and produces scanners too slow to be used in production compilers. [Jacobsen 1987] has shown that Lex can be improved so that it is always faster than a hand-written scanner.

As noted earlier, Flex (Fast Lex) is a freely distributable Lex clone. It produces scanners that are considerably faster than the ones produced by Lex. Flex also provides options that allow tuning of the scanner size versus its speed, as well as some features that Lex does not have (such as support for eight-bit characters). If Flex is available on your system you should use it instead of Lex.

Lex has also been implemented in languages other than C. JLex [Berk 1997] is a Lex-like scanner generator written in Java that generates Java scanner classes. It is of particular interest to individuals writing compilers in Java. Alex [Self 1990] is an Ada version of Lex. Lexgen [Appel 1989] is an ML version of Lex.
An interesting alternative to Lex is GLA (Generator for Lexical Analyzers) [Gray 1988]. GLA takes a description of a scanner based on regular expressions and a library of common lexical idioms (such as "Pascal comment") and produces a directly executable (that is, not transition table-driven) scanner written in C. GLA was designed with both ease of use and efficiency of the generated scanner in mind. Experiments show it to be typically twice as fast as Flex and only slightly slower than a trivial program that reads and "touches" each character in an input file. The scanners it produces are more than competitive with the best hand-coded scanners. Another tool that produces directly executable scanners is RE2C [Bumbulis 1993]. The scanners it produces are easily adaptable to a variety of environments and yet scanning speed is excellent.

Scanner generators are usually included as parts of complete suites of compiler development tools. These suites are often available on DOS and Macintosh systems as well as Unix systems. Among the most widely-used and highly-recommended of these are DLG (part of the PCCTS tools suite, [Parr 1991]), CoCo/R [Moessenboeck 1991], an integrated scanner/parser generator, and Rex [Grosch 1989], part of the Karlsruhe Cocktail tools suite.

3.5 Practical Considerations

In this section we discuss the practical considerations necessary to build real scanners for real programming languages. As one might expect, the finite automaton model we have developed sometimes falls short and must be supplemented. Efficiency concerns must be addressed. Further, some provision for error handling must be incorporated into any practical scanner. We shall discuss a number of potential problem areas. In each case, solutions will be weighed, particularly in conjunction with the Lex scanner generator we have studied.

3.5.1 Processing Identifiers and Literals

In simple languages with only global variables and declarations, it is common to have the scanner immediately enter an identifier into the symbol table if it is not already there. Whether the identifier is entered or is already in the table, a pointer to the symbol table entry is then returned from the scanner.

In block-structured languages, we usually don't ask the scanner to enter or look up identifiers in the symbol table because an identifier can be used in many contexts (as a variable, in a declaration, as a member of a class, a label, and more). It is not possible, in general, for the scanner to know when an identifier should be entered into the symbol table for the current scope or when it should return a pointer to an instance from an earlier scope. Some scanners just copy the identifier into a private string variable (that can't be overwritten) and return a pointer to it. A later compiler phase, the type checker, will resolve the identifier's intended usage.

Sometimes a string space is used to store identifiers (see Chapter 8). This avoids frequent calls to memory allocators like new or malloc to allocate private space for a string and avoids the space overhead of storing multiple copies of the same string.
If a string space is used, the scanner can enter an identifier into the string space and return a string space pointer rather than the actual text. An alternative to a string space is a hash table that stores identifiers and assigns to each a unique serial number. All identifiers that have the same text get the same serial number; identifiers with different texts always get different serial numbers. A serial number is a small integer that can be used instead of a string space pointer. Serial numbers are ideal indices into symbol tables (which need not be hashed) because they are small contiguously assigned integers. A scanner can hash an identifier when it is scanned and return its serial number as part of the identifier token.

In some languages, such as C, C++ and Java, case is significant, but in others, such as Ada and Pascal, case is insignificant. If case is significant, identifier text must be stored or returned exactly as it was scanned. Reserved word lookup must distinguish between identifiers and reserved words that differ only in case. However, if case is insignificant, we need to guarantee that case differences in the spelling of an identifier or reserved word do not cause errors. An easy way to do this is to put all tokens scanned as identifiers into a uniform case before they are returned or looked up in a reserved word table.

Other tokens, such as literals, require processing before they are returned. Integer and real (floating) literals are converted to numeric form and returned as part of the token. Numeric conversion can be tricky because of the danger of overflow or roundoff errors. It is wise to use standard library routines like atoi and atof (in C) and Integer.intValue and Float.floatValue (in Java). For string literals, a pointer to the text of the string (with escaped characters expanded) should be returned.
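As a minimal sketch of integer literal conversion with an overflow check, the function below uses the standard C routine strtol rather than atoi, because strtol reports an out-of-range value through errno whereas atoi offers no reliable way to detect overflow. The name convert_int_literal and its return convention are assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Convert the text of a decimal integer literal (for example, a copy of
   yytext) to a long. Returns 1 and stores the value on success; returns 0
   if the text is empty, has trailing non-digit characters, or is out of
   range for a long. */
int convert_int_literal(const char *text, long *value) {
    char *end;
    long v;

    assert(text != NULL && value != NULL);
    errno = 0;
    v = strtol(text, &end, 10);
    if (end == text || *end != '\0')
        return 0;               /* nothing converted, or trailing junk */
    if (errno == ERANGE)
        return 0;               /* literal too large for a long */
    *value = v;
    return 1;
}
```

A scanner command could call this on the matched text and report a lexical error when it returns zero, instead of silently returning a wrapped or clamped value.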

The design of C contains a flaw that requires a C scanner to do a bit of special processing. Consider the character sequence

    a (* b);

This can be a call to procedure a, with *b as the parameter. If a has been declared in a typedef to be a type name, the above character sequence can also be the declaration of an identifier b that is a pointer variable (the parentheses are not needed, but they are legal). C contains no special marker separating declarations from statements, so the parser will need some help in deciding whether it is seeing a procedure call or a variable declaration. One way to do this is to create, while scanning and parsing, a table of currently-visible identifiers that have been defined in typedef declarations. When an identifier in this table is scanned, a special typeid token is returned (rather than an ordinary id token). This allows the parser to easily distinguish the two constructs; they now begin with different tokens.

Why does this complication exist in C? typedef statements were not in the original definition of C in which the lexical and syntactic rules were established. When the typedef construct was added, the ambiguity was not immediately recognized (parentheses, after all, are rarely used in variable declarations). When the problem was finally recognized it was too late, and the "trick" described above had to be devised to resolve the correct usage.

3.5.2 Reserved Words

Virtually all programming languages have symbols (such as if and while) that match the lexical syntax of ordinary identifiers. These symbols are termed key words. If the language has a rule that key words may not be used as programmer-defined identifiers, then they are termed reserved words (that is, they are reserved for special use). Most programming languages choose to make key words reserved. This simplifies parsing, which drives the compilation process. It also makes programs more readable. For example, in Pascal and Ada subprograms without parameters are called as name; (no parentheses are required). Now assume that begin and end are not reserved and some devious programmer has declared procedures named begin and end.
The following program can be parsed in many ways; its meaning is not well defined:

    begin
        begin;
        end;
        end;
        begin;
    end

With careful design, outright ambiguities can be avoided. For example, in PL/I key words are not reserved, but procedures are called using an explicit call key word. Nonetheless, opportunities for convoluted usage abound because key words may be used as variable names:

    if if then else = then;

The problem with reserved words is that if they are too numerous, they may confuse inexperienced programmers who unknowingly choose an identifier name that clashes with a reserved word. This usually causes a syntax error in a program that "looks right" and in fact would be right were the symbol in question not reserved. COBOL is infamous for this problem, having several hundred reserved words. For example, in COBOL, zero is a reserved word. So is zeros. So is zeroes!

In an earlier section we were able to recognize reserved words by creating distinct regular expressions for each reserved word. This approach was feasible because Lex (and Flex) allow more than one regular expression to match a character sequence, with the earliest expression that matches taking precedence. Creating regular expressions for each reserved word will increase the number of states in the transition table a scanner generator creates. [Gray 1988] reports that in as simple a language as Pascal (which has only 35 reserved words), the number of states increases from 37 to 165 when each reserved word is defined by its own regular expression. In uncompressed form with 127 columns for ASCII characters (excluding null), the number of transition table entries would increase from 4699 to 20,955. This may not be a problem with modern multi-megabyte memories. Nonetheless, some scanner generators, like Flex, allow you to choose to optimize scanner size or scanner speed.

In Exercise 18 it is established that any regular expression may be complemented to obtain all strings not in the original regular expression. That is, $\overline{A}$, the complement of $A$, is regular if $A$ is. Using complementation we can write a regular expression for nonreserved identifiers:

$$\overline{(\overline{\mathit{ident}} \mid \mathtt{if} \mid \mathtt{while} \mid \ldots)}$$

That is, if we take the complement of the set containing reserved words and all non-identifier strings, we get all strings that are identifiers excluding the reserved words. Unfortunately, neither Lex nor Flex provides a complement operator for regular expressions (^ works only on character sets).

We could just write down a regular expression directly, but this is too complex to seriously consider. Suppose END is the only reserved word, and identifiers contain only letters. Then

$$L \mid (L\,L) \mid ((L\,L\,L)\,L^{+}) \mid ((L - \mathtt{'E'})\,L^{*}) \mid (L\,(L - \mathtt{'N'})\,L^{*}) \mid (L\,L\,(L - \mathtt{'D'})\,L^{*})$$

defines identifiers shorter or longer than three letters, or not starting with E, or without N in position two, and so forth.
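To see that the six alternatives in the expression above really do cover every all-letter identifier except END, each alternative can be written as a separate test. This is only a checking aid under the same assumptions as the text (END is the sole reserved word and identifiers contain only letters); the function name nonreserved_ident is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Returns 1 if s is a nonempty string of letters that matches at least one
   of the six alternatives of the regular expression, and 0 otherwise. */
int nonreserved_ident(const char *s) {
    size_t n, i;
    int alt1, alt2, alt3, alt4, alt5, alt6;

    assert(s != NULL);
    n = strlen(s);
    if (n == 0)
        return 0;
    for (i = 0; i < n; i++)
        if (!isalpha((unsigned char)s[i]))
            return 0;

    alt1 = (n == 1);                    /* L                  */
    alt2 = (n == 2);                    /* L L                */
    alt3 = (n >= 4);                    /* (L L L) L+         */
    alt4 = (s[0] != 'E');               /* (L - 'E') L*       */
    alt5 = (n >= 2 && s[1] != 'N');     /* L (L - 'N') L*     */
    alt6 = (n >= 3 && s[2] != 'D');     /* L L (L - 'D') L*   */
    return alt1 || alt2 || alt3 || alt4 || alt5 || alt6;
}
```

The only all-letter string for which every alternative fails has exactly three letters, starts with E, has N in position two and D in position three, which is END itself. Even for this single three-letter reserved word the expression needs six alternatives, which suggests why the direct approach does not scale to a realistic set of reserved words.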
Many hand-coded scanners treat reserved words as ordinary identifiers (as far as matching tokens is concerned) and then use a separate table lookup to detect them. Automatically generated scanners can also use this approach, especially if transition table size is an issue. After what looks like an identifier is scanned, a table of exceptions is consulted to see if a reserved word has been recognized. If case is significant in reserved words, the exception lookup will require an exact match; otherwise, the token should be translated to a standard form (all upper- or lowercase) before the lookup.

An exception table may be organized in a variety of ways. An obvious organization is a sorted list of exceptions suitable for a binary search. A hash table may also be used. For example, the length of a token may be used as an index into a list of exceptions of the same length. If exception lengths are well distributed, few comparisons will be needed to determine whether a token is an identifier or a reserved word. It has been shown by [Cichelli 1980] that perfect hash functions are possible. That is, each reserved word is mapped to a unique position in the exception table, and no position in the table is unused. A token is either the reserved word selected by the hash function or it is an ordinary identifier.

If identifiers are entered into a string space or given a unique serial number by the scanner, then reserved words can be entered in advance. If what looks like an identifier is found to have a serial number or string space position smaller than the initial position assigned to identifiers, then we immediately know that a reserved word rather than an identifier has been scanned. In fact, with a little care it is possible to assign initial serial numbers so that they match exactly the token codes used for reserved words. That is, if an identifier is found to have a serial number s where s is less than the number of reserved words, then s must be the correct token code for the reserved word just scanned.
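The serial-number technique described above can be sketched as follows. For brevity the table is searched linearly rather than hashed, and the two reserved words, the table limits and all function names are hypothetical choices for this example.

```c
#include <assert.h>
#include <string.h>

#define MAX_NAMES 256   /* assumed capacity of the name table */
#define MAX_LEN   32    /* assumed maximum name length, including the NUL */

/* Hypothetical reserved words; their positions double as token codes. */
static const char *const reserved[] = { "if", "while" };
enum { NUM_RESERVED = sizeof reserved / sizeof reserved[0] };

static char names[MAX_NAMES][MAX_LEN];
static int name_count = 0;

/* Return the serial number of text, entering it if it is new.
   Returns -1 if the table is full or the name is too long. */
int serial_number(const char *text) {
    int i;

    assert(text != NULL);
    for (i = 0; i < name_count; i++)
        if (strcmp(names[i], text) == 0)
            return i;
    if (name_count == MAX_NAMES || strlen(text) >= MAX_LEN)
        return -1;
    strcpy(names[name_count], text);
    return name_count++;
}

/* Enter the reserved words first so they receive serial numbers
   0 .. NUM_RESERVED-1. */
void init_names(void) {
    int i;

    name_count = 0;
    for (i = 0; i < NUM_RESERVED; i++)
        serial_number(reserved[i]);
}

/* A serial number below NUM_RESERVED identifies a reserved word. */
int is_reserved(int serial) {
    return serial >= 0 && serial < NUM_RESERVED;
}
```

After init_names() has run, a scanner would call serial_number on each identifier-like token; a result below NUM_RESERVED is then both the signal that a reserved word was scanned and its token code.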

Scanner Termination. Multi Character Lookahead. to its physical end. Most parsers require an end of file token. Lex and Jlex automatically create an

Scanner Termination. Multi Character Lookahead. to its physical end. Most parsers require an end of file token. Lex and Jlex automatically create an Scnner Termintion A scnner reds input chrcters nd prtitions them into tokens. Wht hppens when the end of the input file is reched? It my be useful to crete n Eof pseudo-chrcter when this occurs. In Jv,

More information

Assignment 4. Due 09/18/17

Assignment 4. Due 09/18/17 Assignment 4. ue 09/18/17 1. ). Write regulr expressions tht define the strings recognized by the following finite utomt: b d b b b c c b) Write FA tht recognizes the tokens defined by the following regulr

More information

Dr. D.M. Akbar Hussain

Dr. D.M. Akbar Hussain Dr. D.M. Akr Hussin Lexicl Anlysis. Bsic Ide: Red the source code nd generte tokens, it is similr wht humns will do to red in; just tking on the input nd reking it down in pieces. Ech token is sequence

More information

Scanner Termination. Multi Character Lookahead

Scanner Termination. Multi Character Lookahead If d.doublevlue() represents vlid integer, (int) d.doublevlue() will crete the pproprite integer vlue. If string representtion of n integer begins with ~ we cn strip the ~, convert to double nd then negte

More information

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5 CS321 Lnguges nd Compiler Design I Winter 2012 Lecture 5 1 FINITE AUTOMATA A non-deterministic finite utomton (NFA) consists of: An input lphet Σ, e.g. Σ =,. A set of sttes S, e.g. S = {1, 3, 5, 7, 11,

More information

In the last lecture, we discussed how valid tokens may be specified by regular expressions.

In the last lecture, we discussed how valid tokens may be specified by regular expressions. LECTURE 5 Scnning SYNTAX ANALYSIS We know from our previous lectures tht the process of verifying the syntx of the progrm is performed in two stges: Scnning: Identifying nd verifying tokens in progrm.

More information

Fig.25: the Role of LEX

Fig.25: the Role of LEX The Lnguge for Specifying Lexicl Anlyzer We shll now study how to uild lexicl nlyzer from specifiction of tokens in the form of list of regulr expressions The discussion centers round the design of n existing

More information

CSE 401 Midterm Exam 11/5/10 Sample Solution

CSE 401 Midterm Exam 11/5/10 Sample Solution Question 1. egulr expressions (20 points) In the Ad Progrmming lnguge n integer constnt contins one or more digits, but it my lso contin embedded underscores. Any underscores must be preceded nd followed

More information

Reducing a DFA to a Minimal DFA

Reducing a DFA to a Minimal DFA Lexicl Anlysis - Prt 4 Reducing DFA to Miniml DFA Input: DFA IN Assume DFA IN never gets stuck (dd ded stte if necessry) Output: DFA MIN An equivlent DFA with the minimum numer of sttes. Hrry H. Porter,

More information

Definition of Regular Expression

Definition of Regular Expression Definition of Regulr Expression After the definition of the string nd lnguges, we re redy to descrie regulr expressions, the nottion we shll use to define the clss of lnguges known s regulr sets. Recll

More information

2014 Haskell January Test Regular Expressions and Finite Automata

2014 Haskell January Test Regular Expressions and Finite Automata 0 Hskell Jnury Test Regulr Expressions nd Finite Automt This test comprises four prts nd the mximum mrk is 5. Prts I, II nd III re worth 3 of the 5 mrks vilble. The 0 Hskell Progrmming Prize will be wrded

More information

Topic 2: Lexing and Flexing

Topic 2: Lexing and Flexing Topic 2: Lexing nd Flexing COS 320 Compiling Techniques Princeton University Spring 2016 Lennrt Beringer 1 2 The Compiler Lexicl Anlysis Gol: rek strem of ASCII chrcters (source/input) into sequence of

More information

CS 430 Spring Mike Lam, Professor. Parsing

CS 430 Spring Mike Lam, Professor. Parsing CS 430 Spring 2015 Mike Lm, Professor Prsing Syntx Anlysis We cn now formlly descrie lnguge's syntx Using regulr expressions nd BNF grmmrs How does tht help us? Syntx Anlysis We cn now formlly descrie

More information

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 4: Lexical Analyzers 28 Jan 08

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 4: Lexical Analyzers 28 Jan 08 CS412/413 Introduction to Compilers Tim Teitelum Lecture 4: Lexicl Anlyzers 28 Jn 08 Outline DFA stte minimiztion Lexicl nlyzers Automting lexicl nlysis Jlex lexicl nlyzer genertor CS 412/413 Spring 2008

More information

CS 432 Fall Mike Lam, Professor a (bc)* Regular Expressions and Finite Automata

CS 432 Fall Mike Lam, Professor a (bc)* Regular Expressions and Finite Automata CS 432 Fll 2017 Mike Lm, Professor (c)* Regulr Expressions nd Finite Automt Compiltion Current focus "Bck end" Source code Tokens Syntx tree Mchine code chr dt[20]; int min() { flot x = 42.0; return 7;

More information

CMSC 331 First Midterm Exam

CMSC 331 First Midterm Exam 0 00/ 1 20/ 2 05/ 3 15/ 4 15/ 5 15/ 6 20/ 7 30/ 8 30/ 150/ 331 First Midterm Exm 7 October 2003 CMC 331 First Midterm Exm Nme: mple Answers tudent ID#: You will hve seventy-five (75) minutes to complete

More information

Lexical analysis, scanners. Construction of a scanner

Lexical analysis, scanners. Construction of a scanner Lexicl nlysis scnners (NB. Pges 4-5 re for those who need to refresh their knowledge of DFAs nd NFAs. These re not presented during the lectures) Construction of scnner Tools: stte utomt nd trnsition digrms.

More information

Compiler Construction D7011E

Compiler Construction D7011E Compiler Construction D7011E Lecture 3: Lexer genertors Viktor Leijon Slides lrgely y John Nordlnder with mteril generously provided y Mrk P. Jones. 1 Recp: Hndwritten Lexers: Don t require sophisticted

More information

CSCI 3130: Formal Languages and Automata Theory Lecture 12 The Chinese University of Hong Kong, Fall 2011

CSCI 3130: Formal Languages and Automata Theory Lecture 12 The Chinese University of Hong Kong, Fall 2011 CSCI 3130: Forml Lnguges nd utomt Theory Lecture 12 The Chinese University of Hong Kong, Fll 2011 ndrej Bogdnov In progrmming lnguges, uilding prse trees is significnt tsk ecuse prse trees tell us the

More information

Some Thoughts on Grad School. Undergraduate Compilers Review and Intro to MJC. Structure of a Typical Compiler. Lexing and Parsing

Some Thoughts on Grad School. Undergraduate Compilers Review and Intro to MJC. Structure of a Typical Compiler. Lexing and Parsing Undergrdute Compilers Review nd Intro to MJC Announcements Miling list is in full swing Tody Some thoughts on grd school Finish prsing Semntic nlysis Visitor pttern for bstrct syntx trees Some Thoughts

More information

Lexical Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Lexical Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay Lexicl Anlysis Amith Snyl (www.cse.iit.c.in/ s) Deprtment of Computer Science nd Engineering, Indin Institute of Technology, Bomy Septemer 27 College of Engineering, Pune Lexicl Anlysis: 2/6 Recp The input

More information

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis CS143 Hndout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexicl Anlysis In this first written ssignment, you'll get the chnce to ply round with the vrious constructions tht come up when doing lexicl

More information

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών ΕΠΛ323 - Θωρία και Πρακτική Μταγλωττιστών Lecture 3 Lexicl Anlysis Elis Athnsopoulos elisthn@cs.ucy.c.cy Recognition of Tokens if expressions nd reltionl opertors if è if then è then else è else relop

More information

CSCE 531, Spring 2017, Midterm Exam Answer Key

CSCE 531, Spring 2017, Midterm Exam Answer Key CCE 531, pring 2017, Midterm Exm Answer Key 1. (15 points) Using the method descried in the ook or in clss, convert the following regulr expression into n equivlent (nondeterministic) finite utomton: (

More information

Theory of Computation CSE 105

Theory of Computation CSE 105 $ $ $ Theory of Computtion CSE 105 Regulr Lnguges Study Guide nd Homework I Homework I: Solutions to the following problems should be turned in clss on July 1, 1999. Instructions: Write your nswers clerly

More information

COMP 423 lecture 11 Jan. 28, 2008

COMP 423 lecture 11 Jan. 28, 2008 COMP 423 lecture 11 Jn. 28, 2008 Up to now, we hve looked t how some symols in n lphet occur more frequently thn others nd how we cn sve its y using code such tht the codewords for more frequently occuring

More information

CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona

CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona CSc 453 Compilers nd Systems Softwre 4 : Lexicl Anlysis II Deprtment of Computer Science University of Arizon collerg@gmil.com Copyright c 2009 Christin Collerg Implementing Automt NFAs nd DFAs cn e hrd-coded

More information

ECE 468/573 Midterm 1 September 28, 2012

ECE 468/573 Midterm 1 September 28, 2012 ECE 468/573 Midterm 1 September 28, 2012 Nme:! Purdue emil:! Plese sign the following: I ffirm tht the nswers given on this test re mine nd mine lone. I did not receive help from ny person or mteril (other

More information

Compilation

Compilation Compiltion 0368-3133 Lecture 2: Lexicl Anlysis Nom Rinetzky 1 2 Lexicl Anlysis Modern Compiler Design: Chpter 2.1 3 Conceptul Structure of Compiler Compiler Source text txt Frontend Semntic Representtion

More information

Mid-term exam. Scores. Fall term 2012 KAIST EE209 Programming Structures for EE. Thursday Oct 25, Student's name: Student ID:

Mid-term exam. Scores. Fall term 2012 KAIST EE209 Programming Structures for EE. Thursday Oct 25, Student's name: Student ID: Fll term 2012 KAIST EE209 Progrmming Structures for EE Mid-term exm Thursdy Oct 25, 2012 Student's nme: Student ID: The exm is closed book nd notes. Red the questions crefully nd focus your nswers on wht

More information

Midterm 2 Sample solution

Midterm 2 Sample solution Nme: Instructions Midterm 2 Smple solution CMSC 430 Introduction to Compilers Fll 2012 November 28, 2012 This exm contins 9 pges, including this one. Mke sure you hve ll the pges. Write your nme on the

More information

Languages. L((a (b)(c))*) = { ε,a,bc,aa,abc,bca,... } εw = wε = w. εabba = abbaε = abba. (a (b)(c)) *

Languages. L((a (b)(c))*) = { ε,a,bc,aa,abc,bca,... } εw = wε = w. εabba = abbaε = abba. (a (b)(c)) * Pln for Tody nd Beginning Next week Interpreter nd Compiler Structure, or Softwre Architecture Overview of Progrmming Assignments The MeggyJv compiler we will e uilding. Regulr Expressions Finite Stte

More information

Example: Source Code. Lexical Analysis. The Lexical Structure. Tokens. What do we really care here? A Sample Toy Program:

Example: Source Code. Lexical Analysis. The Lexical Structure. Tokens. What do we really care here? A Sample Toy Program: Lexicl Anlysis Red source progrm nd produce list of tokens ( liner nlysis) source progrm The lexicl structure is specified using regulr expressions Other secondry tsks: (1) get rid of white spces (e.g.,

More information

10/12/17. Motivating Example. Lexical and Syntax Analysis (2) Recursive-Descent Parsing. Recursive-Descent Parsing. Recursive-Descent Parsing

10/12/17. Motivating Example. Lexical and Syntax Analysis (2) Recursive-Descent Parsing. Recursive-Descent Parsing. Recursive-Descent Parsing Motivting Exmple Lexicl nd yntx Anlysis (2) In Text: Chpter 4 Consider the grmmr -> cad A -> b Input string: w = cd How to build prse tree top-down? 2 Initilly crete tree contining single node (the strt

More information
