Lexical Analysis and Lexical Analyzer Generators
1. Lexical Analysis and Lexical Analyzer Generators
Chapter 3
COP5621 Compiler Construction
Copyright Robert van Engelen, Florida State University
2. The Reason Why Lexical Analysis is a Separate Phase
- Simplifies the design of the compiler: LL(1) or LR(1) parsing with 1 token lookahead would not be possible (multiple characters/tokens to match)
- Provides efficient implementation: systematic techniques to implement lexical analyzers by hand or automatically from specifications; stream buffering methods to scan input
- Improves portability: non-standard symbols and alternate character encodings can be normalized (e.g. trigraphs)
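3. Interaction of the Lexical Analyzer with the Parser
[Diagram: the source program feeds the lexical analyzer; the parser asks the lexical analyzer to "get next token" and receives a (token, tokenval) pair in return. Both components report errors and both use the symbol table.]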
4. Attributes of Tokens
Input: y := 31 + 28*x
The lexical analyzer passes the parser a sequence of (token, tokenval) pairs, where tokenval is the token attribute:
    <id, "y"> <assign, > <num, 31> <+, > <num, 28> <*, > <id, "x">
5. Tokens, Patterns, and Lexemes
- A token is a classification of lexical units. For example: id and num
- Lexemes are the specific character strings that make up a token. For example: abc and 123
- Patterns are rules describing the set of lexemes belonging to a token. For example: "letter followed by letters and digits" and "non-empty sequence of digits"
6. Specification of Patterns for Tokens: Definitions
- An alphabet Σ is a finite set of symbols (characters)
- A string s is a finite sequence of symbols from Σ
- |s| denotes the length of string s
- ε denotes the empty string, thus |ε| = 0
- A language is a specific set of strings over some fixed alphabet Σ
7. Specification of Patterns for Tokens: String Operations
- The concatenation of two strings x and y is denoted by xy
- The exponentiation of a string s is defined by
    s^0 = ε
    s^i = s^(i-1) s  for i > 0
- Note that sε = εs = s
8. Specification of Patterns for Tokens: Language Operations
- Union: L ∪ M = {s | s ∈ L or s ∈ M}
- Concatenation: LM = {xy | x ∈ L and y ∈ M}
- Exponentiation: L^0 = {ε}; L^i = L^(i-1) L
- Kleene closure: L* = ∪ (i = 0, …, ∞) L^i
- Positive closure: L+ = ∪ (i = 1, …, ∞) L^i
9. Specification of Patterns for Tokens: Regular Expressions
Basis symbols:
- ε is a regular expression denoting the language {ε}
- a ∈ Σ is a regular expression denoting {a}
If r and s are regular expressions denoting languages L(r) and M(s) respectively, then
- r|s is a regular expression denoting L(r) ∪ M(s)
- rs is a regular expression denoting L(r)M(s)
- r* is a regular expression denoting L(r)*
- (r) is a regular expression denoting L(r)
A language defined by a regular expression is called a regular set.
10. Specification of Patterns for Tokens: Regular Definitions
Regular definitions introduce a naming convention:
    d1 → r1
    d2 → r2
    …
    dn → rn
where each ri is a regular expression over Σ ∪ {d1, d2, …, d(i-1)}.
Any dj in ri can be textually substituted in ri to obtain an equivalent set of definitions.
11. Specification of Patterns for Tokens: Regular Definitions
Example:
    letter → A | B | … | Z | a | b | … | z
    digit  → 0 | 1 | … | 9
    id     → letter ( letter | digit )*
Regular definitions are not recursive:
    digits → digit digits | digit     (wrong!)
12. Specification of Patterns for Tokens: Notational Shorthand
The following shorthands are often used:
    r+    = rr*
    r?    = r | ε
    [a-z] = a | b | c | … | z
Examples:
    digit → [0-9]
    num   → digit+ (. digit+)? ( E (+|-)? digit+ )?
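To make the num shorthand concrete, the following is a minimal hand-written sketch of a recognizer for that one pattern. The function names match_num and digits are illustrative and do not come from the slides; the sketch returns the length of the longest matching prefix, which is the quantity a scanner needs.

```c
#include <ctype.h>
#include <stddef.h>

/* Length of the longest run of decimal digits at the start of s. */
static size_t digits(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Length of the longest prefix of s that matches
 *     num -> digit+ (. digit+)? ( E (+|-)? digit+ )?
 * or 0 if no prefix matches. (Hypothetical helper for illustration.) */
size_t match_num(const char *s)
{
    size_t n = digits(s);              /* digit+ is mandatory */
    if (n == 0)
        return 0;

    /* Optional fraction: the '.' only counts if a digit follows it. */
    if (s[n] == '.') {
        size_t f = digits(s + n + 1);
        if (f > 0)
            n += 1 + f;
    }

    /* Optional exponent: E, optional sign, then at least one digit. */
    if (s[n] == 'E') {
        size_t k = n + 1;
        if (s[k] == '+' || s[k] == '-')
            k++;
        size_t e = digits(s + k);
        if (e > 0)
            n = k + e;
    }
    return n;
}
```

Each optional group is committed only once it has matched completely, so an input such as 12. yields the lexeme 12 and leaves the dot for the next token.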
13. Regular Definitions and Grammars
Grammar:
    stmt → if expr then stmt
         | if expr then stmt else stmt
         | ε
    expr → term relop term
         | term
    term → id
         | num
Regular definitions:
    if    → if
    then  → then
    else  → else
    relop → < | <= | <> | > | >= | =
    id    → letter ( letter | digit )*
    num   → digit+ (. digit+)? ( E (+|-)? digit+ )?
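14. Coding Regular Definitions in Transition Diagrams
relop → < | <= | <> | > | >= | =
    start state 0
    0 on <     → 1
    1 on =     → 2    return(relop, LE)
    1 on >     → 3    return(relop, NE)
    1 on other → 4*   return(relop, LT)
    0 on =     → 5    return(relop, EQ)
    0 on >     → 6
    6 on =     → 7    return(relop, GE)
    6 on other → 8*   return(relop, GT)
id → letter ( letter | digit )*
    start state 9
    9 on letter           → 10
    10 on letter or digit → 10
    10 on other           → 11*  return(gettoken(), install_id())
States marked * retract the input by one character.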
15. Coding Regular Definitions in Transition Diagrams: Code

    token nexttoken()
    { while (1) {
        switch (state) {
        case 0: c = nextchar();
          if (c==blank || c==tab || c==newline) {
            state = 0;
            lexeme_beginning++;
          }
          else if (c=='<') state = 1;
          else if (c=='=') state = 5;
          else if (c=='>') state = 6;
          else state = fail();
          break;
        case 1:
          /* ... */
        case 9: c = nextchar();
          if (isletter(c)) state = 10;
          else state = fail();
          break;
        case 10: c = nextchar();
          if (isletter(c)) state = 10;
          else if (isdigit(c)) state = 10;
          else state = 11;
          break;
        /* ... */

fail() decides the next start state to check:

    int fail()
    { forward = token_beginning;
      switch (start) {
      case 0:  start = 9;  break;
      case 9:  start = 12; break;
      case 12: start = 20; break;
      case 20: start = 25; break;
      case 25: recover();  break;
      default: /* error */
      }
      return start;
    }
16. The Lex and Flex Scanner Generators
- Lex and its newer cousin flex are scanner generators
- They systematically translate regular definitions into C source code for efficient scanning
- Generated code is easy to integrate in C applications
17. Creating a Lexical Analyzer with Lex and Flex
    lex source program (lex.l) → lex or flex compiler → lex.yy.c
    lex.yy.c → C compiler → a.out
    input stream → a.out → sequence of tokens
18. Lex Specification
A lex specification consists of three parts:
    regular definitions, C declarations in %{ %}
    %%
    translation rules
    %%
    user-defined auxiliary procedures
The translation rules are of the form:
    p1 { action1 }
    p2 { action2 }
    …
    pn { actionn }
19. Regular Expressions in Lex
    x          match the character x
    \.         match the character .
    "string"   match contents of string of characters
    .          match any character except newline
    ^          match beginning of a line
    $          match the end of a line
    [xyz]      match one character x, y, or z (use \ to escape -)
    [^xyz]     match any character except x, y, and z
    [a-z]      match one of a to z
    r*         closure (match zero or more occurrences)
    r+         positive closure (match one or more occurrences)
    r?         optional (match zero or one occurrence)
    r1r2       match r1 then r2 (concatenation)
    r1|r2      match r1 or r2 (union)
    ( r )      grouping
    r1/r2      match r1 when followed by r2
    {d}        match the regular expression defined by d
20. Example Lex Specification 1

    %{
    #include <stdio.h>
    %}
    %%
    [0-9]+  { printf("%s\n", yytext); }    /* translation rules */
    .|\n    { }
    %%
    int main(void)
    { yylex();
      return 0;
    }

yytext contains the matching lexeme; yylex() invokes the lexical analyzer.
To build and run:
    lex spec.l
    gcc lex.yy.c -ll
    ./a.out < spec.l
21. Example Lex Specification 2

    %{
    #include <stdio.h>
    int ch = 0, wd = 0, nl = 0;
    %}
    delim     [ \t]+                          /* regular definition */
    %%
    \n        { ch++; wd++; nl++; }           /* translation rules */
    ^{delim}  { ch+=yyleng; }
    {delim}   { ch+=yyleng; wd++; }
    .         { ch++; }
    %%
    int main(void)
    { yylex();
      printf("%8d%8d%8d\n", nl, wd, ch);
      return 0;
    }
22. Example Lex Specification 3

    %{
    #include <stdio.h>
    %}
    digit     [0-9]                           /* regular definitions */
    letter    [A-Za-z]
    id        {letter}({letter}|{digit})*
    %%
    {digit}+  { printf("number: %s\n", yytext); }    /* translation rules */
    {id}      { printf("ident: %s\n", yytext); }
    .         { printf("other: %s\n", yytext); }
    %%
    int main(void)
    { yylex();
      return 0;
    }
23. Example Lex Specification 4

    %{ /* definitions of manifest constants */
    #define LT (256)
    %}
    delim     [ \t\n]
    ws        {delim}+
    letter    [A-Za-z]
    digit     [0-9]
    id        {letter}({letter}|{digit})*
    number    {digit}+(\.{digit}+)?(e[+\-]?{digit}+)?
    %%
    {ws}      { }
    if        {return IF;}
    then      {return THEN;}
    else      {return ELSE;}
    {id}      {yylval = install_id(); return ID;}
    {number}  {yylval = install_num(); return NUMBER;}
    "<"       {yylval = LT; return RELOP;}
    "<="      {yylval = LE; return RELOP;}
    "="       {yylval = EQ; return RELOP;}
    "<>"      {yylval = NE; return RELOP;}
    ">"       {yylval = GT; return RELOP;}
    ">="      {yylval = GE; return RELOP;}
    %%
    int install_id()

Annotations: yylval carries the token attribute; the return statement returns the token to the parser; install_id() installs yytext as an identifier in the symbol table.
24. Design of a Lexical Analyzer Generator
- Translate regular expressions to an NFA
- Translate the NFA to an efficient DFA (optional)
    regular expressions → NFA → DFA
- Either simulate the NFA to recognize tokens, or simulate the DFA to recognize tokens
25. Nondeterministic Finite Automata
An NFA is a 5-tuple (S, Σ, δ, s0, F) where
- S is a finite set of states
- Σ is a finite set of symbols, the alphabet
- δ is a mapping from S × Σ to a set of states
- s0 ∈ S is the start state
- F ⊆ S is the set of accepting (or final) states
26. Transition Graph
An NFA can be diagrammatically represented by a labeled directed graph called a transition graph.
    S = {0,1,2,3}
    Σ = {a,b}
    s0 = 0
    F = {3}
[Figure: transition graph of this NFA.]
27. Transition Table
The mapping δ of an NFA can be represented in a transition table.
    δ(0,a) = {0,1}
    δ(0,b) = {0}
    δ(1,b) = {2}
    δ(2,b) = {3}

    State   Input a   Input b
    0       {0, 1}    {0}
    1                 {2}
    2                 {3}
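The transition table above maps naturally onto an array of state sets. The following is a minimal sketch, assuming a bit-mask encoding of sets (bit i stands for state i); the names S, delta, nfa_move and nfa_accepts are illustrative and not taken from the slides. Because this particular NFA has no ε-transitions, the simulation needs only the move operation.

```c
#include <stddef.h>

#define S(i) (1u << (i))        /* the singleton set {i} as a bit mask */

/* delta[state][column]: column 0 is input a, column 1 is input b.
 * An entry of 0 is the empty set (a blank cell in the table). */
static const unsigned delta[4][2] = {
    /* state 0 */ { S(0) | S(1), S(0) },
    /* state 1 */ { 0,           S(2) },
    /* state 2 */ { 0,           S(3) },
    /* state 3 */ { 0,           0    },
};

static const unsigned accepting = S(3);      /* F = {3} */

/* move(T, c): union of delta(s, c) over every state s in the set T. */
unsigned nfa_move(unsigned T, int col)
{
    unsigned U = 0;
    for (int s = 0; s < 4; s++)
        if (T & S(s))
            U |= delta[s][col];
    return U;
}

/* Returns 1 if the NFA accepts x, a string over {a, b}; otherwise 0. */
int nfa_accepts(const char *x)
{
    unsigned T = S(0);                       /* start in {s0} = {0} */
    for (size_t i = 0; x[i] != '\0'; i++) {
        if (x[i] != 'a' && x[i] != 'b')
            return 0;                        /* symbol outside the alphabet */
        T = nfa_move(T, x[i] == 'b');
    }
    return (T & accepting) != 0;             /* some path ends in F */
}
```

Tracking the whole set of reachable states at once avoids backtracking over the nondeterministic choice in state 0.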
28. The Language Defined by an NFA
- An NFA accepts an input string x if and only if there is some path with edges labeled with symbols from x in sequence from the start state to some accepting state in the transition graph
- A state transition from one state to another on the path is called a move
- The language defined by an NFA is the set of input strings it accepts, such as (a|b)*abb for the example NFA
29. Design of a Lexical Analyzer Generator: RE to NFA to DFA
Lex specification with regular expressions:
    p1 { action1 }
    p2 { action2 }
    …
    pn { actionn }
NFA: a start state s0 leads to N(p1), N(p2), …, N(pn), whose accepting states are associated with action1, action2, …, actionn.
The subset construction then turns this NFA into a DFA.
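30. From Regular Expression to NFA (Thompson's Construction)
[Figure: one NFA fragment per regular-expression form, each with a start state i and a final state f.]
    ε       i goes to f on ε
    a       i goes to f on a
    r1|r2   i has ε-edges into N(r1) and N(r2); both have ε-edges into f
    r1r2    N(r1) followed by N(r2), ending in f
    r*      i has an ε-edge into N(r), N(r) has an ε-edge into f, with ε-edges that allow N(r) to be repeated or skipped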
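31. Combining the NFAs of a Set of Regular Expressions
    a     { action1 }
    abb   { action2 }
    a*b+  { action3 }
[Figure: the three separate NFAs, and the combined NFA in which a new start state has an ε-transition to the start state of each of them.]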
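32. Simulating the Combined NFA Example 1
[Figure: the sets of states reached while simulating the combined NFA on an input, with the action associated with each set (action1, action3, or none).]
Must find the longest match:
- Continue until no further moves are possible
- When the last state is accepting: execute its action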
33. Simulating the Combined NFA Example 2
[Figure: a simulation of the combined NFA that ends in a set of states containing accepting states for both action2 and action3.]
When two or more accepting states are reached, the first action given in the Lex specification is executed.
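The two disambiguation rules (longest match, then first rule listed) can be sketched without any automaton at all. The code below is an illustration only: it uses three hand-written matchers for the example patterns a, abb and a*b+ rather than simulating the combined NFA, and the names match_a, match_abb, match_astar_bplus and next_token are hypothetical.

```c
#include <stddef.h>

/* Each matcher returns the length of the longest prefix of s that its
 * pattern matches, or 0 when the pattern does not match at all. */
static size_t match_a(const char *s)              /* pattern: a */
{
    return s[0] == 'a' ? 1 : 0;
}

static size_t match_abb(const char *s)            /* pattern: abb */
{
    return (s[0] == 'a' && s[1] == 'b' && s[2] == 'b') ? 3 : 0;
}

static size_t match_astar_bplus(const char *s)    /* pattern: a*b+ */
{
    size_t n = 0;
    while (s[n] == 'a')
        n++;
    size_t first_b = n;
    while (s[n] == 'b')
        n++;
    return n > first_b ? n : 0;                   /* at least one b needed */
}

/* Returns the number (1..3) of the rule whose action should run for the
 * next token of s and stores the lexeme length in *len; returns 0 when
 * no rule matches. */
int next_token(const char *s, size_t *len)
{
    size_t (*const rules[3])(const char *) = {
        match_a, match_abb, match_astar_bplus     /* specification order */
    };
    int best = 0;
    *len = 0;
    for (int i = 0; i < 3; i++) {
        size_t n = rules[i](s);
        if (n > *len) {        /* strictly longer only: earlier rule wins ties */
            *len = n;
            best = i + 1;
        }
    }
    return best;
}
```

On the input abb both abb and a*b+ match three characters, and the strict comparison leaves the earlier rule (action2) selected; on aaba only a*b+ reaches length three, so action3 runs on the lexeme aab.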
34. Deterministic Finite Automata
A deterministic finite automaton is a special case of an NFA:
- No state has an ε-transition
- For each state s and input symbol a there is at most one edge labeled a leaving s
- Each entry in the transition table is a single state
- At most one path exists to accept a string
- The simulation algorithm is simple
35. Example DFA
A DFA that accepts (a|b)*abb
[Figure: transition graph of the DFA.]
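A DFA for (a|b)*abb can be run with a single table lookup per input character. The sketch below assumes one common four-state numbering (state 3 accepting); the slide's own diagram did not survive transcription, so the state numbers and the names dtran and dfa_accepts are assumptions.

```c
/* dtran[state][column]: column 0 is input a, column 1 is input b.
 * State k (k = 0..3) roughly means "the last k symbols read are the
 * first k symbols of abb". */
static const int dtran[4][2] = {
    /* state 0 */ { 1, 0 },
    /* state 1 */ { 1, 2 },
    /* state 2 */ { 1, 3 },
    /* state 3 */ { 1, 0 },
};

/* Returns 1 if x, a string over {a, b}, is in (a|b)*abb; otherwise 0. */
int dfa_accepts(const char *x)
{
    int state = 0;                       /* start state */
    for (; *x != '\0'; x++) {
        if (*x != 'a' && *x != 'b')
            return 0;                    /* symbol outside the alphabet */
        state = dtran[state][*x == 'b'];
    }
    return state == 3;                   /* state 3 is the accepting state */
}
```

Unlike the NFA simulation, the loop keeps one current state rather than a set, which is why the per-character cost is constant.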
36. Conversion of an NFA into a DFA
The subset construction algorithm converts an NFA into a DFA using:
    ε-closure(s) = {s} ∪ {t | s →ε … →ε t}
    ε-closure(T) = ∪ (s ∈ T) ε-closure(s)
    move(T,a) = {t | s →a t and s ∈ T}
The algorithm produces:
- Dstates, the set of states of the new DFA, consisting of sets of states of the NFA
- Dtran, the transition table of the new DFA
37. ε-closure and move Examples
    ε-closure({0}) = {0,1,3,7}
    move({0,1,3,7},a) = {2,4,7}
    ε-closure({2,4,7}) = {2,4,7}
    move({2,4,7},a) = {7}
    ε-closure({7}) = {7}
    move({7},b) = {8}
    ε-closure({8}) = {8}
    move({8},a) = ∅
[Figure: the combined NFA and the sets of states reached on input aaba: {0,1,3,7}, {2,4,7}, {7}, {8}, none.]
Also used to simulate NFAs.
38. Simulating an NFA using ε-closure and move

    S := ε-closure({s0})
    Sprev := ∅
    a := nextchar()
    while S ≠ ∅ do
        Sprev := S
        S := ε-closure(move(S,a))
        a := nextchar()
    end do
    if Sprev ∩ F ≠ ∅ then
        execute action in Sprev
        return "yes"
    else
        return "no"
39. The Subset Construction Algorithm

    Initially, ε-closure(s0) is the only state in Dstates and it is unmarked
    while there is an unmarked state T in Dstates do
        mark T
        for each input symbol a ∈ Σ do
            U := ε-closure(move(T,a))
            if U is not in Dstates then
                add U as an unmarked state to Dstates
            end if
            Dtran[T,a] := U
        end do
    end do
40. Subset Construction Example 1
[Figure: an NFA with states 0 to 10 and the DFA with states A to E obtained from it.]
Dstates:
    A = {0,1,2,4,7}
    B = {1,2,3,4,6,7,8}
    C = {1,2,4,5,6,7}
    D = {1,2,4,5,6,7,9}
    E = {1,2,4,5,6,7,10}
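The Dstates of this example can be reproduced with a short program. The sketch below assumes the standard Thompson-style NFA for (a|b)*abb with states 0 to 10 (the slide's diagram is not recoverable from the transcript, but this NFA yields exactly the five sets listed), and it encodes sets as bit masks. All identifiers are illustrative.

```c
#define NSTATES 11
#define MAXD    32                      /* ample for this example */
#define S(i)    (1u << (i))

/* eps[s]: states reachable from s by one ε-edge (assumed NFA). */
static const unsigned eps[NSTATES] = {
    [0] = S(1) | S(7), [1] = S(2) | S(4), [3] = S(6),
    [5] = S(6),        [6] = S(1) | S(7),
};
/* on_sym[s][col]: column 0 is input a, column 1 is input b. */
static const unsigned on_sym[NSTATES][2] = {
    [2] = { S(3), 0 }, [4] = { 0, S(5) },
    [7] = { S(8), 0 }, [8] = { 0, S(9) }, [9] = { 0, S(10) },
};

/* ε-closure(T): keep adding ε-successors until nothing changes. */
unsigned eps_closure(unsigned T)
{
    unsigned prev;
    do {
        prev = T;
        for (int s = 0; s < NSTATES; s++)
            if (T & S(s))
                T |= eps[s];
    } while (T != prev);
    return T;
}

/* move(T, a): states reachable from T on one edge labeled a. */
unsigned nfa_move(unsigned T, int col)
{
    unsigned U = 0;
    for (int s = 0; s < NSTATES; s++)
        if (T & S(s))
            U |= on_sym[s][col];
    return U;
}

unsigned dstates[MAXD];                 /* each DFA state is a set of NFA states */
int dtran[MAXD][2];                     /* indices into dstates */

/* Fills dstates and dtran; returns the number of DFA states, or -1 if
 * MAXD is too small. States below index `marked` are the marked ones. */
int subset_construction(void)
{
    int n = 0, marked = 0;
    dstates[n++] = eps_closure(S(0));
    while (marked < n) {
        unsigned T = dstates[marked];
        for (int col = 0; col < 2; col++) {
            unsigned U = eps_closure(nfa_move(T, col));
            int j = 0;
            while (j < n && dstates[j] != U)
                j++;
            if (j == n) {               /* U is new: add it unmarked */
                if (n == MAXD)
                    return -1;
                dstates[n++] = U;
            }
            dtran[marked][col] = j;
        }
        marked++;
    }
    return n;
}
```

With this NFA the states are discovered in the order A, B, C, D, E, so dstates[0] through dstates[4] correspond to the sets in the list above.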
41. Subset Construction Example 2
[Figure: the combined NFA for a, abb and a*b+, and the DFA with states A to F obtained from it.]
Dstates:
    A = {0,1,3,7}
    B = {2,4,7}
    C = {8}
    D = {7}
    E = {5,8}
    F = {6,8}
42. Minimizing the Number of States of a DFA
[Figure: a DFA with states A, B, C, D, E and an equivalent minimized DFA with states A, B, D, E.]
43. From Regular Expression to DFA Directly
- The important states of an NFA are those without an ε-transition, that is, if move({s},a) ≠ ∅ for some a then s is an important state
- The subset construction algorithm uses only the important states when it determines ε-closure(move(T,a))
44. From Regular Expression to DFA Directly (Algorithm)
- Augment the regular expression r with a special end symbol # to make accepting states important: the new expression is r#
- Construct a syntax tree for r#
- Traverse the tree to construct functions nullable, firstpos, lastpos, and followpos
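45. From Regular Expression to DFA Directly: Syntax Tree of (a|b)*abb#
[Figure: syntax tree with concatenation, closure (*) and alternation (|) nodes. Leaves other than ε carry a position number: a = 1, b = 2, a = 3, b = 4, b = 5, # = 6.]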
46. From Regular Expression to DFA Directly: Annotating the Tree
- nullable(n): the subtree at node n generates languages including the empty string
- firstpos(n): the set of positions that can match the first symbol of a string generated by the subtree at node n
- lastpos(n): the set of positions that can match the last symbol of a string generated by the subtree at node n
- followpos(i): the set of positions that can follow position i in the tree
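47. From Regular Expression to DFA Directly: Annotating the Tree
Leaf ε:
    nullable(n) = true
    firstpos(n) = ∅
    lastpos(n)  = ∅
Leaf i:
    nullable(n) = false
    firstpos(n) = {i}
    lastpos(n)  = {i}
Alternation node | with children c1 and c2:
    nullable(n) = nullable(c1) or nullable(c2)
    firstpos(n) = firstpos(c1) ∪ firstpos(c2)
    lastpos(n)  = lastpos(c1) ∪ lastpos(c2)
Concatenation node with children c1 and c2:
    nullable(n) = nullable(c1) and nullable(c2)
    firstpos(n) = if nullable(c1) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1)
    lastpos(n)  = if nullable(c2) then lastpos(c1) ∪ lastpos(c2) else lastpos(c2)
Closure node * with child c1:
    nullable(n) = true
    firstpos(n) = firstpos(c1)
    lastpos(n)  = lastpos(c1)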
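48. From Regular Expression to DFA Directly: Syntax Tree of (a|b)*abb#
[Figure: the syntax tree annotated with firstpos and lastpos at every node; only the closure node is nullable.]

    Node                      firstpos     lastpos
    root (concatenate #)      {1, 2, 3}    {6}
    leaf # (position 6)       {6}          {6}
    concatenate b (5)         {1, 2, 3}    {5}
    leaf b (position 5)       {5}          {5}
    concatenate b (4)         {1, 2, 3}    {4}
    leaf b (position 4)       {4}          {4}
    concatenate a (3)         {1, 2, 3}    {3}
    leaf a (position 3)       {3}          {3}
    closure *  (nullable)     {1, 2}       {1, 2}
    alternation |             {1, 2}       {1, 2}
    leaf a (position 1)       {1}          {1}
    leaf b (position 2)       {2}          {2}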
49. From Regular Expression to DFA Directly: followpos

    for each node n in the tree do
        if n is a cat-node with left child c1 and right child c2 then
            for each i in lastpos(c1) do
                followpos(i) := followpos(i) ∪ firstpos(c2)
            end do
        else if n is a star-node
            for each i in lastpos(n) do
                followpos(i) := followpos(i) ∪ firstpos(n)
            end do
        end if
    end do
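The annotation rules and the followpos loop can be combined into one post-order traversal. The following sketch hard-codes the syntax tree of (a|b)*abb# and represents position sets as bit masks; ε leaves are left out because this tree has none. The struct layout and all names (node, annotate, build_example, P) are assumptions made for illustration.

```c
#define P(i) (1u << (i))                 /* position i as a one-element set */

enum kind { LEAF, OR, CAT, STAR };

struct node {
    enum kind kind;
    int pos;                             /* position number (leaves only) */
    struct node *c1, *c2;                /* children; c2 unused for STAR */
    int nullable;
    unsigned firstpos, lastpos;
};

unsigned followpos[8];                   /* indexed by position 1..6 */

/* followpos(i) := followpos(i) ∪ to, for every position i in from. */
static void add_follow(unsigned from, unsigned to)
{
    for (int i = 1; i < 8; i++)
        if (from & P(i))
            followpos[i] |= to;
}

/* Post-order traversal: children first, then the node itself. */
void annotate(struct node *n)
{
    struct node *c1 = n->c1, *c2 = n->c2;
    switch (n->kind) {
    case LEAF:
        n->nullable = 0;
        n->firstpos = n->lastpos = P(n->pos);
        break;
    case OR:
        annotate(c1); annotate(c2);
        n->nullable = c1->nullable || c2->nullable;
        n->firstpos = c1->firstpos | c2->firstpos;
        n->lastpos  = c1->lastpos | c2->lastpos;
        break;
    case CAT:
        annotate(c1); annotate(c2);
        n->nullable = c1->nullable && c2->nullable;
        n->firstpos = c1->nullable ? (c1->firstpos | c2->firstpos) : c1->firstpos;
        n->lastpos  = c2->nullable ? (c1->lastpos | c2->lastpos) : c2->lastpos;
        add_follow(c1->lastpos, c2->firstpos);      /* cat-node rule */
        break;
    case STAR:
        annotate(c1);
        n->nullable = 1;
        n->firstpos = c1->firstpos;
        n->lastpos  = c1->lastpos;
        add_follow(n->lastpos, n->firstpos);        /* star-node rule */
        break;
    }
}

/* Builds and annotates the tree of (a|b)*abb#; returns firstpos(root). */
unsigned build_example(void)
{
    static struct node l1 = { LEAF, 1 }, l2 = { LEAF, 2 }, l3 = { LEAF, 3 },
                       l4 = { LEAF, 4 }, l5 = { LEAF, 5 }, l6 = { LEAF, 6 };
    static struct node alt  = { OR,   0, &l1,   &l2 };
    static struct node star = { STAR, 0, &alt,  0   };
    static struct node c3   = { CAT,  0, &star, &l3 };
    static struct node c4   = { CAT,  0, &c3,   &l4 };
    static struct node c5   = { CAT,  0, &c4,   &l5 };
    static struct node root = { CAT,  0, &c5,   &l6 };
    annotate(&root);
    return root.firstpos;
}
```

After build_example() returns, followpos[1] and followpos[2] both hold {1, 2, 3}, and positions 3, 4 and 5 each lead to the next position in the chain.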
50. From Regular Expression to DFA Directly: Algorithm

    s0 := firstpos(root) where root is the root of the syntax tree
    Dstates := {s0} and is unmarked
    while there is an unmarked state T in Dstates do
        mark T
        for each input symbol a ∈ Σ do
            let U be the set of positions that are in followpos(p)
                for some position p in T,
                such that the symbol at position p is a
            if U is not empty and not in Dstates then
                add U as an unmarked state to Dstates
            end if
            Dtran[T,a] := U
        end do
    end do
51. From Regular Expression to DFA Directly: Example

    Node   followpos
    1      {1, 2, 3}
    2      {1, 2, 3}
    3      {4}
    4      {5}
    5      {6}

[Figure: the resulting DFA, with start state {1,2,3} and further states {1,2,3,4}, {1,2,3,5} and {1,2,3,6}.]
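Given the followpos table, the four DFA states can be generated mechanically. The sketch below assumes the position-to-symbol assignment of (a|b)*abb# (positions 1 to 6 hold a, b, a, b, b, #) and stores sets as bit masks; the names build_dfa, is_accepting, dstates and dtran are illustrative, and -1 in dtran is an assumed marker for "no transition".

```c
#define P(i) (1u << (i))                 /* position i as a one-element set */
#define MAXD 16                          /* ample for this example */

/* followpos per position, copied from the table; index 0 is unused. */
static const unsigned followpos[7] = {
    0,
    P(1) | P(2) | P(3),                  /* position 1 */
    P(1) | P(2) | P(3),                  /* position 2 */
    P(4), P(5), P(6),                    /* positions 3, 4, 5 */
    0                                    /* position 6 is the end marker # */
};
static const char symbol[7] = { 0, 'a', 'b', 'a', 'b', 'b', '#' };

unsigned dstates[MAXD];                  /* each DFA state is a set of positions */
int dtran[MAXD][2];                      /* column 0: a, column 1: b; -1 if none */

/* Builds the DFA from s0 = firstpos(root); returns the number of states,
 * or -1 if MAXD is too small. */
int build_dfa(unsigned firstpos_root)
{
    static const char alphabet[2] = { 'a', 'b' };
    int n = 0, marked = 0;
    dstates[n++] = firstpos_root;
    while (marked < n) {
        unsigned T = dstates[marked];
        for (int col = 0; col < 2; col++) {
            /* U: union of followpos(p) for p in T with symbol(p) == a */
            unsigned U = 0;
            for (int p = 1; p <= 6; p++)
                if ((T & P(p)) && symbol[p] == alphabet[col])
                    U |= followpos[p];
            int j = -1;
            if (U != 0) {
                for (j = 0; j < n && dstates[j] != U; j++)
                    ;
                if (j == n) {            /* new state: add it unmarked */
                    if (n == MAXD)
                        return -1;
                    dstates[n++] = U;
                }
            }
            dtran[marked][col] = j;
        }
        marked++;
    }
    return n;
}

/* A state is accepting when it contains the position of #. */
int is_accepting(unsigned T)
{
    return (T & P(6)) != 0;
}
```

Calling build_dfa(P(1) | P(2) | P(3)) discovers the states in the order {1,2,3}, {1,2,3,4}, {1,2,3,5}, {1,2,3,6}, and only the last one is accepting.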
52. Time-Space Tradeoffs

    Automaton   Space (worst case)   Time (worst case)
    NFA         O(|r|)               O(|r| × |x|)
    DFA         O(2^|r|)             O(|x|)
TDDB9 Compilers nd Interpreters TDDB44 Compiler Construction LR Prsing Updted/New slide mteril 007: Pushdown Automton for LR-Prsing Finite-stte pushdown utomton contins lterntingly sttes nd symols in NUΣ
More informationSuffix trees, suffix arrays, BWT
ALGORITHMES POUR LA BIO-INFORMATIQUE ET LA VISUALISATION COURS 3 Rluc Uricru Suffix trees, suffix rrys, BWT Bsed on: Suffix trees nd suffix rrys presenttion y Him Kpln Suffix trees course y Pco Gomez Liner-Time
More information2. λ is a regular expression and denotes the set {λ} 4. If r and s are regular expressions denoting the languages R and S, respectively
Regular expressions: a regular expression is built up out of simpler regular expressions using a set of defining rules. Regular expressions allows us to define tokens of programming languages such as identifiers.
More informationMidterm I Solutions CS164, Spring 2006
Midterm I Solutions CS164, Spring 2006 Februry 23, 2006 Plese red ll instructions (including these) crefully. Write your nme, login, SID, nd circle the section time. There re 8 pges in this exm nd 4 questions,
More informationQuiz2 45mins. Personal Number: Problem 1. (20pts) Here is an Table of Perl Regular Ex
Long Quiz2 45mins Nme: Personl Numer: Prolem. (20pts) Here is n Tle of Perl Regulr Ex Chrcter Description. single chrcter \s whitespce chrcter (spce, t, newline) \S non-whitespce chrcter \d digit (0-9)
More informationRegular Expressions and Automata using Miranda
Regulr Expressions nd Automt using Mirnd Simon Thompson Computing Lortory Univerisity of Kent t Cnterury My 1995 Contents 1 Introduction ::::::::::::::::::::::::::::::::: 1 2 Regulr Expressions :::::::::::::::::::::::::::::
More informationInformation Retrieval and Organisation
Informtion Retrievl nd Orgnistion Suffix Trees dpted from http://www.mth.tu.c.il/~himk/seminr02/suffixtrees.ppt Dell Zhng Birkeck, University of London Trie A tree representing set of strings { } eef d
More informationHomework. Context Free Languages III. Languages. Plan for today. Context Free Languages. CFLs and Regular Languages. Homework #5 (due 10/22)
Homework Context Free Lnguges III Prse Trees nd Homework #5 (due 10/22) From textbook 6.4,b 6.5b 6.9b,c 6.13 6.22 Pln for tody Context Free Lnguges Next clss of lnguges in our quest! Lnguges Recll. Wht
More informationPRACTICAL CLASS: Flex & Bison
Master s Degree Course in Computer Engineering Formal Languages FORMAL LANGUAGES AND COMPILERS PRACTICAL CLASS: Flex & Bison Eliana Bove eliana.bove@poliba.it Install On Linux: install with the package
More informationScanning Theory and Practice
CHAPTER 3 Scnning Theory nd Prctice 3.1 Overview The primry function of scnner is to red in chrcters from source file nd group them into tokens. A scnner is sometimes clled lexicl nlyzer or lexer. The
More informationModule 6 Lexical Phase - RE to DFA
Module 6 Lexical Phase - RE to DFA The objective of this module is to construct a minimized DFA from a regular expression. A NFA is typically easier to construct but string matching with a NFA is slower.
More informationCOP4020 Programming Languages. Syntax Prof. Robert van Engelen
COP4020 Programming Languages Syntax Prof. Robert van Engelen Overview n Tokens and regular expressions n Syntax and context-free grammars n Grammar derivations n More about parse trees n Top-down and
More informationCOP4020 Programming Languages. Syntax Prof. Robert van Engelen
COP4020 Programming Languages Syntax Prof. Robert van Engelen Overview Tokens and regular expressions Syntax and context-free grammars Grammar derivations More about parse trees Top-down and bottom-up
More informationSlides for Data Mining by I. H. Witten and E. Frank
Slides for Dt Mining y I. H. Witten nd E. Frnk Simplicity first Simple lgorithms often work very well! There re mny kinds of simple structure, eg: One ttriute does ll the work All ttriutes contriute eqully
More informationFrom Dependencies to Evaluation Strategies
From Dependencies to Evlution Strtegies Possile strtegies: 1 let the user define the evlution order 2 utomtic strtegy sed on the dependencies: use locl dependencies to determine which ttriutes to compute
More informationLexical Analysis. Implementing Scanners & LEX: A Lexical Analyzer Tool
Lexical Analysis Implementing Scanners & LEX: A Lexical Analyzer Tool Copyright 2016, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California
More informationLexical Analyzer Scanner
Lexical Analyzer Scanner ASU Textbook Chapter 3.1, 3.3, 3.4, 3.6, 3.7, 3.5 Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Main tasks Read the input characters and produce
More informationPrinciples of Compiler Design Presented by, R.Venkadeshan,M.Tech-IT, Lecturer /CSE Dept, Chettinad College of Engineering &Technology
Principles of Compiler Design Presented by, R.Venkadeshan,M.Tech-IT, Lecturer /CSE Dept, Chettinad College of Engineering &Technology 6/30/2010 Principles of Compiler Design R.Venkadeshan 1 Preliminaries
More informationLexical Analyzer Scanner
Lexical Analyzer Scanner ASU Textbook Chapter 3.1, 3.3, 3.4, 3.6, 3.7, 3.5 Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Main tasks Read the input characters and produce
More informationFigure 2.1: Role of Lexical Analyzer
Chapter 2 Lexical Analysis Lexical analysis or scanning is the process which reads the stream of characters making up the source program from left-to-right and groups them into tokens. The lexical analyzer
More informationCS 236 Language and Computation. Alphabet. Definition. I.2.1. Formal Languages (10.1)
C 236 Lnguge nd Computtion Course Notes Prt I: Grmmrs for Defining yntx (II) Chpter I.2: yntx nd Grmmrs (10, 12.1) Anton etzer (Bsed on ook drft y J. V. Tucker nd K. tephenson) Dept. of Computer cience,
More information[Lexical Analysis] Bikash Balami
1 [Lexical Analysis] Compiler Design and Construction (CSc 352) Compiled By Central Department of Computer Science and Information Technology (CDCSIT) Tribhuvan University, Kirtipur Kathmandu, Nepal 2
More informationDixita Kagathara Page 1
2014 Sem - VII Lexical Analysis 1) Role of lexical analysis and its issues. The lexical analyzer is the first phase of compiler. Its main task is to read the input characters and produce as output a sequence
More informationacronyms possibly used in this test: CFG :acontext free grammar CFSM :acharacteristic finite state machine DFA :adeterministic finite automata
EE573 Fll 2002, Exm open book, if question seems mbiguous, sk me to clrify the question. If my nswer doesn t stisfy you, plese stte your ssumptions. cronyms possibly used in this test: CFG :context free
More informationChapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective
Chapter 4 Lexical analysis Lexical scanning Regular expressions DFAs and FSAs Lex Concepts CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 1 CMSC 331, Some material 1998 by Addison Wesley
More informationTop-down vs Bottom-up. Bottom up parsing. Sentential form. Handles. Handles in expression example
Bottom up prsing Generl e LR0) LR LR1) LLR o est exploit JvCUP, should understnd the theoreticl sis LR prsing); op-down vs Bottom-up Bottom-up more powerful thn top-down; Cn process more powerful grmmr
More informationVirtual Machine (Part I)
Hrvrd University CS Fll 2, Shimon Schocken Virtul Mchine (Prt I) Elements of Computing Systems Virtul Mchine I (Ch. 7) Motivtion clss clss Min Min sttic sttic x; x; function function void void min() min()
More informationOutline. Introduction Suffix Trees (ST) Building STs in linear time: Ukkonen s algorithm Applications of ST
Suffi Trees Outline Introduction Suffi Trees (ST) Building STs in liner time: Ukkonen s lgorithm Applictions of ST 2 3 Introduction Sustrings String is ny sequence of chrcters. Sustring of string S is
More information2 Computing all Intersections of a Set of Segments Line Segment Intersection
15-451/651: Design & Anlysis of Algorithms Novemer 14, 2016 Lecture #21 Sweep-Line nd Segment Intersection lst chnged: Novemer 8, 2017 1 Preliminries The sweep-line prdigm is very powerful lgorithmic design
More informationSample Midterm Solutions COMS W4115 Programming Languages and Translators Monday, October 12, 2009
Deprtment of Computer cience Columbi University mple Midterm olutions COM W4115 Progrmming Lnguges nd Trnsltors Mondy, October 12, 2009 Closed book, no ids. ch question is worth 20 points. Question 5(c)
More information